Commit Graph

419 Commits

Author SHA1 Message Date
Yonghong Song b488c8d328 dwarf_loader: Support btf_type_tag attribute
LLVM patches ([1] for clang, [2] and [3] for BPF backend)
added support for btf_type_tag attributes. The following is
an example:

  [$ ~] cat t.c
  #define __tag1 __attribute__((btf_type_tag("tag1")))
  #define __tag2 __attribute__((btf_type_tag("tag2")))
  int __tag1 * __tag1 __tag2 *g __attribute__((section(".data..percpu")));
  [$ ~] clang -O2 -g -c t.c
  [$ ~] llvm-dwarfdump --debug-info t.o
  t.o:    file format elf64-x86-64
  ...
  0x0000001e:   DW_TAG_variable
                  DW_AT_name      ("g")
                  DW_AT_type      (0x00000033 "int **")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/home/yhs/t.c")
                  DW_AT_decl_line (3)
                  DW_AT_location  (DW_OP_addr 0x0)
  0x00000033:   DW_TAG_pointer_type
                  DW_AT_type      (0x0000004b "int *")
  0x00000038:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag1")
  0x00000041:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag2")
  0x0000004a:     NULL
  0x0000004b:   DW_TAG_pointer_type
                  DW_AT_type      (0x0000005a "int")
  0x00000050:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag1")
  0x00000059:     NULL
  0x0000005a:   DW_TAG_base_type
                  DW_AT_name      ("int")
                  DW_AT_encoding  (DW_ATE_signed)
                  DW_AT_byte_size (0x04)
  0x00000061:   NULL

From the above example, you can see that DW_TAG_pointer_type may contain
one or more DW_TAG_LLVM_annotation btf_type_tag tags.  If
DW_TAG_LLVM_annotation tags are present inside DW_TAG_pointer_type, for
BTF encoding, pahole will need to follow [3] to generate a type chain
like:

  var -> ptr -> tag2 -> tag1 -> ptr -> tag1 -> int

This patch implements dwarf_loader support. If a pointer type contains
DW_TAG_LLVM_annotation tags, a new type, btf_type_tag_ptr_type, is
created to store the pointer tag itself and all of its
DW_TAG_LLVM_annotation tags.  During the recoding stage, the type chain
is formed as shown in the example above.
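
As a rough illustration of the chain above (not the actual dwarf_loader
code), the sketch below walks a list of pointers and emits each pointer
followed by its tags in reverse DWARF order, which is the order the
example chain suggests. struct ptr_sketch, chain_format() and the fixed
size tag array are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a pointer DIE and its annotations. */
struct ptr_sketch {
	const char *tags[4];	/* btf_type_tag values, in DWARF order */
	int nr_tags;
};

/* Append s to the NUL-terminated string in buf without overflowing it. */
static void append(char *buf, size_t len, const char *s)
{
	size_t used = strlen(buf);

	if (used < len - 1)
		snprintf(buf + used, len - used, "%s", s);
}

/*
 * Format "var -> ptr -> tagN -> ... -> tag1 -> ptr -> ... -> base".
 * For each pointer the tags are emitted last to first.
 */
void chain_format(char *buf, size_t len, const struct ptr_sketch *ptrs,
		  int nr_ptrs, const char *base)
{
	assert(len > 0);
	buf[0] = '\0';
	append(buf, len, "var -> ");

	for (int i = 0; i < nr_ptrs; i++) {
		append(buf, len, "ptr -> ");
		for (int j = ptrs[i].nr_tags - 1; j >= 0; j--) {
			append(buf, len, ptrs[i].tags[j]);
			append(buf, len, " -> ");
		}
	}
	append(buf, len, base);
}
```

Feeding it the two pointers from the example (tag1 and tag2 on the outer
one, tag1 on the inner one) with base type "int" produces the chain
shown above.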

An option "--skip_encoding_btf_type_tag" is added to disable
this new functionality.

  [1] https://reviews.llvm.org/D111199
  [2] https://reviews.llvm.org/D113222
  [3] https://reviews.llvm.org/D113496

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-23 20:37:51 -03:00
Douglas Raillard 772725a77d dwarves_fprintf: Move cacheline_size into struct conf_fprintf
Remove the global variable and turn it into a member in struct
conf_fprintf, so that it can be used by other parts of the code.
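
A minimal sketch of the idea, assuming a reduced configuration struct:
the cacheline size travels with the configuration instead of living in a
global, so any function that receives the configuration can use it.
struct conf_sketch and cachelines_spanned() are hypothetical names, not
the real struct conf_fprintf layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced form of a printing configuration. */
struct conf_sketch {
	uint32_t cacheline_size;	/* formerly a global variable */
};

/* Number of cachelines needed to hold 'size' bytes, rounding up. */
uint32_t cachelines_spanned(const struct conf_sketch *conf, uint32_t size)
{
	assert(conf->cacheline_size > 0);
	return (size + conf->cacheline_size - 1) / conf->cacheline_size;
}
```

With a 64 byte cacheline, a 24 byte struct spans one cacheline and a
65 byte struct spans two.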

Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-28 10:17:59 -03:00
Yonghong Song c52f6421f2 btf: Rename btf_tag to btf_decl_tag
Kernel commit ([1]) renamed btf_tag to btf_decl_tag for uapi btf.h and
the libbpf APIs. The reason is that a new clang attribute, btf_type_tag,
was introduced ([2]).  Renaming btf_tag to btf_decl_tag makes it easier
to distinguish from btf_type_tag.

I also pulled in the latest libbpf repo since it contains the renamed
libbpf API function btf__add_decl_tag().

  [1] https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
  [2] https://reviews.llvm.org/D111199

Signed-off-by: Yonghong Song <yhs@fb.com>
[ Minor fixups to cope with --skip_missing ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 09:35:23 -03:00
Douglas Raillard 5282feee6d pahole: Add --skip_missing option
Add a --skip_missing option that allows pahole to keep going when one
of the types passed to -C (e.g. via a file) does not exist.

This is useful for introspection software, such as debugging kernel
modules, that can handle various kernel configurations and versions in
which some recently added types are missing. The consumer of the header
becomes responsible for gating the uses of the type with #ifdef
CONFIG_XXX, rather than pahole bailing out on the first unknown type.
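
The behaviour can be sketched as a loop that either stops or continues
at the first unknown name. This is only an illustration: known_types,
type_exists() and print_types() are hypothetical, and the order in which
the real tool reports missing types may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the types found in an object file. */
static const char *known_types[] = { "tcp_splice_state", "list_head" };

static bool type_exists(const char *name)
{
	for (size_t i = 0; i < sizeof(known_types) / sizeof(known_types[0]); i++)
		if (strcmp(known_types[i], name) == 0)
			return true;
	return false;
}

/*
 * Print each requested type and return how many were printed. Without
 * skip_missing the loop stops at the first unknown name; with it, the
 * unknown name is reported and the remaining names are still processed.
 */
int print_types(const char **names, int nr_names, bool skip_missing)
{
	int printed = 0;

	assert(names != NULL);
	for (int i = 0; i < nr_names; i++) {
		if (!type_exists(names[i])) {
			fprintf(stderr, "type '%s' not found\n", names[i]);
			if (skip_missing)
				continue;
			break;
		}
		printf("struct %s { ... };\n", names[i]);
		printed++;
	}
	return printed;
}
```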

Committer testing:

Before:

  $ pahole tcp_splice_state,xxfrm_policy_queue,list_head tcp.o
  struct tcp_splice_state {
  	struct pipe_inode_info *   pipe;                 /*     0     8 */
  	size_t                     len;                  /*     8     8 */
  	unsigned int               flags;                /*    16     4 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* padding: 4 */
  	/* last cacheline: 24 bytes */
  };
  pahole: type 'xxfrm_policy_queue' not found
  $

After:

  $ pahole --help |& grep skip
        --skip=COUNT           Skip COUNT input records
        --skip_encoding_btf_tag   Do not encode TAGs in BTF.
        --skip_encoding_btf_vars   Do not encode VARs in BTF.
        --skip_missing         skip missing types passed to -C rather than stop
  $ pahole --skip_missing tcp_splice_state,xxfrm_policy_queue,list_head tcp.o
  struct tcp_splice_state {
  	struct pipe_inode_info *   pipe;                 /*     0     8 */
  	size_t                     len;                  /*     8     8 */
  	unsigned int               flags;                /*    16     4 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* padding: 4 */
  	/* last cacheline: 24 bytes */
  };
  struct list_head {
  	struct list_head *         next;                 /*     0     8 */
  	struct list_head *         prev;                 /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  pahole: type 'xxfrm_policy_queue' not found
  $

Signed-off-by: Douglas Raillard <douglas.raillard@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-26 11:29:55 -03:00
Yonghong Song aa8c494e65 dwarf_loader: Parse DWARF tag DW_TAG_LLVM_annotation
Parse the DWARF tag DW_TAG_LLVM_annotation. Only record annotations
named btf_tag, which correspond to btf_tag attributes in C code. Such
information will be used later by the btf_encoder for BTF conversion.

The LLVM implementation only supports btf_tag annotations on
struct/union, func, func parameter and variable ([1]), so we only check
for the corresponding DW tags in these places.

A flag "--skip_encoding_btf_tag" is introduced in case this feature
needs to be disabled for whatever reason.

 [1] https://reviews.llvm.org/D106614

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Link: https://lore.kernel.org/r/20210922021326.2287095-1-yhs@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 17:06:56 -03:00
Arnaldo Carvalho de Melo 9f0809e6a8 pahole: Introduce --ptr_table_stats
Useful while developing to help in tuning the ptr tables (types, tags,
functions, maybe some more in the future).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo c59e996c97 pahole: Fix races in accessing type information in live CUs
When using multithreaded DWARF loading we can't freely access some
tables, as they may grow, leading to stale data accesses and segfaults.

So use a type comparison that takes into account just the immutable
information for structs and unions.

This isn't enough to discern if two types with the same name are really
the same, as we need to look at the member types to figure that out.

So if there are types for which member types need to be checked, leave
that for when all CUs have been processed and are thus completely
immutable, then re-sort and fully compare such types.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo c34b6c6cc9 pahole: Add missing limits.h include to get ULLONG_MAX definition
Found while compiling on a musl libc system.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo eba3e874ad pahole: Consider type members' names when comparing unions, structs
The last one was:

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.VUXlsB	2021-08-06 18:11:51.371012024 -0300
  +++ /tmp/btfdiff.btf.CeZ7hA	2021-08-06 18:11:51.604017029 -0300
  @@ -48226,8 +48226,8 @@ struct intel_ir_data {
   	/* last cacheline: 56 bytes */
   };
   struct intel_pad_context {
  -	u32                        padctrl0;             /*     0     4 */
  -	u32                        padctrl1;             /*     4     4 */
  +	u32                        conf0;                /*     0     4 */
  +	u32                        val;                  /*     4     4 */

   	/* size: 8, cachelines: 1, members: 2 */
   	/* last cacheline: 8 bytes */
  $

That is now covered as well. Please report if you see some other corner
case (some attribute(__aligned__(N)), perhaps? :)).

Now 'btfdiff vmlinux' is clean.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo f61d458c91 pahole: Consider type members' types when comparing unions, structs
But this still doesn't cover all types in the kernel, at least not for
btfdiff's needs, which are to have pahole's output for BTF and DWARF be
the same. So if we have two types that are ABI equal, it will still
complain if...

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.VUXlsB	2021-08-06 18:11:51.371012024 -0300
  +++ /tmp/btfdiff.btf.CeZ7hA	2021-08-06 18:11:51.604017029 -0300
  @@ -48226,8 +48226,8 @@ struct intel_ir_data {
   	/* last cacheline: 56 bytes */
   };
   struct intel_pad_context {
  -	u32                        padctrl0;             /*     0     4 */
  -	u32                        padctrl1;             /*     4     4 */
  +	u32                        conf0;                /*     0     4 */
  +	u32                        val;                  /*     4     4 */

   	/* size: 8, cachelines: 1, members: 2 */
   	/* last cacheline: 8 bytes */
  $

The names of some members are different :-\ Consider that in the next
patch and possibly add a knob to consider both types equal, i.e. don't
compare member names, just size, number of members and types of pairs of
members (at each offset in both types).
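
A sketch of the comparison described above, under simplifying
assumptions: member types are compared as strings and the 'knob' is a
boolean parameter. struct member_sketch, struct type_sketch and
types_equal() are hypothetical and do not mirror pahole's data
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of a struct/union member. */
struct member_sketch {
	const char *type;
	const char *name;
	size_t offset;
};

struct type_sketch {
	size_t size;
	size_t nr_members;
	const struct member_sketch *members;
};

/*
 * Compare size and number of members first, then each pair of members
 * at the same position: offset and type always, name only if asked.
 */
bool types_equal(const struct type_sketch *a, const struct type_sketch *b,
		 bool compare_names)
{
	assert(a != NULL && b != NULL);
	if (a->size != b->size || a->nr_members != b->nr_members)
		return false;

	for (size_t i = 0; i < a->nr_members; i++) {
		const struct member_sketch *ma = &a->members[i];
		const struct member_sketch *mb = &b->members[i];

		if (ma->offset != mb->offset || strcmp(ma->type, mb->type) != 0)
			return false;
		if (compare_names && strcmp(ma->name, mb->name) != 0)
			return false;
	}
	return true;
}
```

With the two intel_pad_context variants from the diff above, the types
compare as different when names are considered and as equal when they
are not.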

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 815041d6dc pahole: Improve the type sorting routine to consider multiple types with same name
Out of these different Linux kernel types with the same name (in different object files):

  $ pahole --sizes | sed -r 's/([^\t]+)\t.*/\1/g' | sort | uniq -c | grep -v ' 1 '
        2 chksum_desc_ctx
        2 controller
        2 debug_buffer
        2 dir_entry
        2 disklabel
        2 dma_chan
        2 dma_heap_attachment
        2 d_partition
        2 elf_thread_core_info
        2 intel_community_context
        2 intel_pad_context
        3 irq_info
        2 irte
        2 map_info
        2 mm_slot
        2 netlbl_domhsh_walk_arg
        2 node
        2 pci_root_info
        2 perf_aux_event
        2 pmc_dev
        2 pmc_reg_map
        2 remap_data
        2 slot
        2 sw842_param
        2 syscall_tp_t
        3 urb_priv
        2 walk_control
        3 workspace
  $

Only this one needs a more involved type comparison:

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.Pksrlr	2021-08-06 16:42:34.823259365 -0300
  +++ /tmp/btfdiff.btf.KOAuwd	2021-08-06 16:42:35.032264038 -0300
  @@ -31035,7 +31035,7 @@ struct elf_note_info {
   	struct memelfnote          auxv;                 /*    56    24 */
   	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
   	struct memelfnote          files;                /*    80    24 */
  -	compat_siginfo_t           csigdata;             /*   104   128 */
  +	siginfo_t                  csigdata;             /*   104   128 */
   	/* --- cacheline 3 boundary (192 bytes) was 40 bytes ago --- */
   	size_t                     size;                 /*   232     8 */
   	int                        thread_notes;         /*   240     4 */
  $

It has the same size and number of members.

And this doesn't always happen; it depends on the order in which the
BTF encoder gets it from one of the DWARF loading threads:

  $ pahole -j12 --btf_encode vmlinux
  $ btfdiff vmlinux
  $

No changes, but then:

  $ btfdiff vmlinux
  $ perf stat pahole -j12 --btf_encode vmlinux

   Performance counter stats for 'pahole -j12 --btf_encode vmlinux':

           17,920.75 msec task-clock:u              #    2.995 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              78,004      page-faults:u             #    4.353 K/sec
      42,677,746,170      cycles:u                  #    2.381 GHz                      (83.37%)
         480,920,924      stalled-cycles-frontend:u #    1.13% frontend cycles idle     (83.33%)
       6,470,001,379      stalled-cycles-backend:u  #   15.16% backend cycles idle      (83.39%)
      96,468,468,147      instructions:u            #    2.26  insn per cycle
                                                    #    0.07  stalled cycles per insn  (83.33%)
      19,757,801,968      branches:u                #    1.103 G/sec                    (83.27%)
         143,118,731      branch-misses:u           #    0.72% of all branches          (83.32%)

         5.984348164 seconds time elapsed

        17.234929000 seconds user
         0.398715000 seconds sys

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.b9FEZI	2021-08-06 16:46:08.810043718 -0300
  +++ /tmp/btfdiff.btf.IawvDY	2021-08-06 16:46:09.026048548 -0300
  @@ -31035,7 +31035,7 @@ struct elf_note_info {
   	struct memelfnote          auxv;                 /*    56    24 */
   	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
   	struct memelfnote          files;                /*    80    24 */
  -	compat_siginfo_t           csigdata;             /*   104   128 */
  +	siginfo_t                  csigdata;             /*   104   128 */
   	/* --- cacheline 3 boundary (192 bytes) was 40 bytes ago --- */
   	size_t                     size;                 /*   232     8 */
   	int                        thread_notes;         /*   240     4 */
  $

Next cset will take that into account by traversing both types looking
for differences in the type for a field.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 23ea62817c pahole: Move case fallthru comment to after the statement
In this case we have:

	case foo: {
	}
        case bar:

The fallthru comment has to be _after_ the closing curly brace, fix it
and avoid this warning (from clang, but probably from gcc too):

  /var/home/acme/git/pahole/pahole.c:573:40: warning: this statement may fall through [-Wimplicit-fallthrough=]
    573 |                 case DW_TAG_base_type: {
        |                                        ^
  /var/home/acme/git/pahole/pahole.c:582:17: note: here
    582 |                 case DW_TAG_pointer_type:
        |                 ^~~~
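
A reduced sketch of the corrected layout, assuming made-up tag values:
the fallthru comment sits after the closing curly brace of the first
case and immediately before the next case label. enum tag_sketch and
visit() are hypothetical.

```c
#include <assert.h>

enum tag_sketch { TAG_BASE_TYPE, TAG_POINTER_TYPE, TAG_OTHER };

/* Returns how many of the two steps ran for the given tag. */
int visit(enum tag_sketch tag)
{
	int steps = 0;

	switch (tag) {
	case TAG_BASE_TYPE: {
		int base_only = 1;	/* the braces give this case its own scope */

		steps += base_only;
	}
		/* fallthru */
	case TAG_POINTER_TYPE:
		steps++;
		break;
	default:
		break;
	}

	assert(steps <= 2);
	return steps;
}
```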

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 21b2933f01 pahole: Fix signedness of ternary expression operator
To address this clang warning:

  /var/home/acme/git/pahole/pahole.c: In function ‘type__instance_read_once’:
  /var/home/acme/git/pahole/pahole.c:1933:78: warning: operand of ‘?:’ changes signedness from ‘int’ to ‘uint32_t’ {aka ‘unsigned int’} due to unsignedness of other operand [-Wsign-compare]
   1933 |         return fread(instance->instance, instance->type->size, 1, fp) != 1 ? -1 : instance->type->size;
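
One way to silence this kind of warning, shown here as a sketch and not
necessarily the change made in this commit, is to cast the unsigned
operand so that both arms of ?: are int. Without the cast, the usual
arithmetic conversions turn the -1 into an unsigned value before it is
converted back on return. status_or_size() and read_once_sketch() are
hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Map the fread() item count to -1 on failure or the record size. */
int status_or_size(size_t nr_read, uint32_t size)
{
	assert(size <= (uint32_t)INT_MAX);	/* the cast below must not overflow */
	return nr_read != 1 ? -1 : (int)size;
}

/* Read exactly one record of 'size' bytes from fp into buf. */
int read_once_sketch(FILE *fp, void *buf, uint32_t size)
{
	return status_or_size(fread(buf, size, 1, fp), size);
}
```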

Fixes: e3e5a4626c ("pahole: Make sure the header is read only once")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 54c1e93b8e pahole: Use the 'prototypes' parameter in prototypes__load()
It was using &class_names directly even though the same list was also
being passed as the 'prototypes' argument. Use the argument.

Fixes: 823739b56f ("pahole: Convert class_names into a list of struct prototypes")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 3895127ce6 pahole: Clarify that currently --nr_methods doesn't work together with -C
It should, as it's natural to do:

  $ pahole --nr_methods -C sock

And have it traverse all functions in all compilation units and show how
many of them have 'struct sock *' as one of their arguments, but more
changes are needed to have this in place, and meanwhile it is easy
enough to do:

  $ pahole --nr_methods | grep -w sock
  sock	1005
  $

And with BTF, it's super fast too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 2ea46285ac pahole: No need to store the class name in 'struct structure'
As we now store the 'struct class' it comes from, and class->name is
now a string, there is no point in storing a duplicate name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 4d8551396d pahole: Multithreaded DWARF loading requires elfutils >= 0.178
According to Mark Wielaard and as per testing, elfutils' libdw version
must be at least 0.178 for multithreaded DWARF loading.

Check that, emit a warning and continue using just a single thread.
This allows asking for multithreading in things like the Linux kernel
makefiles while still working on older systems, such as centos:7, where
the elfutils version is 0.176.
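
The check can be sketched as below, assuming the version is available
as a "major.minor" string (elfutils exposes one via dwfl_version(), but
whether pahole obtains it that way is an assumption here).
loader_threads() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return the number of loader threads to use: the requested count if
 * libdw is at least 0.178, otherwise warn and fall back to one thread.
 */
int loader_threads(const char *libdw_version, int requested)
{
	int major = 0, minor = 0;

	assert(libdw_version != NULL);
	if (sscanf(libdw_version, "%d.%d", &major, &minor) != 2 ||
	    major < 0 || (major == 0 && minor < 178)) {
		fprintf(stderr,
			"libdw %s: multithreaded loading disabled, using 1 thread\n",
			libdw_version);
		return 1;
	}
	return requested;
}
```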

Mark also provided this info for people using centos:7 (and
equivalents):

''Note that on centos7 if you install centos-release-scl you can get the
various devtoolset packages that do contain newer gcc and elfutils. The
latest are devtoolset-10-gcc (gcc-10.2.1) and devtoolset-10-elfutils-devel
(elfutils-0.182).

After installing you can use them with "scl enable devtoolset-10 bash"
which sets up the environment with the new devtools as default.''

A quick attempt at using a lock around all libdw functions turned out
to be too big a hammer, making the multithreaded DWARF loader slower
than using just a single thread.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo d2d83be1e2 pahole: Allow tweaking the size of the loader hash tables
To experiment with different sizes as time goes by and the number of symbols in
the kernel grows.

The current default, 15, is suboptimal for the fedora rawhide kernel; we can do
better using 12.
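
For illustration, the sketch below shows how a 'bits' setting typically
maps to a table: the bucket array has 1 << bits entries and a key is
reduced to that many bits. The multiplicative hash and the names
struct table_sketch, table_init() and table_bucket() are assumptions
for this sketch, not pahole's actual hashing code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hash table sized by a number of bits. */
struct table_sketch {
	unsigned int bits;
	size_t nr_buckets;
	void **buckets;
};

/* Allocate 1 << bits empty buckets; returns 0 on success, -1 on failure. */
int table_init(struct table_sketch *t, unsigned int bits)
{
	assert(bits > 0 && bits < 32);
	t->bits = bits;
	t->nr_buckets = (size_t)1 << bits;
	t->buckets = calloc(t->nr_buckets, sizeof(*t->buckets));
	return t->buckets ? 0 : -1;
}

/* Multiplicative hash: keep the top 'bits' bits of the 64-bit product. */
size_t table_bucket(const struct table_sketch *t, uint64_t key)
{
	return (size_t)((key * 0x61C8864680B583EBull) >> (64 - t->bits));
}
```

Going from 15 to 12 bits shrinks the bucket array from 32768 to 4096
entries, trading longer chains for a smaller memory footprint.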

Default: 15:

  $ sudo ~acme/bin/perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,107.73 msec task-clock                #    2.749 CPUs utilized            ( +-  0.05% )
               1,723      context-switches          #  212.562 /sec                     ( +-  1.86% )
                   5      cpu-migrations            #    0.641 /sec                     ( +- 46.07% )
              68,802      page-faults               #    8.486 K/sec                    ( +-  0.05% )
      29,221,590,880      cycles                    #    3.604 GHz                      ( +-  0.04% )
      63,438,138,612      instructions              #    2.17  insn per cycle           ( +-  0.00% )
      15,125,172,105      branches                  #    1.866 G/sec                    ( +-  0.00% )
         119,983,284      branch-misses             #    0.79% of all branches          ( +-  0.06% )
      13,964,248,638      L1-dcache-loads           #    1.722 G/sec                    ( +-  0.00% )
         375,110,346      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses( +-  0.01% )
          91,712,402      LLC-loads                 #   11.312 M/sec                    ( +-  0.14% )
          28,025,289      LLC-load-misses           #   30.56% of all LL-cache accesses ( +-  0.23% )

             2.94980 +- 0.00193 seconds time elapsed  ( +-  0.07% )

  $

New default, to be set in an upcoming patch, 12:

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,687.31 msec task-clock                #    2.704 CPUs utilized            ( +-  0.02% )
               1,677      context-switches          #  218.126 /sec                     ( +-  0.70% )
                   4      cpu-migrations            #    0.468 /sec                     ( +- 18.84% )
              67,827      page-faults               #    8.823 K/sec                    ( +-  0.03% )
      27,711,744,058      cycles                    #    3.605 GHz                      ( +-  0.02% )
      63,032,539,630      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,062,001,666      branches                  #    1.959 G/sec                    ( +-  0.00% )
         127,728,818      branch-misses             #    0.85% of all branches          ( +-  0.07% )
      13,972,184,314      L1-dcache-loads           #    1.818 G/sec                    ( +-  0.00% )
         364,962,883      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses( +-  0.02% )
          83,969,109      LLC-loads                 #   10.923 M/sec                    ( +-  0.13% )
          19,141,055      LLC-load-misses           #   22.80% of all LL-cache accesses ( +-  0.25% )

            2.842440 +- 0.000952 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,704.29 msec task-clock                #    2.702 CPUs utilized            ( +-  0.05% )
               1,676      context-switches          #  217.515 /sec                     ( +-  1.04% )
                   2      cpu-migrations            #    0.286 /sec                     ( +- 17.01% )
              67,813      page-faults               #    8.802 K/sec                    ( +-  0.05% )
      27,786,710,102      cycles                    #    3.607 GHz                      ( +-  0.05% )
      63,027,795,038      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,066,316,987      branches                  #    1.956 G/sec                    ( +-  0.00% )
         130,431,772      branch-misses             #    0.87% of all branches          ( +-  0.20% )
      13,981,516,517      L1-dcache-loads           #    1.815 G/sec                    ( +-  0.00% )
         369,525,466      L1-dcache-load-misses     #    2.64% of all L1-dcache accesses( +-  0.03% )
          83,328,524      LLC-loads                 #   10.816 M/sec                    ( +-  0.27% )
          18,704,020      LLC-load-misses           #   22.45% of all LL-cache accesses ( +-  0.18% )

             2.85109 +- 0.00281 seconds time elapsed  ( +-  0.10% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,190.55 msec task-clock                #    2.774 CPUs utilized            ( +-  0.03% )
               1,607      context-switches          #  196.226 /sec                     ( +-  0.67% )
                   3      cpu-migrations            #    0.317 /sec                     ( +- 15.38% )
              67,869      page-faults               #    8.286 K/sec                    ( +-  0.05% )
      29,511,213,192      cycles                    #    3.603 GHz                      ( +-  0.02% )
      63,347,196,598      instructions              #    2.15  insn per cycle           ( +-  0.00% )
      15,198,023,498      branches                  #    1.856 G/sec                    ( +-  0.00% )
         131,113,100      branch-misses             #    0.86% of all branches          ( +-  0.14% )
      14,118,162,884      L1-dcache-loads           #    1.724 G/sec                    ( +-  0.00% )
         422,048,384      L1-dcache-load-misses     #    2.99% of all L1-dcache accesses( +-  0.01% )
         105,878,910      LLC-loads                 #   12.927 M/sec                    ( +-  0.05% )
          21,022,664      LLC-load-misses           #   19.86% of all LL-cache accesses ( +-  0.20% )

            2.952678 +- 0.000858 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,728.71 msec task-clock                #    2.707 CPUs utilized            ( +-  0.07% )
               1,661      context-switches          #  214.887 /sec                     ( +-  0.70% )
                   2      cpu-migrations            #    0.259 /sec                     ( +- 22.36% )
              67,893      page-faults               #    8.785 K/sec                    ( +-  0.04% )
      27,874,322,843      cycles                    #    3.607 GHz                      ( +-  0.07% )
      63,079,425,815      instructions              #    2.26  insn per cycle           ( +-  0.00% )
      15,067,279,408      branches                  #    1.950 G/sec                    ( +-  0.00% )
         125,706,874      branch-misses             #    0.83% of all branches          ( +-  1.00% )
      13,967,177,801      L1-dcache-loads           #    1.807 G/sec                    ( +-  0.00% )
         363,566,754      L1-dcache-load-misses     #    2.60% of all L1-dcache accesses( +-  0.02% )
          86,583,482      LLC-loads                 #   11.203 M/sec                    ( +-  0.13% )
          20,629,871      LLC-load-misses           #   23.83% of all LL-cache accesses ( +-  0.21% )

             2.85551 +- 0.00124 seconds time elapsed  ( +-  0.04% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo a2f1e69848 core: Use obstacks: take 2
Allow asking for obstacks to be used. In use cases like the BTF encoder
everything is allocated sequentially and then freed at cu__delete(), so
obstacks are applicable and provide a good speedup:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

           10,445.75 msec task-clock:u              #    2.864 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,926      page-faults:u             #   72.941 K/sec                    ( +-  0.00% )
      31,946,591,661      cycles:u                  #    3.058 GHz                      ( +-  0.05% )
      69,103,520,880      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,353,763,143      branches:u                #    1.566 G/sec                    ( +-  0.00% )
         122,309,098      branch-misses:u           #    0.75% of all branches          ( +-  0.12% )

             3.64689 +- 0.00437 seconds time elapsed  ( +-  0.12% )

  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 52 times to write data ]
  [ perf record: Captured and wrote 13.151 MB perf.data (43058 samples) ]
  $
  $ perf report --no-children
  Samples: 43K of event 'cycles:u', Event count (approx.): 31938442091
    Overhead  Command  Shared Object         Symbol
  +   22.98%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.69%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.82%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.16%  pahole   libc.so.6             [.] __libc_calloc
  +    5.01%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.39%  pahole   libc.so.6             [.] _int_malloc
  +    2.82%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.22%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.13%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.08%  pahole   [unknown]             [k] 0xffffffffa0e010a7
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    1.92%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    1.61%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.49%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.44%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.24%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.18%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.12%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.11%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

After:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,114.11 msec task-clock:u              #    2.747 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              68,792      page-faults:u             #    8.478 K/sec                    ( +-  0.05% )
      28,705,283,249      cycles:u                  #    3.538 GHz                      ( +-  0.09% )
      63,013,653,035      instructions:u            #    2.20  insn per cycle           ( +-  0.00% )
      15,039,319,384      branches:u                #    1.853 G/sec                    ( +-  0.00% )
         118,272,350      branch-misses:u           #    0.79% of all branches          ( +-  0.41% )

             2.95368 +- 0.00221 seconds time elapsed  ( +-  0.07% )

  $
  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 40 times to write data ]
  [ perf record: Captured and wrote 10.426 MB perf.data (33733 samples) ]
  $
  $ perf report --no-children
  Samples: 33K of event 'cycles:u', Event count (approx.): 28860426071
    Overhead  Command  Shared Object         Symbol
  +   26.10%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.13%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.83%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.52%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.04%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.45%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.31%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    2.30%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.19%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    2.08%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    2.07%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.96%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.67%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.63%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.52%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.51%  pahole   libdwarves.so.1.0.0   [.] attr_type
  +    1.36%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.32%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.25%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.24%  pahole   libdwarves.so.1.0.0   [.] namespace__recode_dwarf_types
  +    1.17%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.16%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__init
  +    1.03%  pahole   libdwarves.so.1.0.0   [.] tag__init
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 9d0a7ee0c3 pahole: Ignore DW_TAG_label when encoding BTF
As they will not be used, don't waste cycles/memory parsing them:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,431.47 msec task-clock:u              #    2.865 CPUs utilized            ( +-  0.04% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,982      page-faults:u             #   73.046 K/sec                    ( +-  0.00% )
      31,885,756,148      cycles:u                  #    3.057 GHz                      ( +-  0.04% )
      69,103,456,079      instructions:u            #    2.17  insn per cycle           ( +-  0.00% )
      16,353,867,606      branches:u                #    1.568 G/sec                    ( +-  0.00% )
         122,023,818      branch-misses:u           #    0.75% of all branches          ( +-  0.09% )

             3.64095 +- 0.00194 seconds time elapsed  ( +-  0.05% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 51ba831929 pahole: Ignore DW_TAG_inline_expansion when encoding BTF
XXX: for now leave this commented out, see comments in the source code.

As they will not be used, don't waste cycles/memory parsing them:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,973.13 msec task-clock:u              #    2.906 CPUs utilized            ( +-  0.13% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,927      page-faults:u             #   72.352 K/sec                    ( +-  0.00% )
      33,585,562,298      cycles:u                  #    3.061 GHz                      ( +-  0.17% )
      72,687,766,428      instructions:u            #    2.16  insn per cycle           ( +-  0.15% )
      17,198,056,478      branches:u                #    1.567 G/sec                    ( +-  0.16% )
         129,011,360      branch-misses:u           #    0.75% of all branches          ( +-  0.53% )

              3.7760 +- 0.0158 seconds time elapsed  ( +-  0.42% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:25 -03:00
Arnaldo Carvalho de Melo 20757745f0 pahole: Allow encoding BTF with parallel DWARF loading
By adding a lock to serialize access to btf_encoder__encode_cu().

This works and allows a speedup in BTF encoding, but it's too brute
force. The right thing to do is to have per-thread BTF encoders and then
merge everything in a last pass at the end.

But pick the low-hanging fruit now.

On a machine with 4 cores, no HT:

  $ grep "model name" -m1 /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Non-parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf vmlinux' (5 runs):

            8,580.19 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             795,451      page-faults:u             #   92.708 K/sec                    ( +-  0.00% )
      29,151,924,821      cycles:u                  #    3.398 GHz                      ( +-  0.11% )
      70,947,245,709      instructions:u            #    2.43  insn per cycle           ( +-  0.00% )
      16,791,160,182      branches:u                #    1.957 G/sec                    ( +-  0.00% )
         120,793,994      branch-misses:u           #    0.72% of all branches          ( +-  1.04% )

             8.58192 +- 0.00686 seconds time elapsed  ( +-  0.08% )
  $

Parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux' (5 runs):

           10,962.45 msec task-clock:u              #    2.914 CPUs utilized            ( +-  0.15% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,915      page-faults:u             #   72.421 K/sec                    ( +-  0.00% )
      33,552,130,646      cycles:u                  #    3.061 GHz                      ( +-  0.16% )
      72,778,320,572      instructions:u            #    2.17  insn per cycle           ( +-  0.12% )
      17,220,541,136      branches:u                #    1.571 G/sec                    ( +-  0.13% )
         129,353,767      branch-misses:u           #    0.75% of all branches          ( +-  0.48% )

              3.7614 +- 0.0141 seconds time elapsed  ( +-  0.38% )

  $

That "CPUs utilized" figure should go all the way to 4 once we parallelize
the BTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:02 -03:00
Arnaldo Carvalho de Melo d133569bd0 pahole: No need to read DW_AT_alignment when encoding BTF
No need to read the DW_AT_alignment, not used in BTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:38:58 -03:00
Arnaldo Carvalho de Melo 3e1c7a2077 pahole: Introduce --sort
To ask for sorting output, initially by name.

This is needed in 'btfdiff' to diff the output of 'pahole -F dwarf
--jobs N', where N threads will go on consuming DWARF compile units and
pretty printing them, producing a non-deterministic output.

So we need to sort the output for both BTF and DWARF, and then diff
them.

This is still not enough for some cases where different types have the
same name, things like "usb_priv" that exists in multiple DWARF compile
units, the first processed is "winning", i.e. being the only one
considered.

I have to look at how BTF handles this to adopt a similar algorithm and
keep btfdiff usable as a regression test for the BTF and DWARF loaders
and the BTF encoder.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 967290bc71 pahole: Store the class id in 'struct structure' as well
Needed to defer printing classes until after they have all been sorted
by name with the upcoming 'pahole --sort' option. That sorting makes it
possible to compare 'pahole -F btf' with 'pahole -F dwarf -j', as the
multithreaded DWARF loader will not produce classes in a deterministic
order. This is needed for 'btfdiff'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 5365c45177 pahole: Keep class + cu in tree of structures
We'll use it for ordering by name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 75d4748861 pahole: Disable parallel BTF encoding for now
Introduce parallel DWARF loading first, test it, then move on to use it
together with BTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 1c60f71daa pahole: Add locking for the structures list and rbtree
Prep work for multithreaded DWARF loading, when there will be concurrent
access to this data structure.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo caa219dffc core: base_type__name() doesn't need a 'cu' arg
Another simplification made possible by using a plain char string
instead of string_t, that was only needed in the core as prep work
for CTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 0f54ca9c82 core: class__clone() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2b2014187b core: class__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 33e0d5f874 pahole: Introduce --prettify option
The use of isatty(0) to switch into pretty printing is problematic as
reported by Bernd Buschinski, that ran into problems with his scripts:

========================================================================
  I am using pahole 1.21 and I recently noticed that I no longer have
  any pahole output in several scripts.

  Using (on the command line):

    $ pahole -V -E -C my_struct /path/to/my/debug.o

  works fine and gives the expected output.

  But:

    $ parallel -j 1 pahole -V -E -C my_struct ::: /path/to/my/debug.o

  gives nothing, no stderr, no stdout and ret code 0.

  After testing some versions, it works fine in 1.17 and no longer works in 1.18.
========================================================================

Since the pretty printer broke existing scripts, and it's a relatively
new feature, let's switch to using an explicit command line option to
activate the pretty printer, i.e. where we used:

  $ pahole --header elf64_hdr < /bin/bash

We now use one of:

  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify=/bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify - < /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify=- < /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$

Reported-by: Bernd Buschinski <b.buschinski@googlemail.com>
Report-Link: https://lore.kernel.org/dwarves/CACN-hLVoz2tWrtgDLabOv6S1-H_8RD2fh8SV6EnADF1ikMxrmw@mail.gmail.com/
Tested-by: Bernd Buschinski <b.buschinski@googlemail.com>
Test-Link: https://lore.kernel.org/dwarves/CACN-hLXgHWdBkyMz+w58qX8DaV+WJ1mj1qheGBHbPv4fqozi5w@mail.gmail.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo bc36e94f32 pahole: Try harder to resolve the --header type when pretty printing
Go on processing CUs till we have everything sorted out, which includes
the --header type.

On a file with DWARF info where the header type was the last to be found,
it wasn't being resolved, so the tool failed to resolve header variable
references and emitted this misleading error message:

  ⬢[acme@toolbox pahole]$ pahole ~/bin/perf --header=perf_file_header --seek_bytes '$header.data.offset' --size_bytes='$header.data.size' -C 'perf_event_header(sizeof,type,type_enum=perf_event_type)' < perf.data
  pahole: --seek_bytes ($header.data.offset) makes reference to --header but it wasn't specified
  ⬢[acme@toolbox pahole]$

And that 'struct perf_file_header' _is_ in one of the CUs in ~/bin/perf:

  ⬢[acme@toolbox pahole]$ pahole ~/bin/perf -C perf_file_header
  struct perf_file_header {
  	u64                        magic;                /*     0     8 */
  	u64                        size;                 /*     8     8 */
  	u64                        attr_size;            /*    16     8 */
  	struct perf_file_section   attrs;                /*    24    16 */
  	struct perf_file_section   data;                 /*    40    16 */
  	struct perf_file_section   event_types;          /*    56    16 */
  	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
  	long unsigned int          adds_features[4];     /*    72    32 */

  	/* size: 104, cachelines: 2, members: 7 */
  	/* last cacheline: 40 bytes */
  };
  ⬢[acme@toolbox pahole]$

With this fix all the records are printed.

This probably wasn't noticed before because most tests were made with a
~/bin/perf file with BTF information, i.e. just one "CU", so the logic
of deferring the pretty printing till everything gets resolved wasn't
being exercised properly.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo fcfa2141c3 pahole: Make prototype__stdio_fprintf_value() receive a FILE to read raw data from
So far it's just from stdin, but it shouldn't be.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2d35630fa5 pahole: Make pipe_seek() honour the 'fp' arg instead of hardcoding stdin
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 9aa01472d9 pahole: Rename 'fp' to 'output' in prototype__stdio_fprintf_value()
As we'll also have another FILE pointer for input.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 472b940180 pahole: Use the supplied 'fp' argument in type__instance_read_once()
It was unconditionally reading from 'stdin', when a 'fp' is supplied.

Fix this as now we'll stop unconditionally reading from stdin for the
pretty printer.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo ced4c34c37 core: Remove strings.c, unused
We were using this just for the ctf_encoder, that never really got
complete, so ditch it.

For BTF the strings table is done by libbpf, so we don't need it there
either.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:11 -03:00
Arnaldo Carvalho de Melo f8d571934b pahole: Add missing bpf/btf.h include
We get it by accident, via pahole_strings.h, and that is going away, fix
it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo f4a77d0390 pahole: Use conf_load.kabi_prefix
Should work just as before, i.e. we hook at where we read strings from
DWARF.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo e974d1b240 pahole: class_member_filter__new() doesn't need a 'struct cu *' argument
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 0275e8d249 pahole: class_member_filter__parse() doesn't need a 'struct cu *' argument
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 90183e8e4d pahole: tag__real_sizeof() doesn't need a 'struct cu *' argument
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 5cb9192738 pahole: Rename tag__fprintf_hexdump_value() to instance__fprintf_hexdump_value()
As it acts only on an instance, it needs neither a 'struct tag' nor
a 'struct cu'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 75c769a900 pahole: enumerations__lookup_entry_from_value() doesn't need to return a CU anymore
As it will not be used in the caller.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 1edca26552 pahole: enumeration__lookup_entry_from_value() doesn't need a 'cu' argument
With the conversion of ->name members to plain char strings, no need
to use 'cu' to get the old string_t index and find the per-cu string
table.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo e18c60d793 pahole: enumeration__lookup_value() doesn't need a 'cu' argument
With the conversion of ->name members to plain char strings, no need
to use 'cu' to get the old string_t index and find the per-cu string
table.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 4b877c8e67 pahole: enumeration__lookup_enumerator() doesn't need a 'cu' argument
With the conversion of ->name members to plain char strings, no need
to use 'cu' to get the old string_t index and find the per-cu string
table.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 96243fdd79 core: enumerator__name() doesn't need a 'cu' argument, ditch it
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo c127d25daf core: class__name() doesn't need a cu arg
Now that namespace->name is a real char string.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00