Commit Graph

232 Commits

Author SHA1 Message Date
Kui-Feng Lee 96d2c5c323 dwarf_loader: Prepare and pass per-thread data to worker threads
Add interfaces to allow users of dwarf_loader to prepare and pass
per-thread data to steal-functions running on worker threads.
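Below is a minimal sketch of the idea behind this commit and the one below it (724c8fddd7). It assumes a pthread worker pool. `struct loader_conf`, its `thread_prepare`/`steal`/`thread_exit` members and `run_workers()` are hypothetical names, not the real conf_load interface. Each worker asks the user once for its own data, passes that same pointer to every steal call, and hands it back when it exits. As a result, the steal path needs no locking of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical callback table modeled on the commit, not struct conf_load. */
struct loader_conf {
	/* Called once per worker before any work; returns that worker's data. */
	int (*thread_prepare)(struct loader_conf *conf, void **thr_data);
	/* Called for each unit of work with the calling worker's data. */
	int (*steal)(struct loader_conf *conf, int unit, void *thr_data);
	/* Called when the worker is done, with the same data. */
	int (*thread_exit)(struct loader_conf *conf, void *thr_data);
	int nr_units;		/* total work items */
	int next_unit;		/* next item to hand out */
	pthread_mutex_t lock;	/* protects next_unit only */
};

static void *worker(void *arg)
{
	struct loader_conf *conf = arg;
	void *thr_data = NULL;

	if (conf->thread_prepare && conf->thread_prepare(conf, &thr_data) != 0)
		return NULL;

	for (;;) {
		pthread_mutex_lock(&conf->lock);
		int unit = conf->next_unit < conf->nr_units ? conf->next_unit++ : -1;
		pthread_mutex_unlock(&conf->lock);
		if (unit < 0)
			break;
		conf->steal(conf, unit, thr_data);
	}

	if (conf->thread_exit)
		conf->thread_exit(conf, thr_data);
	return NULL;
}

/* Example user: each worker counts into its own counter, and
 * thread_exit() folds it into a shared total under a lock. */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static int total_units;

static int count_prepare(struct loader_conf *conf, void **thr_data)
{
	(void)conf;
	*thr_data = calloc(1, sizeof(int));
	return *thr_data ? 0 : -1;
}

static int count_steal(struct loader_conf *conf, int unit, void *thr_data)
{
	(void)conf;
	(void)unit;
	++*(int *)thr_data;	/* private to this worker: no lock needed */
	return 0;
}

static int count_exit(struct loader_conf *conf, void *thr_data)
{
	(void)conf;
	pthread_mutex_lock(&total_lock);
	total_units += *(int *)thr_data;
	pthread_mutex_unlock(&total_lock);
	free(thr_data);
	return 0;
}

static int run_workers(int nr_threads, int nr_units)
{
	struct loader_conf conf = {
		.thread_prepare = count_prepare,
		.steal = count_steal,
		.thread_exit = count_exit,
		.nr_units = nr_units,
	};
	pthread_t tids[16];

	assert(nr_threads > 0 && nr_threads <= 16);
	pthread_mutex_init(&conf.lock, NULL);
	total_units = 0;
	for (int i = 0; i < nr_threads; i++)
		pthread_create(&tids[i], NULL, worker, &conf);
	for (int i = 0; i < nr_threads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&conf.lock);
	return total_units;
}
```

For example, `run_workers(4, 1000)` returns 1000. Each unit is counted exactly once, in the private counter of whichever worker stole it.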

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Link: https://lore.kernel.org/r/20220126192039.2840752-3-kuifeng@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-28 16:30:10 -03:00
Kui-Feng Lee 724c8fddd7 dwarf_loader: Receive per-thread data on worker threads
Add arguments to the steal and thread_exit callbacks of conf_load to
receive per-thread data.

Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Link: https://lore.kernel.org/r/20220126192039.2840752-2-kuifeng@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-28 16:19:29 -03:00
Yonghong Song b488c8d328 dwarf_loader: Support btf_type_tag attribute
LLVM patches ([1] for clang, [2] and [3] for BPF backend)
added support for btf_type_tag attributes. The following is
an example:

  [$ ~] cat t.c
  #define __tag1 __attribute__((btf_type_tag("tag1")))
  #define __tag2 __attribute__((btf_type_tag("tag2")))
  int __tag1 * __tag1 __tag2 *g __attribute__((section(".data..percpu")));
  [$ ~] clang -O2 -g -c t.c
  [$ ~] llvm-dwarfdump --debug-info t.o
  t.o:    file format elf64-x86-64
  ...
  0x0000001e:   DW_TAG_variable
                  DW_AT_name      ("g")
                  DW_AT_type      (0x00000033 "int **")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/home/yhs/t.c")
                  DW_AT_decl_line (3)
                  DW_AT_location  (DW_OP_addr 0x0)
  0x00000033:   DW_TAG_pointer_type
                  DW_AT_type      (0x0000004b "int *")
  0x00000038:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag1")
  0x00000041:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag2")
  0x0000004a:     NULL
  0x0000004b:   DW_TAG_pointer_type
                  DW_AT_type      (0x0000005a "int")
  0x00000050:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag1")
  0x00000059:     NULL
  0x0000005a:   DW_TAG_base_type
                  DW_AT_name      ("int")
                  DW_AT_encoding  (DW_ATE_signed)
                  DW_AT_byte_size (0x04)
  0x00000061:   NULL

From the above example, you can see that DW_TAG_pointer_type may contain
one or more DW_TAG_LLVM_annotation btf_type_tag tags.  If
DW_TAG_LLVM_annotation tags are present inside DW_TAG_pointer_type, for
BTF encoding, pahole will need to follow [3] to generate a type chain
like:

  var -> ptr -> tag2 -> tag1 -> ptr -> tag1 -> int

This patch implements dwarf_loader support. If a pointer type contains
DW_TAG_LLVM_annotation tags, a new type, btf_type_tag_ptr_type, is
created to store the pointer tag itself and all of its
DW_TAG_LLVM_annotation tags.  During the recoding stage, the type chain
is then formed as in the example above.
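To make the ordering concrete, the sketch below rebuilds that chain from the DWARF shape in the example. `struct ptr_level` and `build_chain()` are hypothetical. The rule it applies, that a pointer's tags follow its `ptr` node in reverse DWARF order, is read off this single example rather than taken from pahole's BTF encoder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one DW_TAG_pointer_type and its btf_type_tag
 * annotations, in the order llvm-dwarfdump lists them. */
struct ptr_level {
	const char *tags[4];
	int nr_tags;
};

/* Build "var -> ptr -> ... -> <base>" for a variable whose type is the
 * given chain of pointers (outermost first). Each pointer's tags are
 * emitted after its "ptr" node, last DWARF annotation first. */
static void build_chain(const struct ptr_level *levels, int nr_levels,
			const char *base, char *buf, size_t len)
{
	size_t off = 0;

	off += snprintf(buf, len, "var");
	for (int i = 0; i < nr_levels && off < len; i++) {
		off += snprintf(buf + off, len - off, " -> ptr");
		for (int t = levels[i].nr_tags - 1; t >= 0 && off < len; t--)
			off += snprintf(buf + off, len - off, " -> %s",
					levels[i].tags[t]);
	}
	if (off < len)
		snprintf(buf + off, len - off, " -> %s", base);
}
```

With the two pointer levels from the dump (tag1, tag2 on the outer pointer; tag1 on the inner one) and base type `int`, this yields `var -> ptr -> tag2 -> tag1 -> ptr -> tag1 -> int`.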

An option "--skip_encoding_btf_type_tag" is added to disable
this new functionality.

  [1] https://reviews.llvm.org/D111199
  [2] https://reviews.llvm.org/D113222
  [3] https://reviews.llvm.org/D113496

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-23 20:37:51 -03:00
Yonghong Song a0cc68687f dutil: Move DW_TAG_LLVM_annotation definition to dutil.h
Move the DW_TAG_LLVM_annotation definition from dwarf_loader.c to dutil.h,
as it will be used later by btf_encoder.c.  This patch introduces no
functional change.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-23 20:37:43 -03:00
Arnaldo Carvalho de Melo 0135ccd632 dwarf_loader: Warn about DW_TAG_skeleton_unit and give a workaround
  $ pahole ~/c/split/foo.o
  WARNING: DW_TAG_skeleton_unit used, please look for a .dwo file and use it instead.
           A future version of pahole will support do this automagically.
  $

Reported-by: https://twitter.com/trass3r
Link: https://github.com/acmel/dwarves/issues/23
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-12 15:20:09 -03:00
Arnaldo Carvalho de Melo 7af9ed4aed dwarf_loader: Print the hexadecimal value for unexpected tags in die__process()
So that we can get it from user reports, i.e. instead of:

  die__process: DW_TAG_compile_unit, DW_TAG_type_unit or DW_TAG_partial_unit expected got INVALID

We now get:

  die__process: DW_TAG_compile_unit, DW_TAG_type_unit or DW_TAG_partial_unit expected got INVALID (0x4a)

Which we can then look up in dwarf.h, finding this new entry:

     DW_TAG_skeleton_unit = 0x4a,

Now let's go support it...
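A sketch of the improved diagnostic follows. It assumes a small name table that, like the one behind the report, predates DW_TAG_skeleton_unit, so that tag still prints as "INVALID". `dwarf_tag_name()` and `check_unit_tag()` are hypothetical helpers. The tag values come from the DWARF 5 specification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Tag values from the DWARF 5 specification. */
#define DW_TAG_compile_unit	0x11
#define DW_TAG_variable		0x34
#define DW_TAG_partial_unit	0x3c
#define DW_TAG_type_unit	0x41

/* Hypothetical name table that doesn't know DW_TAG_skeleton_unit (0x4a). */
static const char *dwarf_tag_name(unsigned int tag)
{
	switch (tag) {
	case DW_TAG_compile_unit: return "DW_TAG_compile_unit";
	case DW_TAG_variable:     return "DW_TAG_variable";
	case DW_TAG_partial_unit: return "DW_TAG_partial_unit";
	case DW_TAG_type_unit:    return "DW_TAG_type_unit";
	default:                  return "INVALID";
	}
}

/* Returns 0 for a unit tag we can process. Otherwise formats the error
 * into buf and returns -1. The raw value is appended so that even an
 * "INVALID" name can be looked up in dwarf.h. */
static int check_unit_tag(unsigned int tag, char *buf, size_t len)
{
	if (tag == DW_TAG_compile_unit || tag == DW_TAG_type_unit ||
	    tag == DW_TAG_partial_unit)
		return 0;
	snprintf(buf, len,
		 "die__process: DW_TAG_compile_unit, DW_TAG_type_unit or "
		 "DW_TAG_partial_unit expected got %s (0x%x)",
		 dwarf_tag_name(tag), tag);
	return -1;
}
```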

Reported-by: https://twitter.com/trass3r
Link: https://github.com/acmel/dwarves/issues/23
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-12 15:02:28 -03:00
Yonghong Song 468b4196f6 dwarf_loader: support typedef DW_TAG_LLVM_annotation
LLVM commit ([1]) added support for the btf_decl_tag attribute
on typedef declarations. As a result, a DW_TAG_LLVM_annotation
tag may appear inside a DWARF typedef declaration tag.

Kernel support for typedef BTF_KIND_DECL_TAG is
introduced in [2]. No additional libbpf change is
needed, as the existing libbpf BTF_KIND_DECL_TAG
support is generic enough to cover the new typedef
use cases.

This patch adds parsing of DW_TAG_LLVM_annotation
for DWARF typedef declarations.

  $ cat t.c
  $ clang -O2 -g -c t.c
  $ llvm-dwarfdump --debug-info t.o
    ......
    0x00000033:   DW_TAG_typedef
                    DW_AT_type      (0x00000051 "structure ")
                    DW_AT_name      ("__t")
                    DW_AT_decl_file ("/home/yhs/t.c")
                    DW_AT_decl_line (3)

    0x0000003e:     DW_TAG_LLVM_annotation
                      DW_AT_name    ("btf_decl_tag")
                      DW_AT_const_value     ("tag1")

    0x00000047:     DW_TAG_LLVM_annotation
                      DW_AT_name    ("btf_decl_tag")
                      DW_AT_const_value     ("tag2")

    0x00000050:     NULL

Previously, pahole issued a warning if a typedef tag
contained any child tag. I removed this warning since
it no longer holds. Note that the DWARF standard doesn't
prevent a typedef declaration tag from having nested tags.
If in the future we need to process any tag inside a
typedef tag, we can just add code to process it.
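Below is a simplified sketch of the typedef handling described above. It uses a hypothetical in-memory `struct die` instead of libdw so it runs standalone, and `typedef__decl_tags()` is an illustrative name. DW_TAG_LLVM_annotation is LLVM's vendor tag, 0x6000. Annotation children named btf_decl_tag are collected, and any other child is skipped without a warning.

```c
#include <assert.h>
#include <string.h>

#define DW_TAG_typedef		0x16
#define DW_TAG_LLVM_annotation	0x6000

/* Hypothetical, simplified DIE: just enough to model the dump above. */
struct die {
	unsigned int tag;
	const char *name;		/* DW_AT_name */
	const char *const_value;	/* DW_AT_const_value (annotations) */
	const struct die *children;
	int nr_children;
};

/* Store up to max btf_decl_tag values attached to a typedef in out[]
 * and return how many were found. Other child tags are ignored. */
static int typedef__decl_tags(const struct die *td, const char **out, int max)
{
	int n = 0;

	if (td->tag != DW_TAG_typedef)
		return 0;
	for (int i = 0; i < td->nr_children; i++) {
		const struct die *child = &td->children[i];

		if (child->tag != DW_TAG_LLVM_annotation || !child->name ||
		    strcmp(child->name, "btf_decl_tag") != 0)
			continue;	/* not ours: skip, don't warn */
		if (n < max)
			out[n++] = child->const_value;
	}
	return n;
}
```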

  [1] https://reviews.llvm.org/D110127
  [2] https://lore.kernel.org/bpf/20211021195628.4018847-1-yhs@fb.com

Signed-off-by: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-11 09:31:27 -03:00
Yonghong Song c52f6421f2 btf: Rename btf_tag to btf_decl_tag
Kernel commit ([1]) renamed btf_tag to btf_decl_tag for uapi btf.h and
the libbpf APIs, because a new clang attribute, btf_type_tag, is being
introduced ([2]).  Renaming btf_tag to btf_decl_tag makes it easier to
distinguish from btf_type_tag.

I also pulled in the latest libbpf repo since it contains the renamed
libbpf API function btf__add_decl_tag().

  [1] https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
  [2] https://reviews.llvm.org/D111199

Signed-off-by: Yonghong Song <yhs@fb.com>
[ Minor fixups to cope with --skip_missing ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-27 09:35:23 -03:00
Ilya Leoshkevich 3cde0135ca dwarf_loader: Fix heap overflow when accessing variable specification
Variables can be allocated with or without a specification; however,
tag__recode_dwarf_type() always tries to access it, leading to heap read
overflows and subsequent logic bugs.

Fix this by introducing a bit that tracks whether or not a specification
is present.
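Here is a sketch of this class of bug and of the fix. It rests on an assumption made only for illustration: the specification lives in a larger allocation that only some variables get. `struct variable`, `struct variable_with_spec`, `variable__new()` and `variable__specification()` are hypothetical, not the dwarf_loader types. Without the `has_specification` bit, casting a plain `struct variable` to the larger type and reading `specification` would read past the end of the heap object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical base object, always allocated. */
struct variable {
	uint32_t type;
	bool has_specification;	/* the bit the fix introduces */
};

/* Hypothetical larger object, allocated only when a spec exists. */
struct variable_with_spec {
	struct variable var;	/* must stay the first member */
	uint64_t specification;	/* e.g. DIE offset of the declaration */
};

static struct variable *variable__new(uint32_t type, bool has_spec, uint64_t spec)
{
	if (has_spec) {
		struct variable_with_spec *vs = malloc(sizeof(*vs));

		if (!vs)
			return NULL;
		vs->var.type = type;
		vs->var.has_specification = true;
		vs->specification = spec;
		return &vs->var;
	}

	struct variable *v = malloc(sizeof(*v));

	if (!v)
		return NULL;
	v->type = type;
	v->has_specification = false;
	return v;
}

/* Only touch the trailing field when the bit says it was allocated. */
static bool variable__specification(const struct variable *v, uint64_t *spec)
{
	if (!v->has_specification)
		return false;
	*spec = ((const struct variable_with_spec *)v)->specification;
	return true;
}
```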

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-26 11:29:55 -03:00
Arnaldo Carvalho de Melo a9c99e9881 dwarves: Introduce conf_load->thread_exit() callback
Will be called when a thread exits, initially only in the DWARF loader,
so that pahole can call the btf_encoder associated with the exiting
thread to do the dedup as the last step done in parallel.

Then we'll iterate the btf_encoders list and combine everything into the
first btf_encoder instance, which then gets written to disk.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-10-14 17:37:25 -03:00
Yonghong Song aa8c494e65 dwarf_loader: Parse DWARF tag DW_TAG_LLVM_annotation
Parse the DWARF tag DW_TAG_LLVM_annotation. Only record annotations with
the btf_tag name, which correspond to btf_tag attributes in C code. This
information will be used later by the btf_encoder for BTF conversion.

The LLVM implementation only supports btf_tag annotations on
struct/union, func, func parameter and variable ([1]).  So we only check
for the existence of the corresponding DW tags in these places.
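The filtering rule in this commit amounts to a small predicate. The sketch below uses the tag values from the DWARF 5 specification plus LLVM's vendor value 0x6000 for DW_TAG_LLVM_annotation. `record_btf_tag()` is a hypothetical helper. The annotation name is "btf_tag", as at this point in history; a later commit in this log renames it to btf_decl_tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* DWARF 5 tag values, plus LLVM's vendor tag for annotations. */
#define DW_TAG_formal_parameter	0x05
#define DW_TAG_structure_type	0x13
#define DW_TAG_typedef		0x16
#define DW_TAG_union_type	0x17
#define DW_TAG_subprogram	0x2e
#define DW_TAG_variable		0x34
#define DW_TAG_LLVM_annotation	0x6000

/* Record a child only if it is a "btf_tag" annotation whose parent is one
 * of the kinds LLVM emits it for: struct/union, func, func parameter,
 * variable. */
static bool record_btf_tag(unsigned int parent_tag, unsigned int child_tag,
			   const char *name)
{
	if (child_tag != DW_TAG_LLVM_annotation || !name ||
	    strcmp(name, "btf_tag") != 0)
		return false;

	switch (parent_tag) {
	case DW_TAG_structure_type:
	case DW_TAG_union_type:
	case DW_TAG_subprogram:
	case DW_TAG_formal_parameter:
	case DW_TAG_variable:
		return true;
	default:
		return false;
	}
}
```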

A flag, "--skip_encoding_btf_tag", is introduced in case this feature
needs to be disabled for whatever reason.

 [1] https://reviews.llvm.org/D106614

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Link: https://lore.kernel.org/r/20210922021326.2287095-1-yhs@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 17:06:56 -03:00
Arnaldo Carvalho de Melo 38df86db2b dwarf_loader: cus__load_debug_types() doesn't use its 'cus' arg, remove it
But since it is still related to cus processing, remove that arg and
rename it to __cus__load_debug_types().

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 16d646c07e dwarf_loader: Rename finalize_cu_immediately() to cus__finalize() to follow convention
Follow convention by renaming it to cus__finalize() as it operates on a
'cus' instance.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 90599e6177 dwarf_loader: Remove unused 'dcu' argument from finalize_cu_immediately()
Not used at all, ditch it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 6fd4377a0d dwarf_loader: Remove unused 'dcus' argument from cu__finalize()
Not used at all, ditch it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 2bb04ecf79 dwarf_loader: Remove unused 'cus' argument from finalize_cu()
And follow convention and rename it to cu__finalize() as it operates on
a 'cu' instance.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 9ada372a21 dwarf_loader: Fix signed/unsigned comparison in tag__recode_dwarf_bitfield()
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 6edae3e768 dwarf_loader: Make hash table size default to 12, faster than 15
12 is the sweet spot for recent kernels. The current default is 15 (the
runs without --hashbits below); changing it to 12 reduces the elapsed
time, so make 12 the new default.

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

  $ sudo perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,101.71 msec task-clock                #    2.752 CPUs utilized            ( +-  0.06% )
               1,682      context-switches          #  207.610 /sec                     ( +-  0.98% )
                   5      cpu-migrations            #    0.592 /sec                     ( +- 15.31% )
              68,870      page-faults               #    8.501 K/sec                    ( +-  0.02% )
      29,205,269,606      cycles                    #    3.605 GHz                      ( +-  0.05% )
      63,448,636,788      instructions              #    2.17  insn per cycle           ( +-  0.00% )
      15,127,493,299      branches                  #    1.867 G/sec                    ( +-  0.00% )
         120,362,476      branch-misses             #    0.80% of all branches          ( +-  0.11% )
      13,967,000,698      L1-dcache-loads           #    1.724 G/sec                    ( +-  0.00% )
         375,052,289      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses  ( +-  0.03% )
          91,506,061      LLC-loads                 #   11.295 M/sec                    ( +-  0.10% )
          27,905,809      LLC-load-misses           #   30.50% of all LL-cache accesses  ( +-  0.16% )

             2.94445 +- 0.00188 seconds time elapsed  ( +-  0.06% )

  $ sudo perf stat -d -r5 pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,681.15 msec task-clock                #    2.702 CPUs utilized            ( +-  0.05% )
               1,660      context-switches          #  216.114 /sec                     ( +-  1.02% )
                   3      cpu-migrations            #    0.365 /sec                     ( +- 13.36% )
              67,794      page-faults               #    8.826 K/sec                    ( +-  0.05% )
      27,692,748,327      cycles                    #    3.605 GHz                      ( +-  0.04% )
      63,041,363,409      instructions              #    2.28  insn per cycle           ( +-  0.00% )
      15,063,798,404      branches                  #    1.961 G/sec                    ( +-  0.00% )
         127,461,737      branch-misses             #    0.85% of all branches          ( +-  0.11% )
      13,974,527,710      L1-dcache-loads           #    1.819 G/sec                    ( +-  0.00% )
         364,775,664      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses  ( +-  0.01% )
          83,685,127      LLC-loads                 #   10.895 M/sec                    ( +-  0.14% )
          19,073,967      LLC-load-misses           #   22.79% of all LL-cache accesses  ( +-  0.30% )

            2.842468 +- 0.000561 seconds time elapsed  ( +-  0.02% )

  $ sudo perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64' (5 runs):

            9,512.30 msec task-clock                #    2.741 CPUs utilized            ( +-  0.54% )
               1,964      context-switches          #  206.469 /sec                     ( +-  2.60% )
                   7      cpu-migrations            #    0.736 /sec                     ( +- 37.25% )
              81,611      page-faults               #    8.579 K/sec                    ( +-  0.08% )
      34,294,568,812      cycles                    #    3.605 GHz                      ( +-  0.53% )
      72,897,384,015      instructions              #    2.13  insn per cycle           ( +-  0.15% )
      17,386,180,039      branches                  #    1.828 G/sec                    ( +-  0.15% )
         136,142,139      branch-misses             #    0.78% of all branches          ( +-  1.06% )
      16,020,787,096      L1-dcache-loads           #    1.684 G/sec                    ( +-  0.19% )
         430,392,585      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses  ( +-  0.37% )
         107,401,567      LLC-loads                 #   11.291 M/sec                    ( +-  0.30% )
          35,172,977      LLC-load-misses           #   32.75% of all LL-cache accesses  ( +-  0.48% )

              3.4710 +- 0.0243 seconds time elapsed  ( +-  0.70% )

  $ sudo perf stat -d -r5 pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64' (5 runs):

            8,929.50 msec task-clock                #    2.700 CPUs utilized            ( +-  0.04% )
               1,907      context-switches          #  213.539 /sec                     ( +-  0.68% )
                   4      cpu-migrations            #    0.426 /sec                     ( +- 30.46% )
              80,661      page-faults               #    9.033 K/sec                    ( +-  0.03% )
      32,213,009,827      cycles                    #    3.607 GHz                      ( +-  0.03% )
      72,345,614,657      instructions              #    2.25  insn per cycle           ( +-  0.00% )
      17,290,227,666      branches                  #    1.936 G/sec                    ( +-  0.00% )
         142,108,954      branch-misses             #    0.82% of all branches          ( +-  0.09% )
      15,998,190,852      L1-dcache-loads           #    1.792 G/sec                    ( +-  0.00% )
         417,872,772      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses  ( +-  0.02% )
          98,061,829      LLC-loads                 #   10.982 M/sec                    ( +-  0.24% )
          24,750,223      LLC-load-misses           #   25.24% of all LL-cache accesses  ( +-  0.17% )

             3.30670 +- 0.00185 seconds time elapsed  ( +-  0.06% )

  $
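Worked from the mean elapsed times above, the relative reduction from the default of 15 bits to 12 is:

```latex
\[
\text{vmlinux: } \frac{2.94445 - 2.842468}{2.94445} \approx 3.5\%,
\qquad
\text{vmlinux-5.14.0 (Fedora): } \frac{3.4710 - 3.30670}{3.4710} \approx 4.7\%
\]
```

The LLC-load-miss ratio drops in both pairs as well, from 30.50% to 22.79% and from 32.75% to 25.24%.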

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo ff7bd7083f core: Allow sizing the loader hash table
For now this only applies to the DWARF loader, to allow experimenting as
time passes and kernels grow bigger or gain more symbols.
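With the size given in bits, the table has 2^hashbits buckets: 4096 for 12 and 32768 for 15. The sketch below assumes a kernel-style multiplicative hash that keeps the top `bits` bits of the product. `struct type_table`, `type_table__init()` and `type_table__bucket()` are hypothetical names, not the loader's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* 64-bit golden-ratio multiplier, as in the Linux kernel's <linux/hash.h>. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Multiplicative hashing: keep the top 'bits' bits of the product.
 * bits must be at least 1, since a shift by 64 is undefined. */
static inline uint32_t hash_64(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/* Hypothetical table keyed by DIE offset, sized at run time. */
struct type_table {
	unsigned int bits;
	size_t nr_buckets;
	void **buckets;
};

static int type_table__init(struct type_table *t, unsigned int bits)
{
	if (bits == 0 || bits > 31)
		return -1;
	t->bits = bits;
	t->nr_buckets = (size_t)1 << bits;	/* 2^bits buckets */
	t->buckets = calloc(t->nr_buckets, sizeof(*t->buckets));
	return t->buckets ? 0 : -1;
}

static size_t type_table__bucket(const struct type_table *t, uint64_t die_offset)
{
	return hash_64(die_offset, t->bits);
}
```

Fewer buckets means a smaller table to zero and walk, at the cost of longer chains. The measurements in the "default to 12" commit above suggest 12 bits is the better trade-off for current kernels.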

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 8eebf70d05 dwarf_loader: Use a per-CU frontend cache for the latest lookup result
Using a debug patch, I found that for the Linux kernel (vmlinux from
Fedora Rawhide) we get this number of hits:

  nr_saved_lookups=2661460

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -d -r1 pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64':

            9,515.95 msec task-clock:u              #    2.731 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              81,634      page-faults:u             #    8.579 K/sec
      33,468,454,452      cycles:u                  #    3.517 GHz
      72,279,667,117      instructions:u            #    2.16  insn per cycle
      17,256,208,904      branches:u                #    1.813 G/sec
         132,775,067      branch-misses:u           #    0.77% of all branches
      15,840,427,579      L1-dcache-loads:u         #    1.665 G/sec
         417,209,398      L1-dcache-load-misses:u   #    2.63% of all L1-dcache accesses
         105,099,756      LLC-loads:u               #   11.045 M/sec
          35,027,985      LLC-load-misses:u         #   33.33% of all LL-cache accesses

         3.484851710 seconds time elapsed

         9.353155000 seconds user
         0.190730000 seconds sys

  $

After:

  $ perf stat -d -r1 pahole -j --btf_encode_detached \
	vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf \
	vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64':

            9,416.17 msec task-clock:u              #    2.744 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              81,461      page-faults:u             #    8.651 K/sec
      33,330,006,641      cycles:u                  #    3.540 GHz
      72,301,897,397      instructions:u            #    2.17  insn per cycle
      17,263,694,358      branches:u                #    1.833 G/sec
         133,414,373      branch-misses:u           #    0.77% of all branches
      15,860,141,450      L1-dcache-loads:u         #    1.684 G/sec
         418,816,079      L1-dcache-load-misses:u   #    2.64% of all L1-dcache accesses
         104,960,787      LLC-loads:u               #   11.147 M/sec
          34,629,758      LLC-load-misses:u         #   32.99% of all LL-cache accesses

         3.431376846 seconds time elapsed

         9.294489000 seconds user
         0.146507000 seconds sys

  $
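The frontend cache can be sketched as a one-entry memo in front of the normal lookup. A CU is processed by a single thread, so the sketch needs no locking. `struct cu_lookup`, `slow_lookup()` and `cu_lookup__find()` are hypothetical. A linear scan stands in for the real hash table, and the counters mirror the kind of debug count quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CU lookup state with a one-entry cache. */
struct cu_lookup {
	const uint64_t *keys;		/* stand-in for the real hash table */
	void *const *values;
	size_t nr;
	uint64_t last_key;
	void *last_value;		/* NULL means the cache is empty */
	unsigned long nr_saved_lookups;
	unsigned long nr_slow_lookups;
};

static void *slow_lookup(struct cu_lookup *cu, uint64_t key)
{
	cu->nr_slow_lookups++;
	for (size_t i = 0; i < cu->nr; i++)
		if (cu->keys[i] == key)
			return cu->values[i];
	return NULL;
}

/* Check the last result first; fall back to the full lookup on a miss. */
static void *cu_lookup__find(struct cu_lookup *cu, uint64_t key)
{
	if (cu->last_value && cu->last_key == key) {
		cu->nr_saved_lookups++;
		return cu->last_value;
	}

	void *value = slow_lookup(cu, key);

	if (value) {
		cu->last_key = key;
		cu->last_value = value;
	}
	return value;
}
```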

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo a2f1e69848 core: Use obstacks: take 2
Allow asking for obstacks to be used. For use cases like the BTF
encoder, where everything is allocated sequentially and then freed all at
once at cu__delete(), obstacks are applicable and provide a good speedup:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

           10,445.75 msec task-clock:u              #    2.864 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,926      page-faults:u             #   72.941 K/sec                    ( +-  0.00% )
      31,946,591,661      cycles:u                  #    3.058 GHz                      ( +-  0.05% )
      69,103,520,880      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,353,763,143      branches:u                #    1.566 G/sec                    ( +-  0.00% )
         122,309,098      branch-misses:u           #    0.75% of all branches          ( +-  0.12% )

             3.64689 +- 0.00437 seconds time elapsed  ( +-  0.12% )

  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 52 times to write data ]
  [ perf record: Captured and wrote 13.151 MB perf.data (43058 samples) ]
  $
  $ perf report --no-children
  Samples: 43K of event 'cycles:u', Event count (approx.): 31938442091
    Overhead  Command  Shared Object         Symbol
  +   22.98%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.69%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.82%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.16%  pahole   libc.so.6             [.] __libc_calloc
  +    5.01%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.39%  pahole   libc.so.6             [.] _int_malloc
  +    2.82%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.22%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.13%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.08%  pahole   [unknown]             [k] 0xffffffffa0e010a7
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    1.92%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    1.61%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.49%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.44%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.24%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.18%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.12%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.11%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

After:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,114.11 msec task-clock:u              #    2.747 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              68,792      page-faults:u             #    8.478 K/sec                    ( +-  0.05% )
      28,705,283,249      cycles:u                  #    3.538 GHz                      ( +-  0.09% )
      63,013,653,035      instructions:u            #    2.20  insn per cycle           ( +-  0.00% )
      15,039,319,384      branches:u                #    1.853 G/sec                    ( +-  0.00% )
         118,272,350      branch-misses:u           #    0.79% of all branches          ( +-  0.41% )

             2.95368 +- 0.00221 seconds time elapsed  ( +-  0.07% )

  $
  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 40 times to write data ]
  [ perf record: Captured and wrote 10.426 MB perf.data (33733 samples) ]
  $
  $ perf report --no-children
  Samples: 33K of event 'cycles:u', Event count (approx.): 28860426071
    Overhead  Command  Shared Object         Symbol
  +   26.10%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.13%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.83%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.52%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.04%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.45%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.31%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    2.30%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.19%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    2.08%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    2.07%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.96%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.67%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.63%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.52%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.51%  pahole   libdwarves.so.1.0.0   [.] attr_type
  +    1.36%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.32%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.25%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.24%  pahole   libdwarves.so.1.0.0   [.] namespace__recode_dwarf_types
  +    1.17%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.16%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__init
  +    1.03%  pahole   libdwarves.so.1.0.0   [.] tag__init
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size
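GNU obstacks are designed for the allocation pattern described at the top of this commit: many small sequential allocations, released together when the CU goes away. The sketch below assumes glibc's <obstack.h>, which musl doesn't provide. `struct cu`, `struct tag`, `cu__new_tag()` and `cu__delete()` are simplified, hypothetical stand-ins for the dwarves types.

```c
#include <assert.h>
#include <obstack.h>
#include <stdint.h>
#include <stdlib.h>

/* obstack needs to know how to get and release its chunks. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free  free

/* Hypothetical CU that owns every tag it allocates. */
struct cu {
	struct obstack obstack;
	size_t nr_tags;
};

struct tag {
	uint32_t type;
	uint16_t tag;
};

static void cu__init(struct cu *cu)
{
	obstack_init(&cu->obstack);
	cu->nr_tags = 0;
}

/* Allocation is mostly a pointer bump inside the current chunk.
 * On failure obstack_alloc() calls obstack_alloc_failed_handler
 * instead of returning NULL. */
static struct tag *cu__new_tag(struct cu *cu, uint16_t dwtag, uint32_t type)
{
	struct tag *t = obstack_alloc(&cu->obstack, sizeof(*t));

	t->tag = dwtag;
	t->type = type;
	cu->nr_tags++;
	return t;
}

/* Passing NULL frees every object in the obstack at once. */
static void cu__delete(struct cu *cu)
{
	obstack_free(&cu->obstack, NULL);
	cu->nr_tags = 0;
}
```

Objects already returned by obstack_alloc() do not move as the obstack grows, so earlier pointers such as the first tag stay valid until cu__delete().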

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo dca86fb8c2 dwarf_loader: Add comment on why we can't ignore lexblocks
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo d40c5f1e20 core: Allow ignoring DW_TAG_label
The BTF encoder doesn't use this information, so there is no need to parse it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 9038638891 core: Allow ignoring DW_TAG_inline_expansion
The BTF encoder doesn't use this information, so there is no need to parse it.
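This commit and the DW_TAG_label one above follow the same pattern: an option lets the loader skip whole DIEs that the BTF encoder never looks at. The sketch below assumes hypothetical option names, `ignore_labels` and `ignore_inline_expansions`. In DWARF, what pahole calls an inline expansion is a DW_TAG_inlined_subroutine (0x1d), and DW_TAG_label is 0x0a.

```c
#include <assert.h>
#include <stdbool.h>

#define DW_TAG_label			0x0a
#define DW_TAG_inlined_subroutine	0x1d

/* Hypothetical subset of loader options; field names are illustrative. */
struct load_opts {
	bool ignore_labels;
	bool ignore_inline_expansions;
};

/* Return true if a DIE inside a function body can be skipped unparsed. */
static bool die__skip(const struct load_opts *opts, unsigned int tag)
{
	switch (tag) {
	case DW_TAG_label:
		return opts->ignore_labels;
	case DW_TAG_inlined_subroutine:	/* an "inline expansion" */
		return opts->ignore_inline_expansions;
	default:
		return false;
	}
}
```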

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:31 -03:00
Arnaldo Carvalho de Melo 21a41e5386 dwarf_loader: Allow asking not to read the DW_AT_alignment attribute
This attribute isn't present in most types or struct members, so looking
for it ends up making dwarf_attr() call libdw_find_attr(), which does a
linear search over all the attributes.

We don't use this in the BTF encoder, so no point in reading that.

This will be used in pahole in the following cset.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:38:09 -03:00
Arnaldo Carvalho de Melo 1ef1639039 dwarf_loader: Do not look for non-C DWARF attributes in C CUs
Avoid looking for attributes that don't apply to the C language, such
as DW_AT_virtuality (virtual, pure_virtual), DW_AT_accessibility
(public, protected, private) and DW_AT_const_value.

Looking for those attributes in class_member__new() makes
libdw_find_attr() linearly search all attributes of a DIE, which
shows up when profiling.
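A sketch of the language check follows. It uses the C language codes from the DWARF 5 specification (C89, C, C99, C11). `lookup_attr()` is a hypothetical stand-in for dwarf_attr() that only counts calls, and `member__read_attrs()` is likewise illustrative. The point it shows is that a member in a C CU skips all three lookups.

```c
#include <assert.h>
#include <stdbool.h>

/* DW_AT_language values for C (DWARF 5 specification). */
#define DW_LANG_C89		0x0001
#define DW_LANG_C		0x0002
#define DW_LANG_C99		0x000c
#define DW_LANG_C11		0x001d
#define DW_LANG_C_plus_plus	0x0004

/* Attribute codes (DWARF 5 specification). */
#define DW_AT_const_value	0x1c
#define DW_AT_accessibility	0x32
#define DW_AT_virtuality	0x4c

static bool cu__is_c(unsigned int lang)
{
	return lang == DW_LANG_C89 || lang == DW_LANG_C ||
	       lang == DW_LANG_C99 || lang == DW_LANG_C11;
}

struct member_attrs {
	int virtuality;
	int accessibility;
	bool has_const_value;
};

/* Hypothetical stand-in for dwarf_attr(), which in libdw walks the
 * DIE's attributes linearly; here it only counts calls. */
static unsigned long nr_attr_lookups;

static int lookup_attr(unsigned int attr)
{
	(void)attr;
	nr_attr_lookups++;
	return 0;
}

static void member__read_attrs(struct member_attrs *m, unsigned int cu_lang)
{
	*m = (struct member_attrs){ 0 };
	if (cu__is_c(cu_lang))
		return;		/* C has none of these: skip the lookups */
	m->virtuality = lookup_attr(DW_AT_virtuality);
	m->accessibility = lookup_attr(DW_AT_accessibility);
	m->has_const_value = lookup_attr(DW_AT_const_value) != 0;
}
```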

Before:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf -j vmlinux' (5 runs):

           11,239.99 msec task-clock:u              #    2.921 CPUs utilized    ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,897      page-faults:u             #   70.631 K/sec            ( +-  0.00% )
      34,593,518,484      cycles:u                  #    3.078 GHz              ( +-  0.05% )
      75,592,805,563      instructions:u            #    2.19  insn per cycle   ( +-  0.00% )
      17,923,046,622      branches:u                #    1.595 G/sec            ( +-  0.00% )
         131,080,371      branch-misses:u           #    0.73% of all branches  ( +-  0.18% )

              3.84794 +- 0.00327 seconds time elapsed  ( +-  0.09% )
  $

After:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf -j vmlinux' (5 runs):

           11,178.28 msec task-clock:u              #    2.929 CPUs utilized            ( +-  0.12% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,890      page-faults:u             #   71.021 K/sec                    ( +-  0.00% )
      34,378,886,265      cycles:u                  #    3.076 GHz                      ( +-  0.13% )
      75,523,849,140      instructions:u            #    2.20  insn per cycle           ( +-  0.12% )
      17,907,573,910      branches:u                #    1.602 G/sec                    ( +-  0.12% )
         130,137,529      branch-misses:u           #    0.73% of all branches          ( +-  0.50% )

              3.8165 +- 0.0137 seconds time elapsed  ( +-  0.36% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 1caed1c443 dwarf_loader: Add a lock around dwarf_decl_file() and dwarf_decl_line() calls
As this ends up racing on a tsearch() call, probably for some libdw
cache that gets updated/looked up by concurrent pahole threads (-j N).

This cures the following crash; a patch for libdw will be cooked up and sent.

  (gdb) run -j -I -F dwarf vmlinux > /dev/null
  Starting program: /var/home/acme/git/pahole/build/pahole -j -I -F dwarf vmlinux > /dev/null
  warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.
  warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
  warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
  [New LWP 844789]
  [New LWP 844790]
  [New LWP 844791]
  [New LWP 844792]
  [New LWP 844793]
  [New LWP 844794]
  [New LWP 844795]
  [New LWP 844796]
  [New LWP 844797]
  [New LWP 844798]
  [New LWP 844799]
  [New LWP 844800]
  [New LWP 844801]
  [New LWP 844802]
  [New LWP 844803]
  [New LWP 844804]
  [New LWP 844805]
  [New LWP 844806]
  [New LWP 844807]
  [New LWP 844808]
  [New LWP 844809]
  [New LWP 844810]
  [New LWP 844811]
  [New LWP 844812]
  [New LWP 844813]
  [New LWP 844814]

  Thread 2 "pahole" received signal SIGSEGV, Segmentation fault.
  [Switching to LWP 844789]
  0x00007ffff7dfa321 in ?? () from /lib64/libc.so.6
  (gdb) bt
  #0  0x00007ffff7dfa321 in ?? () from /lib64/libc.so.6
  #1  0x00007ffff7dfa4bb in ?? () from /lib64/libc.so.6
  #2  0x00007ffff7f5eaa6 in __libdw_getsrclines (dbg=0x4a7f90, debug_line_offset=10383710, comp_dir=0x7ffff3c29f01 "/var/home/acme/git/build/v5.13.0-rc6+", address_size=address_size@entry=8, linesp=linesp@entry=0x7fffcfe04ba0, filesp=filesp@entry=0x7fffcfe04ba8)
      at dwarf_getsrclines.c:1129
  #3  0x00007ffff7f5ed14 in dwarf_getsrclines (cudie=cudie@entry=0x7fffd210caf0, lines=lines@entry=0x7fffd210cac0, nlines=nlines@entry=0x7fffd210cac8) at dwarf_getsrclines.c:1213
  #4  0x00007ffff7f64883 in dwarf_decl_file (die=<optimized out>) at dwarf_decl_file.c:66
  #5  0x0000000000425f24 in tag__init (tag=0x7fff0421b710, cu=0x7fffcc001e40, die=0x7fffd210cd30) at /var/home/acme/git/pahole/dwarf_loader.c:476
  #6  0x00000000004262ec in namespace__init (namespace=0x7fff0421b710, die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:576
  #7  0x00000000004263ac in type__init (type=0x7fff0421b710, die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:595
  #8  0x00000000004264d1 in type__new (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:614
  #9  0x0000000000427ba6 in die__create_new_typedef (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1212
  #10 0x0000000000428df5 in __die__process_tag (die=0x7fffd210cd30, cu=0x7fffcc001e40, top_level=1, fn=0x45cee0 <__FUNCTION__.10> "die__process_unit", conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1823
  #11 0x0000000000428ea1 in die__process_unit (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1848
  #12 0x0000000000429e45 in die__process (die=0x7fffd210ce20, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:2311
  #13 0x0000000000429ecb in die__process_and_recode (die=0x7fffd210ce20, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:2326
  #14 0x000000000042a9d6 in dwarf_cus__create_and_process_cu (dcus=0x7fffffffddc0, cu_die=0x7fffd210ce20, pointer_size=8 '\b') at /var/home/acme/git/pahole/dwarf_loader.c:2644
  #15 0x000000000042ab28 in dwarf_cus__process_cu_thread (arg=0x7fffffffddc0) at /var/home/acme/git/pahole/dwarf_loader.c:2687
  #16 0x00007ffff7ed6299 in start_thread () from /lib64/libpthread.so.0
  #17 0x00007ffff7dfe353 in ?? () from /lib64/libc.so.6
  (gdb)
  (gdb) fr 2
  1085
  (gdb) list files_lines_compare
  1086    static int
  1087    files_lines_compare (const void *p1, const void *p2)
  1088    {
  1089	  const struct files_lines_s *t1 = p1;
  1090	  const struct files_lines_s *t2 = p2;
  1091
  1092	  if (t1->debug_line_offset < t2->debug_line_offset)
  (gdb)
  1093        return -1;
  1094	  if (t1->debug_line_offset > t2->debug_line_offset)
  1095        return 1;
  1096
  1097	  return 0;
  1098    }
  1099
  1100    int
  1101    internal_function
  1102    __libdw_getsrclines (Dwarf *dbg, Dwarf_Off debug_line_offset,
  (gdb) list __libdw_getsrclines
  1100    int
  1101    internal_function
  1102    __libdw_getsrclines (Dwarf *dbg, Dwarf_Off debug_line_offset,
  1103                         const char *comp_dir, unsigned address_size,
  1104                         Dwarf_Lines **linesp, Dwarf_Files **filesp)
  1105    {
  1106	  struct files_lines_s fake = { .debug_line_offset = debug_line_offset };
  1107	  struct files_lines_s **found = tfind (&fake, &dbg->files_lines,
  1108                                            files_lines_compare);
  1109	  if (found == NULL)
  (gdb)
  1110        {
  1111          Elf_Data *data = __libdw_checked_get_data (dbg, IDX_debug_line);
  1112          if (data == NULL
  1113              || __libdw_offset_in_section (dbg, IDX_debug_line,
  1114                                            debug_line_offset, 1) != 0)
  1115            return -1;
  1116
  1117          const unsigned char *linep = data->d_buf + debug_line_offset;
  1118          const unsigned char *lineendp = data->d_buf + data->d_size;
  1119
  (gdb)
  1120          struct files_lines_s *node = libdw_alloc (dbg, struct files_lines_s,
  1121                                                    sizeof *node, 1);
  1122
  1123          if (read_srclines (dbg, linep, lineendp, comp_dir, address_size,
  1124                             &node->lines, &node->files) != 0)
  1125            return -1;
  1126
  1127          node->debug_line_offset = debug_line_offset;
  1128
  1129          found = tsearch (node, &dbg->files_lines, files_lines_compare);
  (gdb)
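The race can be reproduced in miniature with POSIX tfind()/tsearch(), which the listing above shows __libdw_getsrclines() using on dbg->files_lines. This is a sketch that assumes one process-wide mutex is acceptable, as in this commit. `files_lines_cache`, `cache_lock`, `cache_get()` and `run_workers()` are hypothetical names, and the tree stands in for libdw's internal cache rather than reproducing it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <stdlib.h>

/* Hypothetical stand-in for libdw's per-Dwarf files_lines cache: a
 * tsearch() tree keyed by .debug_line offset, shared by all threads. */
static void *files_lines_cache;
static size_t nr_cached; /* distinct offsets inserted so far */

/* One lock serializing all cache accesses, in the spirit of the lock
 * added around dwarf_decl_file()/dwarf_decl_line(). */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

static int offset_compare(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/* tfind(), then tsearch() on a miss. Without the lock, two threads can
 * both miss and rebalance the tree at the same time, corrupting it. */
static void cache_get(unsigned long offset)
{
	pthread_mutex_lock(&cache_lock);
	if (tfind(&offset, &files_lines_cache, offset_compare) == NULL) {
		unsigned long *node = malloc(sizeof(*node));

		if (node != NULL) {
			*node = offset;
			if (tsearch(node, &files_lines_cache, offset_compare) != NULL)
				nr_cached++;
			else
				free(node);
		}
	}
	pthread_mutex_unlock(&cache_lock);
}

enum { NR_WORKERS = 8, NR_OFFSETS = 100 };

/* All workers request the same offsets, so lookups and inserts overlap. */
static void *worker(void *arg)
{
	(void)arg;
	for (unsigned long i = 0; i < NR_OFFSETS; i++)
		cache_get(i * 16);
	return NULL;
}

/* Runs the workers and returns how many distinct offsets got cached. */
static size_t run_workers(void)
{
	pthread_t threads[NR_WORKERS];
	int started = 0;

	for (int i = 0; i < NR_WORKERS; i++)
		if (pthread_create(&threads[started], NULL, worker, NULL) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return nr_cached;
}
```

run_workers() should report exactly 100 cached entries. Dropping the lock lets concurrent tsearch() calls modify the tree simultaneously, which is undefined behaviour and may crash much as above.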

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2b45e1b6d0 dwarf_loader: Defer freeing libdw Dwfl handler
So that 'pahole --sort -F dwarf' can defer printing all classes until
it has processed and sorted all of them.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo fb99cad539 dwarf_loader: Parallel DWARF loading
Tested so far with a typical Linux kernel vmlinux file.

Testing it:

  ⬢[acme@toolbox pahole]$ perf stat -r5 pahole -F dwarf vmlinux > /dev/null

   Performance counter stats for 'pahole -F dwarf vmlinux' (5 runs):

            5,675.97 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.36% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             736,865      page-faults:u             #  129.898 K/sec                    ( +-  0.00% )
      21,921,617,854      cycles:u                  #    3.864 GHz                      ( +-  0.23% )  (83.34%)
         206,308,275      stalled-cycles-frontend:u #    0.95% frontend cycles idle     ( +-  4.59% )  (83.33%)
       2,186,772,169      stalled-cycles-backend:u  #   10.02% backend cycles idle      ( +-  0.46% )  (83.33%)
      62,272,507,248      instructions:u            #    2.85  insn per cycle
                                                    #    0.03  stalled cycles per insn  ( +-  0.03% )  (83.34%)
      14,967,758,961      branches:u                #    2.639 G/sec                    ( +-  0.03% )  (83.33%)
          65,688,710      branch-misses:u           #    0.44% of all branches          ( +-  0.29% )  (83.33%)

              5.6750 +- 0.0203 seconds time elapsed  ( +-  0.36% )

  ⬢[acme@toolbox pahole]$ perf stat -r5 pahole -F dwarf -j12 vmlinux > /dev/null

   Performance counter stats for 'pahole -F dwarf -j12 vmlinux' (5 runs):

           18,015.77 msec task-clock:u              #    7.669 CPUs utilized            ( +-  2.49% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             739,157      page-faults:u             #   40.726 K/sec                    ( +-  0.01% )
      26,673,502,570      cycles:u                  #    1.470 GHz                      ( +-  0.44% )  (83.12%)
         734,106,744      stalled-cycles-frontend:u #    2.80% frontend cycles idle     ( +-  2.30% )  (83.65%)
       2,258,159,917      stalled-cycles-backend:u  #    8.60% backend cycles idle      ( +-  1.51% )  (83.62%)
      63,347,827,742      instructions:u            #    2.41  insn per cycle
                                                    #    0.04  stalled cycles per insn  ( +-  0.03% )  (83.32%)
      15,242,840,672      branches:u                #  839.841 M/sec                    ( +-  0.03% )  (83.22%)
          73,860,851      branch-misses:u           #    0.48% of all branches          ( +-  0.51% )  (83.09%)

               2.349 +- 0.116 seconds time elapsed  ( +-  4.93% )

  ⬢[acme@toolbox pahole]$

Since this is done in 12 threads and pahole prints as it finishes
processing each CU, the output is no longer deterministic across
runs.
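The nondeterminism follows directly from how work is handed out. This is a minimal sketch, assuming threads pull CU indexes from a shared, locked counter and record the order in which they finish. `struct cu_queue`, `queue_next()`, `cu_worker()` and `load_all_cus()` are hypothetical and do not claim to be how dwarf_loader.c distributes CUs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { NR_CUS = 1000, NR_THREADS = 12 };

/* Hypothetical shared state: which CU to hand out next and the order in
 * which CUs finished (what a printing tool would emit). */
struct cu_queue {
	pthread_mutex_t lock;
	int next_cu;
	int finished[NR_CUS];
	int nr_finished;
	bool processed[NR_CUS];
};

static struct cu_queue queue = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Hands out the next unprocessed CU index, or -1 when all are taken. */
static int queue_next(struct cu_queue *q)
{
	int cu = -1;

	pthread_mutex_lock(&q->lock);
	if (q->next_cu < NR_CUS)
		cu = q->next_cu++;
	pthread_mutex_unlock(&q->lock);
	return cu;
}

static void *cu_worker(void *arg)
{
	struct cu_queue *q = arg;
	int cu;

	while ((cu = queue_next(q)) != -1) {
		/* ... load this CU's DIEs here, without touching shared state ... */
		pthread_mutex_lock(&q->lock);
		q->processed[cu] = true;
		/* Completion order depends on scheduling, hence output that
		 * differs from run to run. */
		q->finished[q->nr_finished++] = cu;
		pthread_mutex_unlock(&q->lock);
	}
	return NULL;
}

/* Processes every CU with NR_THREADS workers; returns how many finished. */
static int load_all_cus(void)
{
	pthread_t threads[NR_THREADS];
	int started = 0;

	for (int i = 0; i < NR_THREADS; i++)
		if (pthread_create(&threads[started], NULL, cu_worker, &queue) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return queue.nr_finished;
}
```

Every CU is processed exactly once, but `queue.finished` can come out in a different order on each run, which is what breaks a plain diff of the output.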

I'll add a mode where one can ask for the structures to be kept in a
data structure and sorted before printing, so that btfdiff can use it
with -j and continue working.
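Here is a minimal sketch of such a sort-before-print mode, assuming every thread has finished and the records are already collected in one array. `struct struct_size`, `struct_size_cmp()` and `print_sorted()` are hypothetical. Breaking ties on size also gives structs that share a name a fixed order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one struct, as in a --sizes style line. */
struct struct_size {
	const char *name;
	unsigned int size;
};

/* Order by name, then by size, so structs sharing a name (which the
 * kernel has) still come out in a fixed order. */
static int struct_size_cmp(const void *a, const void *b)
{
	const struct struct_size *x = a, *y = b;
	int ret = strcmp(x->name, y->name);

	if (ret != 0)
		return ret;
	return (x->size > y->size) - (x->size < y->size);
}

/* Called once every thread is done: sort in place, then print. */
static void print_sorted(struct struct_size *entries, size_t n, FILE *out)
{
	qsort(entries, n, sizeof(*entries), struct_size_cmp);
	for (size_t i = 0; i < n; i++)
		fprintf(out, "%s\t%u\n", entries[i].name, entries[i].size);
}
```

The printed order then depends only on the collected data, not on thread scheduling.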

Also, since it prints the first struct with a given name, and the
kernel has multiple structures with the same name, we get differences
even when we ask just for the sizes (so that we get just one line per
struct):

  ⬢[acme@toolbox pahole]$ pahole -F dwarf --sizes vmlinux > /tmp/pahole--sizes.txt
  ⬢[acme@toolbox pahole]$ pahole -F dwarf -j12 --sizes vmlinux > /tmp/pahole--sizes-j12.txt
  ⬢[acme@toolbox pahole]$ diff -u /tmp/pahole--sizes.txt /tmp/pahole--sizes-j12.txt | head
  --- /tmp/pahole--sizes.txt	2021-07-01 21:56:49.260958678 -0300
  +++ /tmp/pahole--sizes-j12.txt	2021-07-01 21:57:00.322209241 -0300
  @@ -1,20 +1,9 @@
  -list_head	16	0
  -hlist_head	8	0
  -hlist_node	16	0
  -callback_head	16	0
  -file_system_type	72	1
  -qspinlock	4	0
  -qrwlock	8	0
  ⬢[acme@toolbox pahole]$

We can't compare them that way; let's sort both and try again:

  ⬢[acme@toolbox pahole]$ sort /tmp/pahole--sizes.txt > /tmp/pahole--sizes.txt.sorted
  ⬢[acme@toolbox pahole]$ sort /tmp/pahole--sizes-j12.txt > /tmp/pahole--sizes-j12.txt.sorted
  ⬢[acme@toolbox pahole]$ diff -u /tmp/pahole--sizes.txt.sorted /tmp/pahole--sizes-j12.txt.sorted
  --- /tmp/pahole--sizes.txt.sorted	2021-07-01 21:57:13.841515467 -0300
  +++ /tmp/pahole--sizes-j12.txt.sorted	2021-07-01 21:57:16.771581840 -0300
  @@ -1116,7 +1116,7 @@
   child_latency_info	48	1
   chipset	32	1
   chksum_ctx	4	0
  -chksum_desc_ctx	4	0
  +chksum_desc_ctx	2	0
   cipher_alg	32	0
   cipher_context	16	0
   cipher_test_sglists	1184	0
  @@ -1589,7 +1589,7 @@
   ddebug_query	40	0
   ddebug_table	40	1
   deadline_data	120	1
  -debug_buffer	72	0
  +debug_buffer	64	0
   debugfs_blob_wrapper	16	0
   debugfs_devm_entry	16	0
   debugfs_fsdata	48	1
  @@ -3291,7 +3291,7 @@
   integrity_sysfs_entry	32	0
   intel_agp_driver_description	24	1
   intel_community	96	1
  -intel_community_context	68	0
  +intel_community_context	16	0
   intel_early_ops	16	0
   intel_excl_cntrs	536	0
   intel_excl_states	260	0
  @@ -3619,7 +3619,7 @@
   irqtime	24	0
   irq_work	24	0
   ir_table	16	0
  -irte	4	0
  +irte	16	0
   irte_ga	16	0
   irte_ga_hi	8	0
   irte_ga_lo	8	0
  @@ -4909,7 +4909,7 @@
   pci_platform_pm_ops	64	0
   pci_pme_device	24	0
   pci_raw_ops	16	0
  -pci_root_info	104	0
  +pci_root_info	120	1
   pci_root_res	80	0
   pci_saved_state	64	0
   pciserial_board	24	0
  @@ -5132,10 +5132,10 @@
   pmc_clk	24	0
   pmc_clk_data	24	0
   pmc_data	16	0
  -pmc_dev	144	4
  +pmc_dev	40	1
   pm_clk_notifier_block	32	0
   pm_clock_entry	40	0
  -pmc_reg_map	136	0
  +pmc_reg_map	40	0
   pmic_table	12	0
   pm_message	4	0
   pm_nl_pernet	80	1
  @@ -6388,7 +6388,7 @@
   sw842_hlist_node2	24	0
   sw842_hlist_node4	24	0
   sw842_hlist_node8	32	0
  -sw842_param	59496	2
  +sw842_param	48	1
   swait_queue	24	0
   swait_queue_head	24	1
   swap_cgroup	2	0
  @@ -7942,7 +7942,7 @@
   uprobe_trace_entry_head	8	0
   uprobe_xol_ops	32	0
   urb	184	0
  -urb_priv	32	1
  +urb_priv	8	0
   usb2_lpm_parameters	8	0
   usb3_lpm_parameters	16	0
   usb_anchor	56	0
  ⬢[acme@toolbox pahole]$

I'll check them one by one, but they look kinda legit.

Next: fiddle with thread affinities, then move to threaded BTF
encoding, which, in a first test with a single btf_lock in the pahole
stealer, ended up producing corrupt BTF that was valid only up to a point.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 46ad8c0158 dwarf_loader: Introduce 'dwarf_cus' to group all the DWARF specific per-cus state
This will help with reuse in the upcoming multithreaded mode.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo d963af9fd8 dwarf_loader: Factor common bits for creating and processing CU
Will be used for the multithreaded loading.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 7569e46d35 core: namespace__delete() doesn't need a 'cu' arg
Since we stopped using per-CU obstacks, we don't need it. If we ever
want obstacks back, we can use per-thread ones.
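Should per-thread obstacks ever be wanted, the idea could look roughly like this. It is only a sketch. It uses a tiny hypothetical bump allocator with C11 `_Thread_local` storage instead of GNU obstacks, and `arena_init()`, `arena_alloc()` and `arena_release()` are illustrative names. Each thread frees all of its allocations in one go, so no delete function needs a 'cu' argument to find its allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal per-thread arena standing in for a per-thread
 * obstack: bump-pointer allocations, released all at once. */
struct arena {
	char *base;
	size_t used, cap;
};

static _Thread_local struct arena thread_arena;

static int arena_init(size_t cap)
{
	thread_arena.base = malloc(cap);
	thread_arena.used = 0;
	thread_arena.cap = thread_arena.base ? cap : 0;
	return thread_arena.base ? 0 : -1;
}

static void *arena_alloc(size_t size)
{
	/* Round the start up so every allocation is suitably aligned. */
	size_t align = _Alignof(max_align_t);
	size_t start = (thread_arena.used + align - 1) / align * align;

	if (start > thread_arena.cap || size > thread_arena.cap - start)
		return NULL;
	thread_arena.used = start + size;
	return thread_arena.base + start;
}

/* Frees everything this thread allocated: no per-object delete and no
 * 'cu' argument needed to locate the owning allocator. */
static void arena_release(void)
{
	free(thread_arena.base);
	thread_arena.base = NULL;
	thread_arena.used = thread_arena.cap = 0;
}
```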

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo de4e8b7f17 core: {tag,function,lexblock}__delete() doesn't need a 'cu' arg
Since we stopped using per-CU obstacks, we don't need it. If we ever
want obstacks back, we can use per-thread ones.

They call each other, so do the three at once.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 789ed4e3a2 core: ftype__delete() doesn't need a 'cu' arg
Since we stopped using per-CU obstacks, we don't need it. If we ever
want obstacks back, we can use per-thread ones.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 6340cb4627 core: enumeration__delete() doesn't need a 'cu' arg
Since we stopped using per-CU obstacks, we don't need it. If we ever
want obstacks back, we can use per-thread ones.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 33e44f5295 core: type__delete() doesn't need a 'cu' arg
Since we stopped using per-CU obstacks, we don't need it. If we ever
want obstacks back, we can use per-thread ones.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2b2014187b core: class__delete() doesn't need a 'cu' arg
Since we stopped using per-CU obstacks, we don't need it. If we ever
want obstacks back, we can use per-thread ones.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 50916756d5 core: class_member__delete() doesn't need a 'cu' arg
Since we stopped using per-CU obstacks, we don't need it. If we ever
want obstacks back, we can use per-thread ones.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 63992cb02a core: Use namespace->name in class__clone()
Now that we've stopped using string indexes, there's no need for that;
just set namespace->name to the new class name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2b9bd83e63 dwarf_loader: Make attr_suffix() handle kabi_prefix
Since we're going to get rid of strings.c.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo daaafeb35f dwarf_loader: Pass conf_load to functions calling attr_string()
As we'll implement the kabi_prefix handling there, without using global
variables.

This is because we're stopping usage of strings.c, where the kabi_prefix
feature was implemented.
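Here is a minimal sketch of passing the prefix through the load configuration instead of a global. It assumes the --kabi_prefix semantics as we understand the option: a name that starts with the prefix is treated as just the prefix. `struct load_conf` and `apply_kabi_prefix()` are hypothetical, not the real conf_load or attr_string().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of a load configuration: the prefix travels with
 * the config passed to each call instead of living in a global. */
struct load_conf {
	const char *kabi_prefix; /* NULL when the option isn't used */
	size_t kabi_prefix_len;
};

/* Assumed semantics: a name starting with the prefix is replaced by the
 * prefix itself; every other name is returned unchanged. */
static const char *apply_kabi_prefix(const char *name,
				     const struct load_conf *conf)
{
	if (name != NULL && conf != NULL && conf->kabi_prefix != NULL &&
	    strncmp(name, conf->kabi_prefix, conf->kabi_prefix_len) == 0)
		return conf->kabi_prefix;
	return name;
}
```

Because the configuration is an argument, concurrent loaders can each carry their own settings without sharing mutable globals.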

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo a388aaf489 dwarf_loader: Remove unused strings variable and debug_fmt_ops->{init,exit}()
No need to create that object anymore.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 3d3b7b3287 core: Remove unused debug_fmt_ops->dwarf__strings_ptr()
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo a201149e18 dwarf_loader: No need to strdup() what dwarf_formstring() returns
Conversation with Mark Wielaard, elfutils developer:

  acme | ultimately dwarf_attr->valp for strings point to what? DIE memory that is always there?
  acme | I'm working on pahole and need to keep a pointer to what it returns
  acme | I was strdup()ing what dwarf_formstring() returns, tried removing the strdup() and instead point to what dwarf_formstring() returns and it worked, but I want to know for sure
   mjw | ah, yeah
   mjw | the memory/string returned by dwarf_formstring() is owned by the Dwarf. So long as the Dwarf is active (dwarf_end() hasn't been called and the underlying Elf is valid of course) you can use that string.
  acme | cool!
  acme | I'll use your explanation in the commit log message
  acme | thanks!
  acme | I'm also working on multithreading DWARF loading
   mjw | in most cases it will point directly into the .debug_str section, but it can also be the .debug_line_str section or a string embedded in the .debug_info section, etc.
   mjw | in all cases the Dwarf is responsible for keeping the memory life.
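mjw's explanation amounts to a borrowing rule: a returned string stays valid for as long as its owner does. Here is a sketch with a hypothetical `struct owner` standing in for a Dwarf handle; `owner_open()`, `owner_string()` and `owner_end()` are not libdw functions. Names are kept as pointers into the owner's memory with no strdup(), and the owner is released only after the last use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner standing in for a Dwarf handle: it owns the string
 * section that returned names point into. */
struct owner {
	char *strsec;
};

static int owner_open(struct owner *o, const char *section, size_t len)
{
	o->strsec = malloc(len);
	if (o->strsec == NULL)
		return -1;
	memcpy(o->strsec, section, len);
	return 0;
}

/* Like dwarf_formstring() per the conversation: the result points into
 * memory the owner keeps alive, so callers may store it without strdup(). */
static const char *owner_string(const struct owner *o, size_t offset)
{
	return o->strsec + offset;
}

/* Every borrowed pointer dies here, so this must run only after the
 * last use of any stored name. */
static void owner_end(struct owner *o)
{
	free(o->strsec);
	o->strsec = NULL;
}
```

This is also why freeing the Dwfl handle has to be deferred when printing is postponed for sorting.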

Before:

⬢[acme@toolbox pahole]$ rm -f vmlinux.btf ; perf stat -r5 pahole --btf_encode_detached vmlinux.btf vmlinux && perf stat -r5 btfdiff vmlinux vmlinux.btf

 Performance counter stats for 'pahole --btf_encode_detached vmlinux.btf vmlinux' (5 runs):

          7,802.91 msec task-clock:u              #    0.989 CPUs utilized            ( +-  0.60% )
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
           871,574      page-faults:u             #  110.568 K/sec                    ( +-  0.00% )
    29,924,977,089      cycles:u                  #    3.796 GHz                      ( +-  0.60% )  (83.32%)
       455,561,473      stalled-cycles-frontend:u #    1.51% frontend cycles idle     ( +-  5.55% )  (83.33%)
     3,874,761,771      stalled-cycles-backend:u  #   12.86% backend cycles idle      ( +-  2.24% )  (83.34%)
    74,812,680,221      instructions:u            #    2.48  insn per cycle
                                                  #    0.05  stalled cycles per insn  ( +-  0.02% )  (83.34%)
    17,624,163,403      branches:u                #    2.236 G/sec                    ( +-  0.03% )  (83.34%)
       128,991,472      branch-misses:u           #    0.73% of all branches          ( +-  0.07% )  (83.33%)

            7.8861 +- 0.0471 seconds time elapsed  ( +-  0.60% )

 Performance counter stats for 'btfdiff vmlinux vmlinux.btf' (5 runs):

          6,323.23 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.97% )
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
           826,233      page-faults:u             #  130.852 K/sec                    ( +-  0.00% )
    23,719,098,640      cycles:u                  #    3.756 GHz                      ( +-  0.32% )  (83.35%)
       286,636,981      stalled-cycles-frontend:u #    1.21% frontend cycles idle     ( +-  2.52% )  (83.34%)
     2,821,674,085      stalled-cycles-backend:u  #   11.91% backend cycles idle      ( +-  1.20% )  (83.28%)
    64,095,069,092      instructions:u            #    2.70  insn per cycle
                                                  #    0.04  stalled cycles per insn  ( +-  0.03% )  (83.35%)
    15,398,500,941      branches:u                #    2.439 G/sec                    ( +-  0.02% )  (83.35%)
        80,187,703      branch-misses:u           #    0.52% of all branches          ( +-  0.32% )  (83.34%)

            6.3233 +- 0.0613 seconds time elapsed  ( +-  0.97% )

⬢[acme@toolbox pahole]$

After:

⬢[acme@toolbox pahole]$ rm -f vmlinux.btf ; perf stat -r5 pahole --btf_encode_detached vmlinux.btf vmlinux && perf stat -r5 btfdiff vmlinux vmlinux.btf

 Performance counter stats for 'pahole --btf_encode_detached vmlinux.btf vmlinux' (5 runs):

          7,008.59 msec task-clock:u              #    0.977 CPUs utilized            ( +-  1.03% )
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
           796,469      page-faults:u             #  111.073 K/sec                    ( +-  0.00% )
    28,167,752,342      cycles:u                  #    3.928 GHz                      ( +-  0.26% )  (83.32%)
       377,704,478      stalled-cycles-frontend:u #    1.35% frontend cycles idle     ( +-  0.96% )  (83.34%)
     3,758,855,221      stalled-cycles-backend:u  #   13.43% backend cycles idle      ( +-  1.68% )  (83.34%)
    72,453,367,989      instructions:u            #    2.59  insn per cycle
                                                  #    0.05  stalled cycles per insn  ( +-  0.03% )  (83.33%)
    17,110,081,987      branches:u                #    2.386 G/sec                    ( +-  0.02% )  (83.34%)
       116,081,751      branch-misses:u           #    0.68% of all branches          ( +-  0.32% )  (83.33%)

            7.1731 +- 0.0724 seconds time elapsed  ( +-  1.01% )

 Performance counter stats for 'btfdiff vmlinux vmlinux.btf' (5 runs):

          5,768.59 msec task-clock:u              #    1.014 CPUs utilized            ( +-  0.45% )
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
           751,092      page-faults:u             #  132.237 K/sec                    ( +-  0.00% )
    21,623,439,905      cycles:u                  #    3.807 GHz                      ( +-  0.46% )  (83.34%)
       221,665,165      stalled-cycles-frontend:u #    1.02% frontend cycles idle     ( +-  1.55% )  (83.30%)
     2,860,640,878      stalled-cycles-backend:u  #   13.10% backend cycles idle      ( +-  2.03% )  (83.32%)
    61,757,937,981      instructions:u            #    2.83  insn per cycle
                                                  #    0.04  stalled cycles per insn  ( +-  0.01% )  (83.37%)
    14,873,361,434      branches:u                #    2.619 G/sec                    ( +-  0.02% )  (83.36%)
        65,356,868      branch-misses:u           #    0.44% of all branches          ( +-  0.07% )  (83.35%)

            5.6884 +- 0.0282 seconds time elapsed  ( +-  0.50% )

⬢[acme@toolbox pahole]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo a7d789a4f8 core: Make variable->name a real string
For the threaded code we want to access strings in tags while the
string table may be growing in another thread, which can invalidate
previously obtained pointers; so, to avoid excessive locking, use plain
strings.

The way the tools work, they either consume the just-produced CU straight
away or, when we keep all CUs in memory, keep just one copy of each data
structure, so let's try to stop using strings_t for strings.
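The pointer-invalidation problem is easy to see with a growable table. This is a minimal sketch, assuming strings_t-style values are offsets into one shared buffer. `struct strtab`, `strings_idx`, `strtab_add()`, `strtab_ptr()` and `struct tag_name` are hypothetical. Offsets survive growth, but turning one into a pointer reads the shared buffer, and that pointer is only good until the next append.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string table, in the spirit of strings_t
 * indexes into a shared table. */
struct strtab {
	char *buf;
	size_t len, cap;
};

typedef uint32_t strings_idx; /* offset into strtab.buf */

/* Appends s and stores its offset in *idx. realloc() may move buf, so any
 * 'const char *' previously obtained from strtab_ptr() can dangle. */
static int strtab_add(struct strtab *t, const char *s, strings_idx *idx)
{
	size_t n = strlen(s) + 1;

	if (t->len + n > t->cap) {
		size_t cap = t->cap ? t->cap * 2 : 16;

		while (cap < t->len + n)
			cap *= 2;
		char *nbuf = realloc(t->buf, cap);
		if (nbuf == NULL)
			return -1;
		t->buf = nbuf;
		t->cap = cap;
	}
	memcpy(t->buf + t->len, s, n);
	*idx = (strings_idx)t->len;
	t->len += n;
	return 0;
}

/* Resolving an index reads t->buf; with another thread appending, this
 * read and every use of the result would need the table's lock. */
static const char *strtab_ptr(const struct strtab *t, strings_idx idx)
{
	return t->buf + idx;
}

/* The direction these commits take: each tag holds a plain string whose
 * lifetime doesn't depend on a shared, growing table. */
struct tag_name {
	const char *name;
};
```

A plain `const char *` per tag sidesteps both the lock and the dangling-pointer window.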

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo b5694280ec core: Make label->name a real string
For the threaded code we want to access strings in tags while the
string table may be growing in another thread, which can invalidate
previously obtained pointers; so, to avoid excessive locking, use plain
strings.

The way the tools work, they either consume the just-produced CU straight
away or, when we keep all CUs in memory, keep just one copy of each data
structure, so let's try to stop using strings_t for strings.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 713239bc00 core: Make enumerator->name a real string
For the threaded code we want to access strings in tags while the
string table may be growing in another thread, which can invalidate
previously obtained pointers; so, to avoid excessive locking, use plain
strings.

The way the tools work, they either consume the just-produced CU straight
away or, when we keep all CUs in memory, keep just one copy of each data
structure, so let's try to stop using strings_t for strings.

For the enumerator->name case we get the bonus of removing the last user
of dwarves__active_loader in the btf_encoder class.

This covers unions, enums, structs and classes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo b99c4008ac core: Make namespace->name a real string
For the threaded code we want to access strings in tags while the
string table may be growing in another thread, which can invalidate
previously obtained pointers; so, to avoid excessive locking, use plain
strings.

The way the tools work, they either consume the just-produced CU straight
away or, when we keep all CUs in memory, keep just one copy of each data
structure, so let's try to stop using strings_t for strings.

For the namespace->name case we get the bonus of removing another
user of dwarves__active_loader.

This covers unions, enums, structs and classes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 379a73c6eb core: Make class_member->name a real string
For the threaded code we want to access strings in tags while the
string table may be growing in another thread, which can invalidate
previously obtained pointers; so, to avoid excessive locking, use plain
strings.

The way the tools work, they either consume the just-produced CU straight
away or, when we keep all CUs in memory, keep just one copy of each data
structure, so let's try to stop using strings_t for strings.

For the class_member->name case we get the bonus of removing another
user of dwarves__active_loader.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00
Arnaldo Carvalho de Melo 3280cb4176 core: Make parameter->name a real string
For the threaded code we want to access strings in tags while the
string table may be growing in another thread, which can invalidate
previously obtained pointers; so, to avoid excessive locking, use plain
strings.

The way the tools work, they either consume the just-produced CU straight
away or, when we keep all CUs in memory, keep just one copy of each data
structure, so let's try to stop using strings_t for strings.

For the parameter->name case we get the bonus of removing a user of
dwarves__active_loader.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:39:46 -03:00