Commit Graph

1871 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo e9b83dba79 list: Adopt list_next_entry() from the Linux kernel
We'll use it to traverse the list of opaque btf_encoder entries in
pahole.
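
For reference, this is the macro being adopted, as in the kernel's
list.h (pahole carries its own copy of that header):

  /**
   * list_next_entry - get the struct for the next entry
   * @pos:	the type * to cursor
   * @member:	the name of the list_head within the struct.
   */
  #define list_next_entry(pos, member) \
  	list_entry((pos)->member.next, typeof(*(pos)), member)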

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 6edae3e768 dwarf_loader: Make hash table size default to 12, faster than 15
The sweet spot for recent kernels: the default is 15 in the tests
below, and changing it to 12 reduces the elapsed time, so make it the
new default.
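
A minimal sketch of what the knob controls, with illustrative names
(the real tables live in dwarf_loader.c): the chained hash tables have
1 << hashbits buckets, so going from 15 to 12 bits shrinks them from
32768 to 4096 buckets, trading slightly longer chains for tables that
fit the caches better; note the LLC-load-miss drop in the runs below.

  #include <stdint.h>

  uint64_t hash_64(uint64_t key, unsigned int bits);	/* pahole's hash.h */

  #define HASHTAGS__BITS	12
  #define HASHTAGS__SIZE	(1UL << HASHTAGS__BITS)	/* 4096 buckets */

  static inline uint32_t hashtags__fn(uint64_t key)
  {
  	return hash_64(key, HASHTAGS__BITS);
  }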

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

  $ sudo perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,101.71 msec task-clock                #    2.752 CPUs utilized            ( +-  0.06% )
               1,682      context-switches          #  207.610 /sec                     ( +-  0.98% )
                   5      cpu-migrations            #    0.592 /sec                     ( +- 15.31% )
              68,870      page-faults               #    8.501 K/sec                    ( +-  0.02% )
      29,205,269,606      cycles                    #    3.605 GHz                      ( +-  0.05% )
      63,448,636,788      instructions              #    2.17  insn per cycle           ( +-  0.00% )
      15,127,493,299      branches                  #    1.867 G/sec                    ( +-  0.00% )
         120,362,476      branch-misses             #    0.80% of all branches          ( +-  0.11% )
      13,967,000,698      L1-dcache-loads           #    1.724 G/sec                    ( +-  0.00% )
         375,052,289      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses  ( +-  0.03% )
          91,506,061      LLC-loads                 #   11.295 M/sec                    ( +-  0.10% )
          27,905,809      LLC-load-misses           #   30.50% of all LL-cache accesses  ( +-  0.16% )

             2.94445 +- 0.00188 seconds time elapsed  ( +-  0.06% )

  $ sudo perf stat -d -r5 pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,681.15 msec task-clock                #    2.702 CPUs utilized            ( +-  0.05% )
               1,660      context-switches          #  216.114 /sec                     ( +-  1.02% )
                   3      cpu-migrations            #    0.365 /sec                     ( +- 13.36% )
              67,794      page-faults               #    8.826 K/sec                    ( +-  0.05% )
      27,692,748,327      cycles                    #    3.605 GHz                      ( +-  0.04% )
      63,041,363,409      instructions              #    2.28  insn per cycle           ( +-  0.00% )
      15,063,798,404      branches                  #    1.961 G/sec                    ( +-  0.00% )
         127,461,737      branch-misses             #    0.85% of all branches          ( +-  0.11% )
      13,974,527,710      L1-dcache-loads           #    1.819 G/sec                    ( +-  0.00% )
         364,775,664      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses  ( +-  0.01% )
          83,685,127      LLC-loads                 #   10.895 M/sec                    ( +-  0.14% )
          19,073,967      LLC-load-misses           #   22.79% of all LL-cache accesses  ( +-  0.30% )

            2.842468 +- 0.000561 seconds time elapsed  ( +-  0.02% )

  $ sudo perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64' (5 runs):

            9,512.30 msec task-clock                #    2.741 CPUs utilized            ( +-  0.54% )
               1,964      context-switches          #  206.469 /sec                     ( +-  2.60% )
                   7      cpu-migrations            #    0.736 /sec                     ( +- 37.25% )
              81,611      page-faults               #    8.579 K/sec                    ( +-  0.08% )
      34,294,568,812      cycles                    #    3.605 GHz                      ( +-  0.53% )
      72,897,384,015      instructions              #    2.13  insn per cycle           ( +-  0.15% )
      17,386,180,039      branches                  #    1.828 G/sec                    ( +-  0.15% )
         136,142,139      branch-misses             #    0.78% of all branches          ( +-  1.06% )
      16,020,787,096      L1-dcache-loads           #    1.684 G/sec                    ( +-  0.19% )
         430,392,585      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses  ( +-  0.37% )
         107,401,567      LLC-loads                 #   11.291 M/sec                    ( +-  0.30% )
          35,172,977      LLC-load-misses           #   32.75% of all LL-cache accesses  ( +-  0.48% )

              3.4710 +- 0.0243 seconds time elapsed  ( +-  0.70% )

  $ sudo perf stat -d -r5 pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64' (5 runs):

            8,929.50 msec task-clock                #    2.700 CPUs utilized            ( +-  0.04% )
               1,907      context-switches          #  213.539 /sec                     ( +-  0.68% )
                   4      cpu-migrations            #    0.426 /sec                     ( +- 30.46% )
              80,661      page-faults               #    9.033 K/sec                    ( +-  0.03% )
      32,213,009,827      cycles                    #    3.607 GHz                      ( +-  0.03% )
      72,345,614,657      instructions              #    2.25  insn per cycle           ( +-  0.00% )
      17,290,227,666      branches                  #    1.936 G/sec                    ( +-  0.00% )
         142,108,954      branch-misses             #    0.82% of all branches          ( +-  0.09% )
      15,998,190,852      L1-dcache-loads           #    1.792 G/sec                    ( +-  0.00% )
         417,872,772      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses  ( +-  0.02% )
          98,061,829      LLC-loads                 #   10.982 M/sec                    ( +-  0.24% )
          24,750,223      LLC-load-misses           #   25.24% of all LL-cache accesses  ( +-  0.17% )

             3.30670 +- 0.00185 seconds time elapsed  ( +-  0.06% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo d2d83be1e2 pahole: Allow tweaking the size of the loader hash tables
To experiment with different sizes as time goes by and the number of symbols in
the kernel grows.

The current default, 15, is suboptimal for the Fedora Rawhide kernel;
we can do better using 12.

Default: 15:

  $ sudo ~acme/bin/perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,107.73 msec task-clock                #    2.749 CPUs utilized            ( +-  0.05% )
               1,723      context-switches          #  212.562 /sec                     ( +-  1.86% )
                   5      cpu-migrations            #    0.641 /sec                     ( +- 46.07% )
              68,802      page-faults               #    8.486 K/sec                    ( +-  0.05% )
      29,221,590,880      cycles                    #    3.604 GHz                      ( +-  0.04% )
      63,438,138,612      instructions              #    2.17  insn per cycle           ( +-  0.00% )
      15,125,172,105      branches                  #    1.866 G/sec                    ( +-  0.00% )
         119,983,284      branch-misses             #    0.79% of all branches          ( +-  0.06% )
      13,964,248,638      L1-dcache-loads           #    1.722 G/sec                    ( +-  0.00% )
         375,110,346      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses( +-  0.01% )
          91,712,402      LLC-loads                 #   11.312 M/sec                    ( +-  0.14% )
          28,025,289      LLC-load-misses           #   30.56% of all LL-cache accesses ( +-  0.23% )

             2.94980 +- 0.00193 seconds time elapsed  ( +-  0.07% )

  $

New default, to be set in an upcoming patch, 12:

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,687.31 msec task-clock                #    2.704 CPUs utilized            ( +-  0.02% )
               1,677      context-switches          #  218.126 /sec                     ( +-  0.70% )
                   4      cpu-migrations            #    0.468 /sec                     ( +- 18.84% )
              67,827      page-faults               #    8.823 K/sec                    ( +-  0.03% )
      27,711,744,058      cycles                    #    3.605 GHz                      ( +-  0.02% )
      63,032,539,630      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,062,001,666      branches                  #    1.959 G/sec                    ( +-  0.00% )
         127,728,818      branch-misses             #    0.85% of all branches          ( +-  0.07% )
      13,972,184,314      L1-dcache-loads           #    1.818 G/sec                    ( +-  0.00% )
         364,962,883      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses( +-  0.02% )
          83,969,109      LLC-loads                 #   10.923 M/sec                    ( +-  0.13% )
          19,141,055      LLC-load-misses           #   22.80% of all LL-cache accesses ( +-  0.25% )

            2.842440 +- 0.000952 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,704.29 msec task-clock                #    2.702 CPUs utilized            ( +-  0.05% )
               1,676      context-switches          #  217.515 /sec                     ( +-  1.04% )
                   2      cpu-migrations            #    0.286 /sec                     ( +- 17.01% )
              67,813      page-faults               #    8.802 K/sec                    ( +-  0.05% )
      27,786,710,102      cycles                    #    3.607 GHz                      ( +-  0.05% )
      63,027,795,038      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,066,316,987      branches                  #    1.956 G/sec                    ( +-  0.00% )
         130,431,772      branch-misses             #    0.87% of all branches          ( +-  0.20% )
      13,981,516,517      L1-dcache-loads           #    1.815 G/sec                    ( +-  0.00% )
         369,525,466      L1-dcache-load-misses     #    2.64% of all L1-dcache accesses( +-  0.03% )
          83,328,524      LLC-loads                 #   10.816 M/sec                    ( +-  0.27% )
          18,704,020      LLC-load-misses           #   22.45% of all LL-cache accesses ( +-  0.18% )

             2.85109 +- 0.00281 seconds time elapsed  ( +-  0.10% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,190.55 msec task-clock                #    2.774 CPUs utilized            ( +-  0.03% )
               1,607      context-switches          #  196.226 /sec                     ( +-  0.67% )
                   3      cpu-migrations            #    0.317 /sec                     ( +- 15.38% )
              67,869      page-faults               #    8.286 K/sec                    ( +-  0.05% )
      29,511,213,192      cycles                    #    3.603 GHz                      ( +-  0.02% )
      63,347,196,598      instructions              #    2.15  insn per cycle           ( +-  0.00% )
      15,198,023,498      branches                  #    1.856 G/sec                    ( +-  0.00% )
         131,113,100      branch-misses             #    0.86% of all branches          ( +-  0.14% )
      14,118,162,884      L1-dcache-loads           #    1.724 G/sec                    ( +-  0.00% )
         422,048,384      L1-dcache-load-misses     #    2.99% of all L1-dcache accesses( +-  0.01% )
         105,878,910      LLC-loads                 #   12.927 M/sec                    ( +-  0.05% )
          21,022,664      LLC-load-misses           #   19.86% of all LL-cache accesses ( +-  0.20% )

            2.952678 +- 0.000858 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,728.71 msec task-clock                #    2.707 CPUs utilized            ( +-  0.07% )
               1,661      context-switches          #  214.887 /sec                     ( +-  0.70% )
                   2      cpu-migrations            #    0.259 /sec                     ( +- 22.36% )
              67,893      page-faults               #    8.785 K/sec                    ( +-  0.04% )
      27,874,322,843      cycles                    #    3.607 GHz                      ( +-  0.07% )
      63,079,425,815      instructions              #    2.26  insn per cycle           ( +-  0.00% )
      15,067,279,408      branches                  #    1.950 G/sec                    ( +-  0.00% )
         125,706,874      branch-misses             #    0.83% of all branches          ( +-  1.00% )
      13,967,177,801      L1-dcache-loads           #    1.807 G/sec                    ( +-  0.00% )
         363,566,754      L1-dcache-load-misses     #    2.60% of all L1-dcache accesses( +-  0.02% )
          86,583,482      LLC-loads                 #   11.203 M/sec                    ( +-  0.13% )
          20,629,871      LLC-load-misses           #   23.83% of all LL-cache accesses ( +-  0.21% )

             2.85551 +- 0.00124 seconds time elapsed  ( +-  0.04% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo ff7bd7083f core: Allow sizing the loader hash table
For now this only applies to the DWARF loader, to allow experimenting
as time passes and kernels grow bigger or gain more symbols.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 3068ff36b7 hash: Remove unused hash_32(), hash_ptr()
We're only using hash_64(), so ditch unused parts.
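
For reference, the kernel's generic hash_64() boils down to a
multiplicative hash; a sketch (pahole's hash.h copy may use an older
variant of the constant and steps):

  #include <stdint.h>

  #define GOLDEN_RATIO_64 0x61C8864680B583EBull

  static inline uint64_t hash_64(uint64_t val, unsigned int bits)
  {
  	/* The high bits of the multiply are the best mixed, keep those. */
  	return (val * GOLDEN_RATIO_64) >> (64 - bits);
  }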

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 8eebf70d05 dwarf_loader: Use a per-CU frontend cache for the latest lookup result
Using a debug patch I found that for the Linux kernel (vmlinux from
Fedora Rawhide) we get this number of hits:

  nr_saved_lookups=2661460
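
A minimal sketch of the idea, with hypothetical names (the real code is
in dwarf_loader.c): remember the last tag a lookup returned, per CU,
and check it before walking the hash table.

  #include <stdint.h>

  struct dwarf_tag {
  	uint64_t	  id;		/* the DIE offset used as lookup key */
  	/* ... */
  };

  struct dwarf_cu {
  	struct dwarf_tag *last_lookup;	/* frontend cache: last hit */
  	/* ... hash tables of dwarf_tags ... */
  };

  /* the existing hash table walk (slow path) */
  struct dwarf_tag *hashtags__find(const struct dwarf_cu *cu, uint64_t id);

  static struct dwarf_tag *dwarf_cu__find_type_by_ref(struct dwarf_cu *cu, uint64_t id)
  {
  	struct dwarf_tag *dtag = cu->last_lookup;

  	if (dtag && dtag->id == id)
  		return dtag;		/* one of the nr_saved_lookups above */

  	dtag = hashtags__find(cu, id);
  	if (dtag)
  		cu->last_lookup = dtag;	/* remember for the next call */
  	return dtag;
  }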

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -d -r1 pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64':

            9,515.95 msec task-clock:u              #    2.731 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              81,634      page-faults:u             #    8.579 K/sec
      33,468,454,452      cycles:u                  #    3.517 GHz
      72,279,667,117      instructions:u            #    2.16  insn per cycle
      17,256,208,904      branches:u                #    1.813 G/sec
         132,775,067      branch-misses:u           #    0.77% of all branches
      15,840,427,579      L1-dcache-loads:u         #    1.665 G/sec
         417,209,398      L1-dcache-load-misses:u   #    2.63% of all L1-dcache accesses
         105,099,756      LLC-loads:u               #   11.045 M/sec
          35,027,985      LLC-load-misses:u         #   33.33% of all LL-cache accesses

         3.484851710 seconds time elapsed

         9.353155000 seconds user
         0.190730000 seconds sys

  $

After:

  $ perf stat -d -r1 pahole -j --btf_encode_detached \
	vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf \
	vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64':

            9,416.17 msec task-clock:u              #    2.744 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              81,461      page-faults:u             #    8.651 K/sec
      33,330,006,641      cycles:u                  #    3.540 GHz
      72,301,897,397      instructions:u            #    2.17  insn per cycle
      17,263,694,358      branches:u                #    1.833 G/sec
         133,414,373      branch-misses:u           #    0.77% of all branches
      15,860,141,450      L1-dcache-loads:u         #    1.684 G/sec
         418,816,079      L1-dcache-load-misses:u   #    2.64% of all L1-dcache accesses
         104,960,787      LLC-loads:u               #   11.147 M/sec
          34,629,758      LLC-load-misses:u         #   32.99% of all LL-cache accesses

         3.431376846 seconds time elapsed

         9.294489000 seconds user
         0.146507000 seconds sys

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo a2f1e69848 core: Use obstacks: take 2
Allow asking for obstacks to be used for use cases like the BTF
encoder, where everything is allocated sequentially and then freed all
at once at cu__delete(), so obstacks are applicable and provide a good
speedup:
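
A minimal sketch of the pattern that makes obstacks fit, with
illustrative names (the real hookup is in the dwarves core):

  #include <obstack.h>
  #include <stdlib.h>

  #define obstack_chunk_alloc	malloc
  #define obstack_chunk_free	free

  struct cu_mem {
  	struct obstack obstack;
  };

  static void cu_mem__init(struct cu_mem *mem)
  {
  	obstack_init(&mem->obstack);
  }

  static void *cu_mem__alloc(struct cu_mem *mem, size_t size)
  {
  	return obstack_alloc(&mem->obstack, size);	/* sequential, no per-object free */
  }

  static void cu_mem__delete(struct cu_mem *mem)
  {
  	obstack_free(&mem->obstack, NULL);	/* release everything in one go */
  }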

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

           10,445.75 msec task-clock:u              #    2.864 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,926      page-faults:u             #   72.941 K/sec                    ( +-  0.00% )
      31,946,591,661      cycles:u                  #    3.058 GHz                      ( +-  0.05% )
      69,103,520,880      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,353,763,143      branches:u                #    1.566 G/sec                    ( +-  0.00% )
         122,309,098      branch-misses:u           #    0.75% of all branches          ( +-  0.12% )

             3.64689 +- 0.00437 seconds time elapsed  ( +-  0.12% )

  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 52 times to write data ]
  [ perf record: Captured and wrote 13.151 MB perf.data (43058 samples) ]
  $
  $ perf report --no-children
  Samples: 43K of event 'cycles:u', Event count (approx.): 31938442091
    Overhead  Command  Shared Object         Symbol
  +   22.98%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.69%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.82%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.16%  pahole   libc.so.6             [.] __libc_calloc
  +    5.01%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.39%  pahole   libc.so.6             [.] _int_malloc
  +    2.82%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.22%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.13%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.08%  pahole   [unknown]             [k] 0xffffffffa0e010a7
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    1.92%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    1.61%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.49%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.44%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.24%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.18%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.12%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.11%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

After:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,114.11 msec task-clock:u              #    2.747 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              68,792      page-faults:u             #    8.478 K/sec                    ( +-  0.05% )
      28,705,283,249      cycles:u                  #    3.538 GHz                      ( +-  0.09% )
      63,013,653,035      instructions:u            #    2.20  insn per cycle           ( +-  0.00% )
      15,039,319,384      branches:u                #    1.853 G/sec                    ( +-  0.00% )
         118,272,350      branch-misses:u           #    0.79% of all branches          ( +-  0.41% )

             2.95368 +- 0.00221 seconds time elapsed  ( +-  0.07% )

  $
  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 40 times to write data ]
  [ perf record: Captured and wrote 10.426 MB perf.data (33733 samples) ]
  $
  $ perf report --no-children
  Samples: 33K of event 'cycles:u', Event count (approx.): 28860426071
    Overhead  Command  Shared Object         Symbol
  +   26.10%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.13%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.83%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.52%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.04%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.45%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.31%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    2.30%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.19%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    2.08%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    2.07%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.96%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.67%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.63%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.52%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.51%  pahole   libdwarves.so.1.0.0   [.] attr_type
  +    1.36%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.32%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.25%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.24%  pahole   libdwarves.so.1.0.0   [.] namespace__recode_dwarf_types
  +    1.17%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.16%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__init
  +    1.03%  pahole   libdwarves.so.1.0.0   [.] tag__init
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo dca86fb8c2 dwarf_loader: Add comment on why we can't ignore lexblocks
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 9d0a7ee0c3 pahole: Ignore DW_TAG_label when encoding BTF
This information will not be used, so don't waste cycles/memory parsing
these tags:
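
A sketch of the skip, assuming a conf_load-style flag (the real check
sits in the DIE-processing switch in dwarf_loader.c):

  #include <dwarf.h>
  #include <elfutils/libdw.h>
  #include <stdbool.h>

  static bool conf_ignore_labels;	/* set by pahole when encoding BTF */

  static bool die__should_skip(Dwarf_Die *die)
  {
  	return conf_ignore_labels && dwarf_tag(die) == DW_TAG_label;
  }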

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,431.47 msec task-clock:u              #    2.865 CPUs utilized            ( +-  0.04% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,982      page-faults:u             #   73.046 K/sec                    ( +-  0.00% )
      31,885,756,148      cycles:u                  #    3.057 GHz                      ( +-  0.04% )
      69,103,456,079      instructions:u            #    2.17  insn per cycle           ( +-  0.00% )
      16,353,867,606      branches:u                #    1.568 G/sec                    ( +-  0.00% )
         122,023,818      branch-misses:u           #    0.75% of all branches          ( +-  0.09% )

             3.64095 +- 0.00194 seconds time elapsed  ( +-  0.05% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo d40c5f1e20 core: Allow ignoring DW_TAG_label
The BTF encoder doesn't use this information, so there is no need to
parse it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 51ba831929 pahole: Ignore DW_TAG_inline_expansion when encoding BTF
XXX: for now leave this commented out; see the comments in the source code.

This information will not be used, so don't waste cycles/memory parsing
these tags:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,973.13 msec task-clock:u              #    2.906 CPUs utilized            ( +-  0.13% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,927      page-faults:u             #   72.352 K/sec                    ( +-  0.00% )
      33,585,562,298      cycles:u                  #    3.061 GHz                      ( +-  0.17% )
      72,687,766,428      instructions:u            #    2.16  insn per cycle           ( +-  0.15% )
      17,198,056,478      branches:u                #    1.567 G/sec                    ( +-  0.16% )
         129,011,360      branch-misses:u           #    0.75% of all branches          ( +-  0.53% )

              3.7760 +- 0.0158 seconds time elapsed  ( +-  0.42% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:25 -03:00
Arnaldo Carvalho de Melo 9038638891 core: Allow ignoring DW_TAG_inline_expansion
The BTF encoder doesn't use this information, so there is no need to
parse it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:31 -03:00
Arnaldo Carvalho de Melo 20757745f0 pahole: Allow encoding BTF with parallel DWARF loading
By adding a lock to serialize access to btf_encoder__encode_cu().

This works and allows a speedup in BTF encoding, but it's too brute
force; the right thing to do is to have per-thread BTF encoders and
then merge everything in a final pass at the end.

But pick the low-hanging fruit now.
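
A minimal sketch of the brute-force serialization (the encoder type and
the btf_encoder__encode_cu() signature are elided/assumed here):

  #include <pthread.h>

  struct btf_encoder;
  struct cu;
  int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu);

  static pthread_mutex_t btf_lock = PTHREAD_MUTEX_INITIALIZER;

  static int encode_cu_locked(struct btf_encoder *encoder, struct cu *cu)
  {
  	int err;

  	/* the DWARF loader threads race to feed the one shared encoder */
  	pthread_mutex_lock(&btf_lock);
  	err = btf_encoder__encode_cu(encoder, cu);
  	pthread_mutex_unlock(&btf_lock);
  	return err;
  }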

On a machine with 4 cores, no HT:

  $ grep "model name" -m1 /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Non-parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf vmlinux' (5 runs):

            8,580.19 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             795,451      page-faults:u             #   92.708 K/sec                    ( +-  0.00% )
      29,151,924,821      cycles:u                  #    3.398 GHz                      ( +-  0.11% )
      70,947,245,709      instructions:u            #    2.43  insn per cycle           ( +-  0.00% )
      16,791,160,182      branches:u                #    1.957 G/sec                    ( +-  0.00% )
         120,793,994      branch-misses:u           #    0.72% of all branches          ( +-  1.04% )

             8.58192 +- 0.00686 seconds time elapsed  ( +-  0.08% )
  $

Parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux' (5 runs):

           10,962.45 msec task-clock:u              #    2.914 CPUs utilized            ( +-  0.15% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,915      page-faults:u             #   72.421 K/sec                    ( +-  0.00% )
      33,552,130,646      cycles:u                  #    3.061 GHz                      ( +-  0.16% )
      72,778,320,572      instructions:u            #    2.17  insn per cycle           ( +-  0.12% )
      17,220,541,136      branches:u                #    1.571 G/sec                    ( +-  0.13% )
         129,353,767      branch-misses:u           #    0.75% of all branches          ( +-  0.48% )

              3.7614 +- 0.0141 seconds time elapsed  ( +-  0.38% )

  $

That 'CPUs utilized' figure should go all the way to 4 when we
parallelize the BTF encoding as well.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:02 -03:00
Arnaldo Carvalho de Melo 5a85d9a450 core: Zero out unused entries when extending ptr_table array in ptr_table__add()
Otherwise we may end up accessing invalid pointers and crashing.
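
A sketch of the fix, assuming the usual grow-on-demand layout of
ptr_table (field names illustrative):

  #include <errno.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  struct ptr_table {
  	void	 **entries;
  	uint32_t   nr_entries;
  	uint32_t   allocated_entries;
  };

  static int ptr_table__grow(struct ptr_table *pt, uint32_t nr)
  {
  	void **entries = realloc(pt->entries, nr * sizeof(void *));

  	if (entries == NULL)
  		return -ENOMEM;

  	/* zero the tail added by this growth, so lookups of ids that were
  	 * never added return NULL instead of realloc() leftovers */
  	memset(entries + pt->allocated_entries, 0,
  	       (nr - pt->allocated_entries) * sizeof(void *));

  	pt->entries = entries;
  	pt->allocated_entries = nr;
  	return 0;
  }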

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:02 -03:00
Arnaldo Carvalho de Melo d133569bd0 pahole: No need to read DW_AT_alignment when encoding BTF
No need to read DW_AT_alignment; it is not used in BTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:38:58 -03:00
Arnaldo Carvalho de Melo 21a41e5386 dwarf_loader: Allow asking not to read the DW_AT_alignment attribute
This attribute isn't present in most types or struct members, so
looking it up makes dwarf_attr() call libdw_find_attr(), which does a
linear search over all of a DIE's attributes.

We don't use this in the BTF encoder, so there is no point in reading it.

This will be used in pahole in the following cset.
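
A sketch of the knob, with an assumed flag name (attr_numeric() is the
existing dwarf_loader.c helper):

  #include <dwarf.h>
  #include <elfutils/libdw.h>
  #include <stdbool.h>
  #include <stdint.h>

  uint64_t attr_numeric(Dwarf_Die *die, uint32_t name);

  struct conf_load {
  	bool ignore_alignment_attr;	/* set by pahole for BTF encoding */
  	/* ... */
  };

  static uint64_t attr_alignment(Dwarf_Die *die, const struct conf_load *conf)
  {
  	/* skip the linear attribute search when nobody wants the result */
  	return conf->ignore_alignment_attr ? 0 : attr_numeric(die, DW_AT_alignment);
  }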

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:38:09 -03:00
Arnaldo Carvalho de Melo 1ef1639039 dwarf_loader: Do not look for non-C DWARF attributes in C CUs
Avoid looking for attributes that don't apply to the C language, such
as DW_AT_virtuality (virtual, pure_virtual), DW_AT_accessibility
(public, protected, private) and DW_AT_const_value.

Looking for those attributes in class_member__new() makes
libdw_find_attr() linearly search all attributes of a DIE, which
shows up in profiling.
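
A sketch of the guard in class_member__new(), with simplified types
(cu__is_c() is added by the cset below):

  #include <dwarf.h>
  #include <elfutils/libdw.h>
  #include <stdbool.h>
  #include <stdint.h>

  struct cu;
  bool	   cu__is_c(const struct cu *cu);
  uint64_t  attr_numeric(Dwarf_Die *die, uint32_t name);

  struct class_member {
  	uint8_t	virtuality, accessibility;
  	/* ... */
  };

  static void class_member__read_cxx_attrs(struct class_member *member,
  					   Dwarf_Die *die, const struct cu *cu)
  {
  	if (cu__is_c(cu))
  		return;	/* C has neither virtuality nor accessibility */

  	member->virtuality    = attr_numeric(die, DW_AT_virtuality);
  	member->accessibility = attr_numeric(die, DW_AT_accessibility);
  }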

Before:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf -j vmlinux' (5 runs):

           11,239.99 msec task-clock:u              #    2.921 CPUs utilized    ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,897      page-faults:u             #   70.631 K/sec            ( +-  0.00% )
      34,593,518,484      cycles:u                  #    3.078 GHz              ( +-  0.05% )
      75,592,805,563      instructions:u            #    2.19  insn per cycle   ( +-  0.00% )
      17,923,046,622      branches:u                #    1.595 G/sec            ( +-  0.00% )
         131,080,371      branch-misses:u           #    0.73% of all branches  ( +-  0.18% )

              3.84794 +- 0.00327 seconds time elapsed  ( +-  0.09% )
  $

After:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf -j vmlinux' (5 runs):

           11,178.28 msec task-clock:u              #    2.929 CPUs utilized            ( +-  0.12% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,890      page-faults:u             #   71.021 K/sec                    ( +-  0.00% )
      34,378,886,265      cycles:u                  #    3.076 GHz                      ( +-  0.13% )
      75,523,849,140      instructions:u            #    2.20  insn per cycle           ( +-  0.12% )
      17,907,573,910      branches:u                #    1.602 G/sec                    ( +-  0.12% )
         130,137,529      branch-misses:u           #    0.73% of all branches          ( +-  0.50% )

              3.8165 +- 0.0137 seconds time elapsed  ( +-  0.36% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 88265eab35 core: Add cu__is_c() to check if the CU language is C
We'll use this to avoid looking for attributes that don't apply to the
C language, such as DW_AT_virtuality (virtual, pure_virtual) and
DW_AT_accessibility (public, protected, private).
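
A sketch of the helper, assuming struct cu keeps the CU's
DW_AT_language code (field name illustrative):

  #include <dwarf.h>
  #include <stdbool.h>
  #include <stdint.h>

  struct cu {
  	uint16_t language;
  	/* ... */
  };

  static inline bool cu__is_c(const struct cu *cu)
  {
  	switch (cu->language) {
  	case DW_LANG_C89:
  	case DW_LANG_C:
  	case DW_LANG_C99:
  	case DW_LANG_C11:
  		return true;
  	}
  	return false;
  }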

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 1caed1c443 dwarf_loader: Add a lock around dwarf_decl_file() and dwarf_decl_line() calls
As this ends up racing on a tsearch() call, probably for some libdw
cache that gets updated/looked up in concurrent pahole threads (-j N).

This cures the following crash; a patch for libdw will be cooked up and
sent.

  (gdb) run -j -I -F dwarf vmlinux > /dev/null
  Starting program: /var/home/acme/git/pahole/build/pahole -j -I -F dwarf vmlinux > /dev/null
  warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.
  warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
  warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
  [New LWP 844789]
  [New LWP 844790]
  [New LWP 844791]
  [New LWP 844792]
  [New LWP 844793]
  [New LWP 844794]
  [New LWP 844795]
  [New LWP 844796]
  [New LWP 844797]
  [New LWP 844798]
  [New LWP 844799]
  [New LWP 844800]
  [New LWP 844801]
  [New LWP 844802]
  [New LWP 844803]
  [New LWP 844804]
  [New LWP 844805]
  [New LWP 844806]
  [New LWP 844807]
  [New LWP 844808]
  [New LWP 844809]
  [New LWP 844810]
  [New LWP 844811]
  [New LWP 844812]
  [New LWP 844813]
  [New LWP 844814]

  Thread 2 "pahole" received signal SIGSEGV, Segmentation fault.
  [Switching to LWP 844789]
  0x00007ffff7dfa321 in ?? () from /lib64/libc.so.6
  (gdb) bt
  #0  0x00007ffff7dfa321 in ?? () from /lib64/libc.so.6
  #1  0x00007ffff7dfa4bb in ?? () from /lib64/libc.so.6
  #2  0x00007ffff7f5eaa6 in __libdw_getsrclines (dbg=0x4a7f90, debug_line_offset=10383710, comp_dir=0x7ffff3c29f01 "/var/home/acme/git/build/v5.13.0-rc6+", address_size=address_size@entry=8, linesp=linesp@entry=0x7fffcfe04ba0, filesp=filesp@entry=0x7fffcfe04ba8)
      at dwarf_getsrclines.c:1129
  #3  0x00007ffff7f5ed14 in dwarf_getsrclines (cudie=cudie@entry=0x7fffd210caf0, lines=lines@entry=0x7fffd210cac0, nlines=nlines@entry=0x7fffd210cac8) at dwarf_getsrclines.c:1213
  #4  0x00007ffff7f64883 in dwarf_decl_file (die=<optimized out>) at dwarf_decl_file.c:66
  #5  0x0000000000425f24 in tag__init (tag=0x7fff0421b710, cu=0x7fffcc001e40, die=0x7fffd210cd30) at /var/home/acme/git/pahole/dwarf_loader.c:476
  #6  0x00000000004262ec in namespace__init (namespace=0x7fff0421b710, die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:576
  #7  0x00000000004263ac in type__init (type=0x7fff0421b710, die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:595
  #8  0x00000000004264d1 in type__new (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:614
  #9  0x0000000000427ba6 in die__create_new_typedef (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1212
  #10 0x0000000000428df5 in __die__process_tag (die=0x7fffd210cd30, cu=0x7fffcc001e40, top_level=1, fn=0x45cee0 <__FUNCTION__.10> "die__process_unit", conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1823
  #11 0x0000000000428ea1 in die__process_unit (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1848
  #12 0x0000000000429e45 in die__process (die=0x7fffd210ce20, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:2311
  #13 0x0000000000429ecb in die__process_and_recode (die=0x7fffd210ce20, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:2326
  #14 0x000000000042a9d6 in dwarf_cus__create_and_process_cu (dcus=0x7fffffffddc0, cu_die=0x7fffd210ce20, pointer_size=8 '\b') at /var/home/acme/git/pahole/dwarf_loader.c:2644
  #15 0x000000000042ab28 in dwarf_cus__process_cu_thread (arg=0x7fffffffddc0) at /var/home/acme/git/pahole/dwarf_loader.c:2687
  #16 0x00007ffff7ed6299 in start_thread () from /lib64/libpthread.so.0
  #17 0x00007ffff7dfe353 in ?? () from /lib64/libc.so.6
  (gdb)
  (gdb) fr 2
  1085
  (gdb) list files_lines_compare
  1086    static int
  1087    files_lines_compare (const void *p1, const void *p2)
  1088    {
  1089	  const struct files_lines_s *t1 = p1;
  1090	  const struct files_lines_s *t2 = p2;
  1091
  1092	  if (t1->debug_line_offset < t2->debug_line_offset)
  (gdb)
  1093        return -1;
  1094	  if (t1->debug_line_offset > t2->debug_line_offset)
  1095        return 1;
  1096
  1097	  return 0;
  1098    }
  1099
  1100    int
  1101    internal_function
  1102    __libdw_getsrclines (Dwarf *dbg, Dwarf_Off debug_line_offset,
  (gdb) list __libdw_getsrclines
  1100    int
  1101    internal_function
  1102    __libdw_getsrclines (Dwarf *dbg, Dwarf_Off debug_line_offset,
  1103                         const char *comp_dir, unsigned address_size,
  1104                         Dwarf_Lines **linesp, Dwarf_Files **filesp)
  1105    {
  1106	  struct files_lines_s fake = { .debug_line_offset = debug_line_offset };
  1107	  struct files_lines_s **found = tfind (&fake, &dbg->files_lines,
  1108                                            files_lines_compare);
  1109	  if (found == NULL)
  (gdb)
  1110        {
  1111          Elf_Data *data = __libdw_checked_get_data (dbg, IDX_debug_line);
  1112          if (data == NULL
  1113              || __libdw_offset_in_section (dbg, IDX_debug_line,
  1114                                            debug_line_offset, 1) != 0)
  1115            return -1;
  1116
  1117          const unsigned char *linep = data->d_buf + debug_line_offset;
  1118          const unsigned char *lineendp = data->d_buf + data->d_size;
  1119
  (gdb)
  1120          struct files_lines_s *node = libdw_alloc (dbg, struct files_lines_s,
  1121                                                    sizeof *node, 1);
  1122
  1123          if (read_srclines (dbg, linep, lineendp, comp_dir, address_size,
  1124                             &node->lines, &node->files) != 0)
  1125            return -1;
  1126
  1127          node->debug_line_offset = debug_line_offset;
  1128
  1129          found = tsearch (node, &dbg->files_lines, files_lines_compare);
  (gdb)
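
The fix, sketched with hypothetical wrapper names: take a mutex around
the libdw calls that may tsearch() into the shared Dwarf handle
(dwarf_decl_line() gets the same treatment):

  #include <elfutils/libdw.h>
  #include <pthread.h>

  static pthread_mutex_t libdw_lock = PTHREAD_MUTEX_INITIALIZER;

  static const char *dwarf__decl_file(Dwarf_Die *die)
  {
  	const char *file;

  	/* dwarf_decl_file() may tsearch() into dbg->files_lines, shared
  	 * by all threads working on the same Dwarf handle */
  	pthread_mutex_lock(&libdw_lock);
  	file = dwarf_decl_file(die);
  	pthread_mutex_unlock(&libdw_lock);
  	return file;
  }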

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo dd13708f2f btfdiff: Use multithreaded DWARF loading
There are quite a few cases of types with the same name; we will add a
--exclude-types option to filter those, and study BTF dedup to see what
it does in this case.

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.BgsYYn	2021-07-06 17:03:07.471814114 -0300
  +++ /tmp/btfdiff.btf.Ene2Ug	2021-07-06 17:03:07.714819609 -0300
  @@ -23627,12 +23627,15 @@ struct deadline_data {
   };
   struct debug_buffer {
   	ssize_t                    (*fill_func)(struct debug_buffer *); /*     0     8 */
  -	struct ohci_hcd *          ohci;                 /*     8     8 */
  +	struct usb_bus *           bus;                  /*     8     8 */
   	struct mutex               mutex;                /*    16    32 */
   	size_t                     count;                /*    48     8 */
  -	char *                     page;                 /*    56     8 */
  +	char *                     output_buf;           /*    56     8 */
  +	/* --- cacheline 1 boundary (64 bytes) --- */
  +	size_t                     alloc_size;           /*    64     8 */

  -	/* size: 64, cachelines: 1, members: 5 */
  +	/* size: 72, cachelines: 2, members: 6 */
  +	/* last cacheline: 8 bytes */
   };
   struct debug_reply_data {
   	struct ethnl_reply_data    base;                 /*     0     8 */
  @@ -47930,11 +47933,12 @@ struct intel_community {
   	/* last cacheline: 32 bytes */
   };
   struct intel_community_context {
  -	u32 *                      intmask;              /*     0     8 */
  -	u32 *                      hostown;              /*     8     8 */
  +	unsigned int               intr_lines[16];       /*     0    64 */
  +	/* --- cacheline 1 boundary (64 bytes) --- */
  +	u32                        saved_intmask;        /*    64     4 */

  -	/* size: 16, cachelines: 1, members: 2 */
  -	/* last cacheline: 16 bytes */
  +	/* size: 68, cachelines: 2, members: 2 */
  +	/* last cacheline: 4 bytes */
   };
   struct intel_early_ops {
   	resource_size_t            (*stolen_size)(int, int, int); /*     0     8 */
  @@ -52600,64 +52604,19 @@ struct irqtime {
   	/* size: 24, cachelines: 1, members: 4 */
   	/* last cacheline: 24 bytes */
   };
  -struct irte {
  -	union {
  -		struct {
  -			__u64      present:1;            /*     0: 0  8 */
  -			__u64      fpd:1;                /*     0: 1  8 */
  -			__u64      __res0:6;             /*     0: 2  8 */
  -			__u64      avail:4;              /*     0: 8  8 */
  -			__u64      __res1:3;             /*     0:12  8 */
  -			__u64      pst:1;                /*     0:15  8 */
  -			__u64      vector:8;             /*     0:16  8 */
  -			__u64      __res2:40;            /*     0:24  8 */
  -		};                                       /*     0     8 */
  -		struct {
  -			__u64      r_present:1;          /*     0: 0  8 */
  -			__u64      r_fpd:1;              /*     0: 1  8 */
  -			__u64      dst_mode:1;           /*     0: 2  8 */
  -			__u64      redir_hint:1;         /*     0: 3  8 */
  -			__u64      trigger_mode:1;       /*     0: 4  8 */
  -			__u64      dlvry_mode:3;         /*     0: 5  8 */
  -			__u64      r_avail:4;            /*     0: 8  8 */
  -			__u64      r_res0:4;             /*     0:12  8 */
  -			__u64      r_vector:8;           /*     0:16  8 */
  -			__u64      r_res1:8;             /*     0:24  8 */
  -			__u64      dest_id:32;           /*     0:32  8 */
  -		};                                       /*     0     8 */
  -		struct {
  -			__u64      p_present:1;          /*     0: 0  8 */
  -			__u64      p_fpd:1;              /*     0: 1  8 */
  -			__u64      p_res0:6;             /*     0: 2  8 */
  -			__u64      p_avail:4;            /*     0: 8  8 */
  -			__u64      p_res1:2;             /*     0:12  8 */
  -			__u64      p_urgent:1;           /*     0:14  8 */
  -			__u64      p_pst:1;              /*     0:15  8 */
  -			__u64      p_vector:8;           /*     0:16  8 */
  -			__u64      p_res2:14;            /*     0:24  8 */
  -			__u64      pda_l:26;             /*     0:38  8 */
  -		};                                       /*     0     8 */
  -		__u64              low;                  /*     0     8 */
  -	};                                               /*     0     8 */
  -	union {
  -		struct {
  -			__u64      sid:16;               /*     8: 0  8 */
  -			__u64      sq:2;                 /*     8:16  8 */
  -			__u64      svt:2;                /*     8:18  8 */
  -			__u64      __res3:44;            /*     8:20  8 */
  -		};                                       /*     8     8 */
  -		struct {
  -			__u64      p_sid:16;             /*     8: 0  8 */
  -			__u64      p_sq:2;               /*     8:16  8 */
  -			__u64      p_svt:2;              /*     8:18  8 */
  -			__u64      p_res3:12;            /*     8:20  8 */
  -			__u64      pda_h:32;             /*     8:32  8 */
  -		};                                       /*     8     8 */
  -		__u64              high;                 /*     8     8 */
  -	};                                               /*     8     8 */
  -
  -	/* size: 16, cachelines: 1, members: 2 */
  -	/* last cacheline: 16 bytes */
  +union irte {
  +	u32                        val;                /*     0     4 */
  +	struct {
  +		u32                valid:1;            /*     0: 0  4 */
  +		u32                no_fault:1;         /*     0: 1  4 */
  +		u32                int_type:3;         /*     0: 2  4 */
  +		u32                rq_eoi:1;           /*     0: 5  4 */
  +		u32                dm:1;               /*     0: 6  4 */
  +		u32                rsvd_1:1;           /*     0: 7  4 */
  +		u32                destination:8;      /*     0: 8  4 */
  +		u32                vector:8;           /*     0:16  4 */
  +		u32                rsvd_2:8;           /*     0:24  4 */
  +	} fields;                                      /*     0     4 */
   };
   struct irte_ga {
   	union irte_ga_lo           lo;                   /*     0     8 */
  @@ -66862,12 +66821,13 @@ struct netlbl_domhsh_tbl {
   	/* last cacheline: 16 bytes */
   };
   struct netlbl_domhsh_walk_arg {
  -	struct netlbl_audit *      audit_info;           /*     0     8 */
  -	u32                        doi;                  /*     8     4 */
  +	struct netlink_callback *  nl_cb;                /*     0     8 */
  +	struct sk_buff *           skb;                  /*     8     8 */
  +	u32                        seq;                  /*    16     4 */

  -	/* size: 16, cachelines: 1, members: 2 */
  +	/* size: 24, cachelines: 1, members: 3 */
   	/* padding: 4 */
  -	/* last cacheline: 16 bytes */
  +	/* last cacheline: 24 bytes */
   };
   struct netlbl_dommap_def {
   	u32                        type;                 /*     0     4 */
  @@ -72907,20 +72867,16 @@ struct pci_raw_ops {
   	/* last cacheline: 16 bytes */
   };
   struct pci_root_info {
  -	struct list_head           list;                 /*     0    16 */
  -	char                       name[12];             /*    16    12 */
  -
  -	/* XXX 4 bytes hole, try to pack */
  -
  -	struct list_head           resources;            /*    32    16 */
  -	struct resource            busn;                 /*    48    64 */
  -	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
  -	int                        node;                 /*   112     4 */
  -	int                        link;                 /*   116     4 */
  +	struct acpi_pci_root_info  common;               /*     0    56 */
  +	struct pci_sysdata         sd;                   /*    56    40 */
  +	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
  +	bool                       mcfg_added;           /*    96     1 */
  +	u8                         start_bus;            /*    97     1 */
  +	u8                         end_bus;              /*    98     1 */

  -	/* size: 120, cachelines: 2, members: 6 */
  -	/* sum members: 116, holes: 1, sum holes: 4 */
  -	/* last cacheline: 56 bytes */
  +	/* size: 104, cachelines: 2, members: 5 */
  +	/* padding: 5 */
  +	/* last cacheline: 40 bytes */
   };
   struct pci_root_res {
   	struct list_head           list;                 /*     0    16 */
  @@ -76415,25 +76371,66 @@ struct pmc_dev {

   	/* XXX 4 bytes hole, try to pack */

  -	void *                     regmap;               /*     8     8 */
  +	void *                     regbase;              /*     8     8 */
   	const struct pmc_reg_map  * map;                 /*    16     8 */
   	struct dentry *            dbgfs_dir;            /*    24     8 */
  -	bool                       init;                 /*    32     1 */
  +	int                        pmc_xram_read_bit;    /*    32     4 */

  -	/* size: 40, cachelines: 1, members: 5 */
  -	/* sum members: 29, holes: 1, sum holes: 4 */
  -	/* padding: 7 */
  -	/* last cacheline: 40 bytes */
  +	/* XXX 4 bytes hole, try to pack */
  +
  +	struct mutex               lock;                 /*    40    32 */
  +	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
  +	bool                       check_counters;       /*    72     1 */
  +
  +	/* XXX 7 bytes hole, try to pack */
  +
  +	u64                        pc10_counter;         /*    80     8 */
  +	u64                        s0ix_counter;         /*    88     8 */
  +	int                        num_lpm_modes;        /*    96     4 */
  +	int                        lpm_en_modes[8];      /*   100    32 */
  +
  +	/* XXX 4 bytes hole, try to pack */
  +
  +	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
  +	u32 *                      lpm_req_regs;         /*   136     8 */
  +
  +	/* size: 144, cachelines: 3, members: 12 */
  +	/* sum members: 125, holes: 4, sum holes: 19 */
  +	/* last cacheline: 16 bytes */
   };
   struct pmc_reg_map {
  -	const struct pmc_bit_map  * d3_sts_0;            /*     0     8 */
  -	const struct pmc_bit_map  * d3_sts_1;            /*     8     8 */
  -	const struct pmc_bit_map  * func_dis;            /*    16     8 */
  -	const struct pmc_bit_map  * func_dis_2;          /*    24     8 */
  -	const struct pmc_bit_map  * pss;                 /*    32     8 */
  +	const struct pmc_bit_map  * * pfear_sts;         /*     0     8 */
  +	const struct pmc_bit_map  * mphy_sts;            /*     8     8 */
  +	const struct pmc_bit_map  * pll_sts;             /*    16     8 */
  +	const struct pmc_bit_map  * * slps0_dbg_maps;    /*    24     8 */
  +	const struct pmc_bit_map  * ltr_show_sts;        /*    32     8 */
  +	const struct pmc_bit_map  * msr_sts;             /*    40     8 */
  +	const struct pmc_bit_map  * * lpm_sts;           /*    48     8 */
  +	const u32                  slp_s0_offset;        /*    56     4 */
  +	const int                  slp_s0_res_counter_step; /*    60     4 */
  +	/* --- cacheline 1 boundary (64 bytes) --- */
  +	const u32                  ltr_ignore_offset;    /*    64     4 */
  +	const int                  regmap_length;        /*    68     4 */
  +	const u32                  ppfear0_offset;       /*    72     4 */
  +	const int                  ppfear_buckets;       /*    76     4 */
  +	const u32                  pm_cfg_offset;        /*    80     4 */
  +	const int                  pm_read_disable_bit;  /*    84     4 */
  +	const u32                  slps0_dbg_offset;     /*    88     4 */
  +	const u32                  ltr_ignore_max;       /*    92     4 */
  +	const u32                  pm_vric1_offset;      /*    96     4 */
  +	const int                  lpm_num_maps;         /*   100     4 */
  +	const int                  lpm_res_counter_step_x2; /*   104     4 */
  +	const u32                  lpm_sts_latch_en_offset; /*   108     4 */
  +	const u32                  lpm_en_offset;        /*   112     4 */
  +	const u32                  lpm_priority_offset;  /*   116     4 */
  +	const u32                  lpm_residency_offset; /*   120     4 */
  +	const u32                  lpm_status_offset;    /*   124     4 */
  +	/* --- cacheline 2 boundary (128 bytes) --- */
  +	const u32                  lpm_live_status_offset; /*   128     4 */
  +	const u32                  etr3_offset;          /*   132     4 */

  -	/* size: 40, cachelines: 1, members: 5 */
  -	/* last cacheline: 40 bytes */
  +	/* size: 136, cachelines: 3, members: 27 */
  +	/* last cacheline: 8 bytes */
   };
   struct pmic_table {
   	int                        address;              /*     0     4 */
  @@ -114574,12 +114571,18 @@ struct urb {
   	/* last cacheline: 56 bytes */
   };
   struct urb_priv {
  -	int                        num_tds;              /*     0     4 */
  -	int                        num_tds_done;         /*     4     4 */
  -	struct xhci_td             td[];                 /*     8     0 */
  +	struct ed *                ed;                   /*     0     8 */
  +	u16                        length;               /*     8     2 */
  +	u16                        td_cnt;               /*    10     2 */

  -	/* size: 8, cachelines: 1, members: 3 */
  -	/* last cacheline: 8 bytes */
  +	/* XXX 4 bytes hole, try to pack */
  +
  +	struct list_head           pending;              /*    16    16 */
  +	struct td *                td[];                 /*    32     0 */
  +
  +	/* size: 32, cachelines: 1, members: 5 */
  +	/* sum members: 28, holes: 1, sum holes: 4 */
  +	/* last cacheline: 32 bytes */
   };
   struct usb2_lpm_parameters {
   	unsigned int               besl;                 /*     0     4 */
  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo f95f783849 btfdiff: Use --sort for pretty printing from both BTF and DWARF
  $ btfdiff vmlinux
  $

As expected, no change: both sort to the same output. Now let's add
--jobs to the DWARF case.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 3e1c7a2077 pahole: Introduce --sort
To ask for sorted output, initially by name.

This is needed in 'btfdiff' to diff the output of 'pahole -F dwarf
--jobs N', where N threads will go on consuming DWARF compile units
and pretty printing them, producing non-deterministic output.

So we need to sort the output for both BTF and DWARF, and then diff
them.
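
As a sketch of what that sorting amounts to (hypothetical type and
helper names; pahole itself keeps the structures in an rbtree, this is
just a qsort-based equivalent):

  #include <stdlib.h>
  #include <string.h>

  struct structure {
  	const char *name;
  	/* + class, cu, id, ... */
  };

  /* Compare entries by type name so the printing order no longer
   * depends on which thread finished its CU first. */
  static int structure__cmp_name(const void *a, const void *b)
  {
  	const struct structure *sa = *(const struct structure * const *)a;
  	const struct structure *sb = *(const struct structure * const *)b;

  	return strcmp(sa->name, sb->name);
  }

  /* qsort(entries, nr_entries, sizeof(entries[0]), structure__cmp_name); */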

This is still not enough for cases where different types have the same
name, e.g. "urb_priv", which exists in multiple DWARF compile units;
the first one processed "wins", i.e. it is the only one considered.

I have to look at how BTF handles this, to adopt a similar algorithm
and keep btfdiff usable as a regression test for the BTF and DWARF
loaders and the BTF encoder.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 967290bc71 pahole: Store the class id in 'struct structure' as well
Needed to defer printing classes until after they have all been sorted
by name with the upcoming 'pahole --sort' option, which makes it
possible to compare 'pahole -F btf' with 'pahole -F dwarf -j', as the
multithreaded DWARF loader will not produce classes in a deterministic
order. This is needed for 'btfdiff'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2b45e1b6d0 dwarf_loader: Defer freeing libdw Dwfl handler
So that 'pahole --sort -F dwarf' can defer printing all classes until
it has processed and sorted all of them.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 35845e7e41 core: Provide a way to store per loader info in cus and an exit function
So that loaders such as the DWARF one can store there the DWARF handler
(Dwfl) that needs to stay live while tools use the core tags (struct
class, struct union, struct tag, etc), because those point to strings
managed by Dwfl; we therefore have to defer dwfl_end() until the tools
are done processing the core tags.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 5365c45177 pahole: Keep class + cu in tree of structures
We'll use it for ordering by name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo fb99cad539 dwarf_loader: Parallel DWARF loading
Tested so far with a typical Linux kernel vmlinux file.

Testing it:

  ⬢[acme@toolbox pahole]$ perf stat -r5 pahole -F dwarf vmlinux > /dev/null

   Performance counter stats for 'pahole -F dwarf vmlinux' (5 runs):

            5,675.97 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.36% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             736,865      page-faults:u             #  129.898 K/sec                    ( +-  0.00% )
      21,921,617,854      cycles:u                  #    3.864 GHz                      ( +-  0.23% )  (83.34%)
         206,308,275      stalled-cycles-frontend:u #    0.95% frontend cycles idle     ( +-  4.59% )  (83.33%)
       2,186,772,169      stalled-cycles-backend:u  #   10.02% backend cycles idle      ( +-  0.46% )  (83.33%)
      62,272,507,248      instructions:u            #    2.85  insn per cycle
                                                    #    0.03  stalled cycles per insn  ( +-  0.03% )  (83.34%)
      14,967,758,961      branches:u                #    2.639 G/sec                    ( +-  0.03% )  (83.33%)
          65,688,710      branch-misses:u           #    0.44% of all branches          ( +-  0.29% )  (83.33%)

              5.6750 +- 0.0203 seconds time elapsed  ( +-  0.36% )

  ⬢[acme@toolbox pahole]$ perf stat -r5 pahole -F dwarf -j12 vmlinux > /dev/null

   Performance counter stats for 'pahole -F dwarf -j12 vmlinux' (5 runs):

           18,015.77 msec task-clock:u              #    7.669 CPUs utilized            ( +-  2.49% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             739,157      page-faults:u             #   40.726 K/sec                    ( +-  0.01% )
      26,673,502,570      cycles:u                  #    1.470 GHz                      ( +-  0.44% )  (83.12%)
         734,106,744      stalled-cycles-frontend:u #    2.80% frontend cycles idle     ( +-  2.30% )  (83.65%)
       2,258,159,917      stalled-cycles-backend:u  #    8.60% backend cycles idle      ( +-  1.51% )  (83.62%)
      63,347,827,742      instructions:u            #    2.41  insn per cycle
                                                    #    0.04  stalled cycles per insn  ( +-  0.03% )  (83.32%)
      15,242,840,672      branches:u                #  839.841 M/sec                    ( +-  0.03% )  (83.22%)
          73,860,851      branch-misses:u           #    0.48% of all branches          ( +-  0.51% )  (83.09%)

               2.349 +- 0.116 seconds time elapsed  ( +-  4.93% )

  ⬢[acme@toolbox pahole]$

Since this is done in 12 threads and pahole prints each CU as it
finishes processing it, the output is no longer deterministic across
runs.
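
The scheme boils down to workers pulling CUs from a shared cursor under
a lock; here is a self-contained toy model (not pahole's actual code)
of that shape, showing why completion order is non-deterministic:

  #include <pthread.h>
  #include <stdio.h>

  /* NR_JOBS workers pull "CUs" (here just indices) from a shared
   * cursor under a lock and process them independently. */
  #define NR_CUS  32
  #define NR_JOBS 4

  static pthread_mutex_t cursor_lock = PTHREAD_MUTEX_INITIALIZER;
  static int next_cu;

  static void *worker(void *arg)
  {
  	(void)arg;
  	for (;;) {
  		pthread_mutex_lock(&cursor_lock);
  		int cu = next_cu < NR_CUS ? next_cu++ : -1;
  		pthread_mutex_unlock(&cursor_lock);
  		if (cu < 0)
  			break;
  		printf("processed CU %d\n", cu); /* stands in for DIE decoding */
  	}
  	return NULL;
  }

  int main(void)
  {
  	pthread_t tid[NR_JOBS];

  	for (int i = 0; i < NR_JOBS; i++)
  		pthread_create(&tid[i], NULL, worker, NULL);
  	for (int i = 0; i < NR_JOBS; i++)
  		pthread_join(tid[i], NULL);
  	return 0;
  }

Build with 'cc -pthread'; with more than one job the "processed" lines
interleave differently from run to run, just like pahole's per-CU
output.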

I'll add a mode where one can ask for the structures to be kept in a
data structure and sorted before printing, so that btfdiff can use it
with -j and continue working.

Also, since it prints the first struct with a given name, and there are
multiple structures with the same name in the kernel, we get
differences even when we ask just for the sizes (so that we get just
one line per struct):

  ⬢[acme@toolbox pahole]$ pahole -F dwarf --sizes vmlinux > /tmp/pahole--sizes.txt
  ⬢[acme@toolbox pahole]$ pahole -F dwarf -j12 --sizes vmlinux > /tmp/pahole--sizes-j12.txt
  ⬢[acme@toolbox pahole]$ diff -u /tmp/pahole--sizes.txt /tmp/pahole--sizes-j12.txt | head
  --- /tmp/pahole--sizes.txt	2021-07-01 21:56:49.260958678 -0300
  +++ /tmp/pahole--sizes-j12.txt	2021-07-01 21:57:00.322209241 -0300
  @@ -1,20 +1,9 @@
  -list_head	16	0
  -hlist_head	8	0
  -hlist_node	16	0
  -callback_head	16	0
  -file_system_type	72	1
  -qspinlock	4	0
  -qrwlock	8	0
  ⬢[acme@toolbox pahole]$

We can't compare them that way; let's sort both and then try again:

  ⬢[acme@toolbox pahole]$ sort /tmp/pahole--sizes.txt > /tmp/pahole--sizes.txt.sorted
  ⬢[acme@toolbox pahole]$ sort /tmp/pahole--sizes-j12.txt > /tmp/pahole--sizes-j12.txt.sorted
  ⬢[acme@toolbox pahole]$ diff -u /tmp/pahole--sizes.txt.sorted /tmp/pahole--sizes-j12.txt.sorted
  --- /tmp/pahole--sizes.txt.sorted	2021-07-01 21:57:13.841515467 -0300
  +++ /tmp/pahole--sizes-j12.txt.sorted	2021-07-01 21:57:16.771581840 -0300
  @@ -1116,7 +1116,7 @@
   child_latency_info	48	1
   chipset	32	1
   chksum_ctx	4	0
  -chksum_desc_ctx	4	0
  +chksum_desc_ctx	2	0
   cipher_alg	32	0
   cipher_context	16	0
   cipher_test_sglists	1184	0
  @@ -1589,7 +1589,7 @@
   ddebug_query	40	0
   ddebug_table	40	1
   deadline_data	120	1
  -debug_buffer	72	0
  +debug_buffer	64	0
   debugfs_blob_wrapper	16	0
   debugfs_devm_entry	16	0
   debugfs_fsdata	48	1
  @@ -3291,7 +3291,7 @@
   integrity_sysfs_entry	32	0
   intel_agp_driver_description	24	1
   intel_community	96	1
  -intel_community_context	68	0
  +intel_community_context	16	0
   intel_early_ops	16	0
   intel_excl_cntrs	536	0
   intel_excl_states	260	0
  @@ -3619,7 +3619,7 @@
   irqtime	24	0
   irq_work	24	0
   ir_table	16	0
  -irte	4	0
  +irte	16	0
   irte_ga	16	0
   irte_ga_hi	8	0
   irte_ga_lo	8	0
  @@ -4909,7 +4909,7 @@
   pci_platform_pm_ops	64	0
   pci_pme_device	24	0
   pci_raw_ops	16	0
  -pci_root_info	104	0
  +pci_root_info	120	1
   pci_root_res	80	0
   pci_saved_state	64	0
   pciserial_board	24	0
  @@ -5132,10 +5132,10 @@
   pmc_clk	24	0
   pmc_clk_data	24	0
   pmc_data	16	0
  -pmc_dev	144	4
  +pmc_dev	40	1
   pm_clk_notifier_block	32	0
   pm_clock_entry	40	0
  -pmc_reg_map	136	0
  +pmc_reg_map	40	0
   pmic_table	12	0
   pm_message	4	0
   pm_nl_pernet	80	1
  @@ -6388,7 +6388,7 @@
   sw842_hlist_node2	24	0
   sw842_hlist_node4	24	0
   sw842_hlist_node8	32	0
  -sw842_param	59496	2
  +sw842_param	48	1
   swait_queue	24	0
   swait_queue_head	24	1
   swap_cgroup	2	0
  @@ -7942,7 +7942,7 @@
   uprobe_trace_entry_head	8	0
   uprobe_xol_ops	32	0
   urb	184	0
  -urb_priv	32	1
  +urb_priv	8	0
   usb2_lpm_parameters	8	0
   usb3_lpm_parameters	16	0
   usb_anchor	56	0
  ⬢[acme@toolbox pahole]$

I'll check them one by one, but the differences look legit.

Now to fiddle with thread affinities, and then move on to threaded BTF
encoding, which, in a first test with a single btf_lock in the pahole
stealer, ended up producing corrupt BTF, valid only up to a point.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 75d4748861 pahole: Disable parallel BTF encoding for now
Introduce parallel DWARF loading first, test it, then move on to using
it together with BTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 1c60f71daa pahole: Add locking for the structures list and rbtree
Prep work for multithreaded DWARF loading, when there will be
concurrent access to these data structures.
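
i.e. guards of this shape (a sketch, names assumed):

  #include <pthread.h>

  struct structure;

  static pthread_mutex_t structures_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Every access to the shared list and rbtree goes through the lock
   * now, since multiple DWARF loader threads may insert concurrently. */
  void structures__add(struct structure *str)
  {
  	pthread_mutex_lock(&structures_lock);
  	/* list_add_tail() on the list, plus the rbtree insertion */
  	(void)str;
  	pthread_mutex_unlock(&structures_lock);
  }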

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 46ad8c0158 dwarf_loader: Introduce 'dwarf_cus' to group all the DWARF-specific per-cus state
Will help with reuse in the upcoming multithreading mode.
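
Along these lines (a sketch; the member names are assumptions):

  #include <stdint.h>

  struct cus;
  struct conf_load;

  /* One object holding the per-file walk state that used to be spread
   * across locals, so a pool of worker threads can share it. */
  struct dwarf_cus {
  	struct cus       *cus;   /* where finished CUs are added        */
  	struct conf_load *conf;  /* load-time options                   */
  	uint64_t          off;   /* cursor into .debug_info             */
  	                         /* (Dwarf_Off in the real code)        */
  };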

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo d963af9fd8 dwarf_loader: Factor common bits for creating and processing CU
Will be used for the multithreaded loading.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 0c5bf70cc1 fprintf: class__vtable_fprintf() doesn't need a 'cu' arg
Another simplification made possible by using a plain char string
instead of string_t, which was only needed in the core as prep work
for CTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 38ff86b149 fprintf: string_type__fprintf() doesn't need a 'cu' arg
Another simplification made possible by using a plain char string
instead of string_t, which was only needed in the core as prep work
for CTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo a75c342ac2 core: Ditch tag__free_orig_info(), unused
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 80fe32fd29 core: variable__name() doesn't need a 'cu' arg
Another simplification made possible by using a plain char string
instead of string_t, which was only needed in the core as prep work
for CTF encoding.
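
The shape of the change, sketched (the return type is assumed for the
sketch):

  struct variable;
  struct cu;

  /* Before: names were string_t offsets, so resolving one needed the
   * cu's string table:
   *
   *   const char *variable__name(const struct variable *var,
   *                              const struct cu *cu);
   *
   * After: names are plain char strings stored in the tag itself: */
  const char *variable__name(const struct variable *var);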

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo caa219dffc core: base_type__name() doesn't need a 'cu' arg
Another simplification made possible by using a plain char string
instead of string_t, which was only needed in the core as prep work
for CTF encoding.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 7569e46d35 core: namespace__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo de4e8b7f17 core: {tag,function,lexblock}__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

They call each other, so all three are done at once.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 789ed4e3a2 core: ftype__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 6340cb4627 core: enumeration__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 33e44f5295 core: type__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 0f54ca9c82 core: class__clone() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2b2014187b core: class__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo f40900eba6 core: type__delete_class_members() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 50916756d5 core: class_member__delete() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 2e50463c3a core: type__clone_members() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo a66208355e core: class_member__clone() doesn't need a 'cu' arg
Since we stopped using per-cu obstacks we don't need it. If we ever
want to use it we can do per thread obstacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 33e0d5f874 pahole: Introduce --prettify option
The use of isatty(0) to switch into pretty printing is problematic, as
reported by Bernd Buschinski, who ran into problems with his scripts:

========================================================================
  I am using pahole 1.21 and I recently noticed that I no longer have
  any pahole output in several scripts.

  Using (on the command line):

    $ pahole -V -E -C my_struct /path/to/my/debug.o

  works fine and gives the expected output.

  But:

    $ parallel -j 1 pahole -V -E -C my_struct ::: /path/to/my/debug.o

  gives nothing, no stderr, no stdout and ret code 0.

  After testing some versions, it works fine in 1.17 and no longer works in 1.18.
========================================================================
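
For reference, the heuristic that misfired, reduced to a self-contained
sketch (not the actual pahole code):

  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
  	/* The problematic heuristic, simplified: treat a non-tty stdin
  	 * as "raw data was piped in, pretty print it". Under 'parallel'
  	 * stdin is redirected even though no data was piped, so this
  	 * branch is taken and the normal output is suppressed. */
  	if (!isatty(0))
  		printf("pretty-printing mode\n");
  	else
  		printf("normal type printing\n");
  	return 0;
  }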

Since the pretty printer broke existing scripts, and it's a relatively
new feature, let's switch to using an explicit command line option to
activate the pretty printer, i.e. where we used:

  $ pahole --header elf64_hdr < /bin/bash

We now use one of:

  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify=/bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify - < /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$ pahole --header elf64_hdr --prettify=- < /bin/bash
  {
  	.e_ident = { 127, 69, 76, 70, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
  	.e_type = 3,
  	.e_machine = 62,
  	.e_version = 1,
  	.e_entry = 204016,
  	.e_phoff = 64,
  	.e_shoff = 1388096,
  	.e_flags = 0,
  	.e_ehsize = 64,
  	.e_phentsize = 56,
  	.e_phnum = 13,
  	.e_shentsize = 64,
  	.e_shnum = 31,
  	.e_shstrndx = 30,
  },
  ⬢[acme@toolbox pahole]$

Reported-by: Bernd Buschinski <b.buschinski@googlemail.com>
Report-Link: https://lore.kernel.org/dwarves/CACN-hLVoz2tWrtgDLabOv6S1-H_8RD2fh8SV6EnADF1ikMxrmw@mail.gmail.com/
Tested-by: Bernd Buschinski <b.buschinski@googlemail.com>
Test-Link: https://lore.kernel.org/dwarves/CACN-hLXgHWdBkyMz+w58qX8DaV+WJ1mj1qheGBHbPv4fqozi5w@mail.gmail.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo bc36e94f32 pahole: Try harder to resolve the --header type when pretty printing
Go on processing CUs till we have everything sorted out, which includes
the --header type.

On a file with DWARF info where the header type was the last to be
found, it wasn't being resolved, so the tool failed to resolve header
variable references and emitted this misleading error message:

  ⬢[acme@toolbox pahole]$ pahole ~/bin/perf --header=perf_file_header --seek_bytes '$header.data.offset' --size_bytes='$header.data.size' -C 'perf_event_header(sizeof,type,type_enum=perf_event_type)' < perf.data
  pahole: --seek_bytes ($header.data.offset) makes reference to --header but it wasn't specified
  ⬢[acme@toolbox pahole]$

And that 'struct perf_file_header' _is_ in one of the CUs in ~/bin/perf:

  ⬢[acme@toolbox pahole]$ pahole ~/bin/perf -C perf_file_header
  struct perf_file_header {
  	u64                        magic;                /*     0     8 */
  	u64                        size;                 /*     8     8 */
  	u64                        attr_size;            /*    16     8 */
  	struct perf_file_section   attrs;                /*    24    16 */
  	struct perf_file_section   data;                 /*    40    16 */
  	struct perf_file_section   event_types;          /*    56    16 */
  	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
  	long unsigned int          adds_features[4];     /*    72    32 */

  	/* size: 104, cachelines: 2, members: 7 */
  	/* last cacheline: 40 bytes */
  };
  ⬢[acme@toolbox pahole]$

With this fix all the records are printed.

This probably wasn't noticed before because most tests were made with a
~/bin/perf file with BTF information, i.e. just one "CU", so the logic
of deferring the pretty printing till everything gets resolved wasn't
being exercised properly.
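
The shape of the fix, as a sketch with entirely hypothetical names:

  struct cu;

  /* Keep going through CUs until everything, including the --header
   * type, has been resolved, instead of stopping after the first
   * pass. */
  struct cu *cus__next(void);
  int everything_resolved(void);
  void cu__resolve_references(struct cu *cu);

  static void resolve_all(void)
  {
  	struct cu *cu;

  	while (!everything_resolved() && (cu = cus__next()) != NULL)
  		cu__resolve_references(cu);
  }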

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo fcfa2141c3 pahole: Make prototype__stdio_fprintf_value() receive a FILE to read raw data from
So far it's just from stdin, but it shouldn't be limited to that.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00