1493 Commits

Author SHA1 Message Date
Andrii Nakryiko 29fce8dc85 strings: use BTF's string APIs for strings management
Switch strings container to using struct btf and its
btf__add_str()/btf__find_str() APIs, which do equivalent internal string
deduplication. This turns out to be significantly faster than using
tsearch functions. To satisfy the CTF encoding use case, a hacky string size
fetching approach is used, as libbpf doesn't provide a direct API to get the
total string section size and to copy over just the strings data section.
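
Illustration (not part of the patch): a minimal sketch of an offset-based,
deduplicating string table, to show the kind of behaviour that
btf__add_str()/btf__find_str() provide. All names here (struct strtab,
strtab__add, strtab__find) are hypothetical, and this is not how libbpf
implements it; the fixed-size probing table is an assumption made for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names; this is not libbpf's implementation. */
#define STRTAB_SLOTS 1024	/* power of two, never resized in this sketch */

struct strtab {
	char *data;		/* concatenated NUL-terminated strings */
	size_t len, cap;	/* used and allocated bytes in data */
	uint32_t nr;		/* number of occupied slots */
	uint32_t slots[STRTAB_SLOTS];	/* offset + 1, 0 means empty slot */
};

static uint32_t str_hash(const char *s)
{
	uint32_t h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Offset 0 is reserved for the empty string, as in a BTF string section. */
static int strtab__init(struct strtab *t)
{
	memset(t, 0, sizeof(*t));
	t->cap = 64;
	t->data = calloc(1, t->cap);
	if (!t->data)
		return -1;
	t->len = 1;
	return 0;
}

/* Returns the offset of s, or -1 if it is not in the table. */
static long strtab__find(const struct strtab *t, const char *s)
{
	uint32_t i = str_hash(s) & (STRTAB_SLOTS - 1);

	if (!*s)
		return 0;
	for (; t->slots[i]; i = (i + 1) & (STRTAB_SLOTS - 1))
		if (!strcmp(t->data + t->slots[i] - 1, s))
			return t->slots[i] - 1;
	return -1;
}

/* Returns the offset of s, appending it only if it is not stored yet. */
static long strtab__add(struct strtab *t, const char *s)
{
	long off;
	size_t n;
	uint32_t i;

	assert(s != NULL);
	off = strtab__find(t, s);
	if (off >= 0)
		return off;
	if (t->nr >= STRTAB_SLOTS / 2)	/* keep probe sequences short */
		return -1;
	n = strlen(s) + 1;
	if (t->len + n > t->cap) {
		size_t cap = (t->len + n) * 2;
		char *data = realloc(t->data, cap);

		if (!data)
			return -1;
		t->data = data;
		t->cap = cap;
	}
	off = t->len;
	memcpy(t->data + off, s, n);
	t->len += n;
	i = str_hash(s) & (STRTAB_SLOTS - 1);
	while (t->slots[i])
		i = (i + 1) & (STRTAB_SLOTS - 1);
	t->slots[i] = off + 1;
	t->nr++;
	return off;
}
```

Adding the same string twice returns the same offset, so the data buffer
only grows for strings that have not been seen before.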

BEFORE:
         22,624.28 msec task-clock                #    1.000 CPUs utilized
                85      context-switches          #    0.004 K/sec
                 3      cpu-migrations            #    0.000 K/sec
           622,545      page-faults               #    0.028 M/sec
    68,177,206,387      cycles                    #    3.013 GHz                      (24.99%)
   114,370,031,619      instructions              #    1.68  insn per cycle           (25.01%)
    26,125,001,179      branches                  # 1154.733 M/sec                    (25.01%)
       458,861,243      branch-misses             #    1.76% of all branches          (25.00%)
    24,533,455,967      L1-dcache-loads           # 1084.386 M/sec                    (25.02%)
       973,500,214      L1-dcache-load-misses     #    3.97% of all L1-dcache hits    (25.05%)
       338,773,561      LLC-loads                 #   14.974 M/sec                    (25.02%)
        12,651,196      LLC-load-misses           #    3.73% of all LL-cache hits     (25.00%)

      22.628910615 seconds time elapsed

      21.341063000 seconds user
       1.283763000 seconds sys

AFTER:
         18,362.97 msec task-clock                #    1.000 CPUs utilized
                37      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           626,281      page-faults               #    0.034 M/sec
    52,480,619,000      cycles                    #    2.858 GHz                      (25.00%)
   104,736,434,384      instructions              #    2.00  insn per cycle           (25.01%)
    23,878,428,465      branches                  # 1300.358 M/sec                    (25.01%)
       252,669,685      branch-misses             #    1.06% of all branches          (25.03%)
    21,829,390,952      L1-dcache-loads           # 1188.772 M/sec                    (25.04%)
       638,086,339      L1-dcache-load-misses     #    2.92% of all L1-dcache hits    (25.02%)
       212,327,435      LLC-loads                 #   11.563 M/sec                    (25.00%)
        14,578,117      LLC-load-misses           #    6.87% of all LL-cache hits     (25.00%)

      18.364427347 seconds time elapsed

      16.985494000 seconds user
       1.377959000 seconds sys

Committer testing:

Before:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            8,735.92 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.34% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,978      page-faults:u             #    0.041 M/sec                    ( +-  0.00% )
      34,722,167,335      cycles:u                  #    3.975 GHz                      ( +-  0.12% )  (83.33%)
         555,981,118      stalled-cycles-frontend:u #    1.60% frontend cycles idle     ( +-  1.53% )  (83.33%)
       5,215,370,531      stalled-cycles-backend:u  #   15.02% backend cycles idle      ( +-  1.31% )  (83.33%)
      72,615,773,119      instructions:u            #    2.09  insn per cycle
                                                    #    0.07  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      16,624,959,121      branches:u                # 1903.057 M/sec                    ( +-  0.01% )  (83.33%)
         229,962,327      branch-misses:u           #    1.38% of all branches          ( +-  0.07% )  (83.33%)

              8.7503 +- 0.0301 seconds time elapsed  ( +-  0.34% )

  $

After:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            7,302.31 msec task-clock:u              #    0.998 CPUs utilized            ( +-  1.16% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             355,884      page-faults:u             #    0.049 M/sec                    ( +-  0.00% )
      29,150,861,078      cycles:u                  #    3.992 GHz                      ( +-  0.35% )  (83.33%)
         478,705,326      stalled-cycles-frontend:u #    1.64% frontend cycles idle     ( +-  2.70% )  (83.33%)
       5,351,001,796      stalled-cycles-backend:u  #   18.36% backend cycles idle      ( +-  1.20% )  (83.33%)
      65,835,888,022      instructions:u            #    2.26  insn per cycle
                                                    #    0.08  stalled cycles per insn  ( +-  0.03% )  (83.33%)
      15,025,195,460      branches:u                # 2057.594 M/sec                    ( +-  0.05% )  (83.34%)
         141,209,214      branch-misses:u           #    0.94% of all branches          ( +-  0.15% )  (83.33%)

              7.3140 +- 0.0851 seconds time elapsed  ( +-  1.16% )

  $

16.04% less cycles, keep the patches coming! :-)

Had to add this patch tho:

  +++ b/dwarf_loader.c
  @@ -2159,7 +2159,7 @@ static unsigned long long dwarf_tag__orig_id(const struct tag *tag,
   static const char *dwarf__strings_ptr(const struct cu *cu __unused,
   				      strings_t s)
   {
  -	return strings__ptr(strings, s);
  +	return s ? strings__ptr(strings, s) : NULL;
   }

To keep preexisting behaviour and to do what the BTF specific
strings_ptr method does:

  static const char *btf_elf__strings_ptr(const struct cu *cu, strings_t s)
  {
          return btf_elf__string(cu->priv, s);
  }

  const char *btf_elf__string(struct btf_elf *btfe, uint32_t ref)
  {
          const char *s = btf__str_by_offset(btfe->btf, ref);

          return s && s[0] == '\0' ? NULL : s;
  }

With these adjustments, btfdiff on a vmlinux with BTF and DWARF is again
clean, i.e. pretty printing from BTF matches what we get when using
DWARF.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-20 17:17:51 -03:00
Arnaldo Carvalho de Melo 75f3520fed strings: Rename strings.h to avoid clashing with /usr/include/strings.h
This was detected with:

  In file included from /home/acme/git/pahole/strings.h:9,
                   from /usr/include/string.h:432,
                   from /home/acme/git/pahole/lib/bpf/src/libbpf_common.h:12,
                   from /home/acme/git/pahole/lib/bpf/src/libbpf.h:20,
                   from /home/acme/git/pahole/lib/bpf/src/ringbuf.c:20:
  /home/acme/git/pahole/lib/bpf/src/btf.h:33:11: error: expected ‘;’ before ‘void’
     33 | LIBBPF_API void btf__free(struct btf *btf);
        |           ^~~~~
        |           ;

libbpf_common.h has:

  #include <string.h>

  #ifndef LIBBPF_API
  #define LIBBPF_API __attribute__((visibility("default")))
  #endif

So before defining LIBBPF_API it includes libc's string.h, which in turn
includes pahole's strings.h, which now includes:

  #include "lib/bpf/src/btf.h"

That will need the LIBBPF_API, b00m.

So let's just rename pahole's strings.h to pahole_strings.h to avoid this
pitfall.

This patch was moved to before this problem takes place so that we keep
everything bisectable.

Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-20 17:11:34 -03:00
Andrii Nakryiko bba7151e0f dwarf_loader: increase the size of lookup hash map
One of the primary use cases for pahole is BTF deduplication during the
Linux kernel build. That means DWARF containing more than 5 million types
is loaded, so using a hash map with a small number of buckets is quite
expensive due to hash collisions. This patch bumps the size of the hash map
and reduces the overhead of this part of the DWARF loading process.

This shaves off about 1 second out of about 20 seconds total for Linux BTF
dedup.
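
Illustration (not part of the patch): a rough sketch of why the bucket count
matters. It hashes a run of synthetic keys into 2^bits buckets and reports
the longest collision chain. The multiplicative hash, the key pattern and the
function names are assumptions chosen for the example, not pahole's actual
hash function or table sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiplicative hash folding a 64-bit key into `bits` bits (illustrative). */
static uint32_t hash_bits(uint64_t key, unsigned int bits)
{
	assert(bits >= 1 && bits <= 30);
	return (uint32_t)((key * 0x9E3779B97F4A7C15ULL) >> (64 - bits));
}

/*
 * Inserts nr_keys synthetic keys into a table with 2^bits buckets and
 * returns the length of the longest chain, or -1 on allocation failure.
 */
static long longest_chain(uint64_t nr_keys, unsigned int bits)
{
	size_t nr_buckets = (size_t)1 << bits;
	uint32_t *count = calloc(nr_buckets, sizeof(*count));
	long max = 0;
	uint64_t i;

	if (!count)
		return -1;
	for (i = 0; i < nr_keys; i++) {
		/* Stand-in for a DIE offset; real offsets are not evenly spaced. */
		uint32_t b = hash_bits(i * 24 + 11, bits);

		count[b]++;
		if ((long)count[b] > max)
			max = (long)count[b];
	}
	free(count);
	return max;
}
```

With about a million keys, a table with 2^10 buckets must have at least one
chain of 1024 or more entries, whatever the hash function, while a larger
table lets the chains get much shorter. Every lookup walks one such chain.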

Committer testing:

Before:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            8,953.80 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,855      page-faults:u             #    0.040 M/sec                    ( +-  0.00% )
      35,775,730,539      cycles:u                  #    3.996 GHz                      ( +-  0.07% )  (83.33%)
         579,534,836      stalled-cycles-frontend:u #    1.62% frontend cycles idle     ( +-  2.21% )  (83.33%)
       5,719,840,144      stalled-cycles-backend:u  #   15.99% backend cycles idle      ( +-  0.93% )  (83.33%)
      73,035,744,786      instructions:u            #    2.04  insn per cycle
                                                    #    0.08  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      16,798,017,844      branches:u                # 1876.077 M/sec                    ( +-  0.05% )  (83.33%)
         237,777,143      branch-misses:u           #    1.42% of all branches          ( +-  0.15% )  (83.34%)

             8.97077 +- 0.00803 seconds time elapsed  ( +-  0.09% )

  $

After:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            8,735.92 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.34% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,978      page-faults:u             #    0.041 M/sec                    ( +-  0.00% )
      34,722,167,335      cycles:u                  #    3.975 GHz                      ( +-  0.12% )  (83.33%)
         555,981,118      stalled-cycles-frontend:u #    1.60% frontend cycles idle     ( +-  1.53% )  (83.33%)
       5,215,370,531      stalled-cycles-backend:u  #   15.02% backend cycles idle      ( +-  1.31% )  (83.33%)
      72,615,773,119      instructions:u            #    2.09  insn per cycle
                                                    #    0.07  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      16,624,959,121      branches:u                # 1903.057 M/sec                    ( +-  0.01% )  (83.33%)
         229,962,327      branch-misses:u           #    1.38% of all branches          ( +-  0.07% )  (83.33%)

              8.7503 +- 0.0301 seconds time elapsed  ( +-  0.34% )

  $

2.94% less cycles, good :-)

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 13:05:43 -03:00
Andrii Nakryiko 2e719cca66 btf_encoder: revamp how per-CPU variables are encoded
Right now, to encode per-CPU variables in BTF, pahole iterates over the complete
vmlinux symbol table for each CU. There are 2500 CUs for a typical kernel image.
Overall, to encode 287 per-CPU variables pahole spends more than 10% of its CPU
budget, which is incredibly wasteful.

This patch revamps how this is done. Now it pre-processes the symbol table once,
before any per-CU processing starts. It remembers each per-CPU variable
symbol, including its address, size, and name. Then, while processing each CU,
binary search is used to correlate each DWARF variable with the per-CPU symbols
and figure out whether the variable belongs to the per-CPU data section. If a
match is found, BTF_KIND_VAR is emitted and var_secinfo is recorded, just like
before. At the very end, after all CUs are processed, BTF_KIND_DATASEC is
emitted with sorted variables.

This change makes the overhead of generating per-CPU variables negligible and
recovers about 10% of CPU usage.
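
Illustration (not part of the patch): a minimal sketch of the "collect once,
sort, then binary search per variable" approach, using the C library's
qsort() and bsearch(). The struct layout and function names are hypothetical
and only mirror the address, size and name fields mentioned in the commit
message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record kept for each per-CPU symbol in the symbol table. */
struct percpu_sym {
	uint64_t addr;
	uint32_t size;
	const char *name;
};

static int percpu_sym__cmp(const void *a, const void *b)
{
	const struct percpu_sym *x = a, *y = b;

	return (x->addr > y->addr) - (x->addr < y->addr);
}

/* One-time step: sort the collected symbols by address. */
static void percpu_syms__sort(struct percpu_sym *syms, size_t nr)
{
	assert(syms != NULL || nr == 0);
	qsort(syms, nr, sizeof(*syms), percpu_sym__cmp);
}

/* Per-variable step: O(log n) lookup by the DWARF variable's address. */
static const struct percpu_sym *
percpu_syms__find(const struct percpu_sym *syms, size_t nr, uint64_t addr)
{
	struct percpu_sym key = { .addr = addr };

	return bsearch(&key, syms, nr, sizeof(*syms), percpu_sym__cmp);
}
```

A variable whose address is not found is simply not a per-CPU variable, so
the per-CU cost no longer depends on the size of the whole symbol table.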

Performance counter stats for './pahole -J /home/andriin/linux-build/default/vmlinux':

BEFORE:
      19.160149000 seconds user
       1.304873000 seconds sys

         24,114.05 msec task-clock                #    0.999 CPUs utilized
                83      context-switches          #    0.003 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           622,417      page-faults               #    0.026 M/sec
    72,897,315,125      cycles                    #    3.023 GHz                      (25.02%)
   127,807,316,959      instructions              #    1.75  insn per cycle           (25.01%)
    29,087,179,117      branches                  # 1206.234 M/sec                    (25.01%)
       464,105,921      branch-misses             #    1.60% of all branches          (25.01%)
    30,252,119,368      L1-dcache-loads           # 1254.543 M/sec                    (25.01%)
     1,156,336,207      L1-dcache-load-misses     #    3.82% of all L1-dcache hits    (25.05%)
       343,373,503      LLC-loads                 #   14.240 M/sec                    (25.02%)
        12,044,977      LLC-load-misses           #    3.51% of all LL-cache hits     (25.01%)

      24.136198321 seconds time elapsed

      22.729693000 seconds user
       1.384859000 seconds sys

AFTER:
      16.781455000 seconds user
       1.343956000 seconds sys

         23,398.77 msec task-clock                #    1.000 CPUs utilized
                86      context-switches          #    0.004 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           622,420      page-faults               #    0.027 M/sec
    68,395,641,468      cycles                    #    2.923 GHz                      (25.05%)
   114,241,327,034      instructions              #    1.67  insn per cycle           (25.01%)
    26,330,711,718      branches                  # 1125.303 M/sec                    (25.01%)
       465,926,869      branch-misses             #    1.77% of all branches          (25.00%)
    24,662,984,772      L1-dcache-loads           # 1054.029 M/sec                    (25.00%)
     1,054,052,064      L1-dcache-load-misses     #    4.27% of all L1-dcache hits    (25.00%)
       340,970,622      LLC-loads                 #   14.572 M/sec                    (25.00%)
        16,032,297      LLC-load-misses           #    4.70% of all LL-cache hits     (25.03%)

      23.402259654 seconds time elapsed

      21.916437000 seconds user
       1.482826000 seconds sys

Committer testing:

  $ grep 'model name' -m1 /proc/cpuinfo
  model name	: AMD Ryzen 9 3900X 12-Core Processor
  $

Before:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            9,730.28 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.54% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,854      page-faults:u             #    0.036 M/sec                    ( +-  0.00% )
      39,721,726,459      cycles:u                  #    4.082 GHz                      ( +-  0.07% )  (83.33%)
         626,010,654      stalled-cycles-frontend:u #    1.58% frontend cycles idle     ( +-  0.91% )  (83.33%)
       7,518,333,691      stalled-cycles-backend:u  #   18.93% backend cycles idle      ( +-  0.56% )  (83.33%)
      85,477,123,093      instructions:u            #    2.15  insn per cycle
                                                    #    0.09  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      19,346,085,683      branches:u                # 1988.235 M/sec                    ( +-  0.03% )  (83.34%)
         237,291,787      branch-misses:u           #    1.23% of all branches          ( +-  0.15% )  (83.33%)

              9.7465 +- 0.0524 seconds time elapsed  ( +-  0.54% )

  $

After:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            8,953.80 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,855      page-faults:u             #    0.040 M/sec                    ( +-  0.00% )
      35,775,730,539      cycles:u                  #    3.996 GHz                      ( +-  0.07% )  (83.33%)
         579,534,836      stalled-cycles-frontend:u #    1.62% frontend cycles idle     ( +-  2.21% )  (83.33%)
       5,719,840,144      stalled-cycles-backend:u  #   15.99% backend cycles idle      ( +-  0.93% )  (83.33%)
      73,035,744,786      instructions:u            #    2.04  insn per cycle
                                                    #    0.08  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      16,798,017,844      branches:u                # 1876.077 M/sec                    ( +-  0.05% )  (83.33%)
         237,777,143      branch-misses:u           #    1.42% of all branches          ( +-  0.15% )  (83.34%)

             8.97077 +- 0.00803 seconds time elapsed  ( +-  0.09% )

  $

Indeed, about 10% shaved, not bad.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: Oleg Rombakh <olegrom@google.com>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:57:35 -03:00
Andrii Nakryiko 0258a47ef9 btf_encoder: Discard CUs after BTF encoding
When doing BTF encoding/deduping, DWARF CUs are never used after BTF encoding
is done, so there is no point in keeping them in memory. Discard them
immediately.
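
Illustration (not part of the patch): a small sketch of a loader that hands
each CU to a callback and frees it right away when the callback says it is
no longer needed. The enum, the callback type and the function names are
hypothetical and only model the idea.

```c
#include <assert.h>
#include <stdlib.h>

enum cu_verdict { CU_KEEP, CU_DISCARD };	/* hypothetical */

struct cu {
	int id;
};

typedef enum cu_verdict (*cu_cb_t)(struct cu *cu, void *arg);

/*
 * Loads nr CUs one at a time. CUs the callback wants to keep are stored in
 * kept[] and owned by the caller. Returns how many were kept, -1 on error.
 */
static int load_cus(int nr, cu_cb_t cb, void *arg, struct cu **kept)
{
	int i, nr_kept = 0;

	assert(cb != NULL);
	for (i = 0; i < nr; i++) {
		struct cu *cu = malloc(sizeof(*cu));

		if (!cu)
			return -1;
		cu->id = i;
		if (cb(cu, arg) == CU_DISCARD)
			free(cu);	/* nothing will look at this CU again */
		else
			kept[nr_kept++] = cu;
	}
	return nr_kept;
}

/* Encoder-style callback: consume the CU, then ask for it to be dropped. */
static enum cu_verdict encode_and_discard(struct cu *cu, void *arg)
{
	int *nr_encoded = arg;

	(void)cu;
	(*nr_encoded)++;	/* stand-in for emitting BTF for this CU */
	return CU_DISCARD;
}
```

With this shape, at most one CU is alive at a time during encoding, instead
of all of them until the end of the run.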

Committer testing:

  $ pahole -J vmlinux
  $ ./btfdiff vmlinux
  $

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:49:42 -03:00
Andrii Nakryiko 3c913e18b2 btf_encoder: Fix emitting __ARRAY_SIZE_TYPE__ as index range type
Fix the logic of determining if __ARRAY_SIZE_TYPE__ needs to be emitted.
Previously, such a type could be emitted unnecessarily because some
particular CU did not have an int type in it. That would happen even if
there was no array type in that CU. Fix it by keeping track of the 'int'
type across CUs and only emitting __ARRAY_SIZE_TYPE__ if a given CU has
an array type but we still haven't found an 'int' type.

Testing against vmlinux shows that now there are no __ARRAY_SIZE_TYPE__
integers emitted.

Committer testing:

  $ pahole -J vmlinux
  $ ./btfdiff vmlinux
  $

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:49:40 -03:00
Andrii Nakryiko 48efa92933 btf_encoder: Use libbpf APIs to encode BTF type info
Switch to use libbpf's BTF writing APIs to encode BTF. This reconciles
btf_elf's use of internal struct btf from libbpf for both loading and
encoding BTF type info. This change also saves a considerable amount of
memory used for DWARF to BTF conversion due to avoiding extra memory
copy between gobuffers and libbpf's struct btf. Now that pahole uses
libbpf's struct btf, it's possible to further utilize libbpf's features
and APIs, e.g., for handling endianness conversion, for dumping raw BTF
type info during encoding. These features might be implemented in the
follow up patches.

Committer notes:

Built with 'cmake -DCMAKE_BUILD_TYPE=Release'

Before:

  $ cp ~/git/build/bpf-next-v5.9.0-rc8+/vmlinux .
  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

           10,065.20 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.68% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             514,596      page-faults:u             #    0.051 M/sec                    ( +-  0.00% )
      40,098,447,225      cycles:u                  #    3.984 GHz                      ( +-  0.26% )  (83.33%)
         547,247,149      stalled-cycles-frontend:u #    1.36% frontend cycles idle     ( +-  2.00% )  (83.33%)
       6,493,462,167      stalled-cycles-backend:u  #   16.19% backend cycles idle      ( +-  1.53% )  (83.33%)
      86,338,929,286      instructions:u            #    2.15  insn per cycle
                                                    #    0.08  stalled cycles per insn  ( +-  0.01% )  (83.34%)
      19,859,060,127      branches:u                # 1973.043 M/sec                    ( +-  0.02% )  (83.33%)
         288,389,742      branch-misses:u           #    1.45% of all branches          ( +-  0.13% )  (83.33%)

             10.0831 +- 0.0683 seconds time elapsed  ( +-  0.68% )

  $

After:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

           10,043.94 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.69% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             412,035      page-faults:u             #    0.041 M/sec                    ( +-  0.00% )
      39,985,610,202      cycles:u                  #    3.981 GHz                      ( +-  0.18% )  (83.33%)
         657,352,766      stalled-cycles-frontend:u #    1.64% frontend cycles idle     ( +-  2.79% )  (83.33%)
       7,387,740,861      stalled-cycles-backend:u  #   18.48% backend cycles idle      ( +-  1.65% )  (83.33%)
      85,926,053,845      instructions:u            #    2.15  insn per cycle
                                                    #    0.09  stalled cycles per insn  ( +-  0.04% )  (83.34%)
      19,428,047,875      branches:u                # 1934.305 M/sec                    ( +-  0.05% )  (83.33%)
         240,156,838      branch-misses:u           #    1.24% of all branches          ( +-  0.14% )  (83.34%)

             10.0609 +- 0.0696 seconds time elapsed  ( +-  0.69% )

  $

  $ ./btfdiff vmlinux
  $

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:49:37 -03:00
Andrii Nakryiko 5d863aa7ce btf_loader: Use libbpf to load BTF
Switch BTF loading to completely use libbpf's own struct btf and related
APIs.

BTF encoding still happens in pahole's own code, so these two code paths
do not share anything for now. How strings are fetched depends on whether
btfe->strings was set to a non-NULL pointer by btf_encoder.

Committer testing:

  $ cp ~/git/build/bpf-next-v5.9.0-rc8+/vmlinux .
  $ readelf -SW vmlinux  | grep BTF
    [24] .BTF      PROGBITS  ffffffff82494ac0 1694ac0 340207 00   A  0  0  1
    [25] .BTF_ids  PROGBITS  ffffffff827d4cc8 19d4cc8 0000a4 00   A  0  0  1
  $ ./btfdiff vmlinux
  $

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:43:42 -03:00
Andrii Nakryiko 0a9b89910e dwarves: Expose and maintain active debug info loader operations
Maintain a pointer to the debug_fmt_ops corresponding to the currently used
debug info format loader (DWARF, BTF, or CTF), to allow various parts of
libdwarves to do things like resolve a string offset to an actual string
pointer in a format-agnostic way. This allows, say, loading DWARF debug info
and using it for BTF generation, without either of them making assumptions
about how strings are actually stored internally.

This is going to be used in the next patch to allow BTF loader and encoder to
use a very different way of storing strings (not a global shared gobuffer).
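
Illustration (not part of the patch): a minimal sketch of resolving strings
through a pointer to the active loader's operations. The two toy backends
store strings differently, and callers only go through cu__string(). All
names (struct fmt_ops, active_ops, cu__string and the backends) are
hypothetical and much smaller than the real debug_fmt_ops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cu;			/* opaque in this sketch */
typedef uint32_t strings_t;	/* string reference handed around by callers */

/* Hypothetical, cut-down operations table for one debug info format. */
struct fmt_ops {
	const char *name;
	const char *(*strings_ptr)(const struct cu *cu, strings_t s);
};

/* Backend 1: references are byte offsets into one flat buffer. */
static const char flat_strings[] = "\0int\0char";

static const char *flat__strings_ptr(const struct cu *cu, strings_t s)
{
	(void)cu;
	return s && s < sizeof(flat_strings) ? flat_strings + s : NULL;
}

/* Backend 2: references are indexes into an array of pointers. */
static const char *const indexed_strings[] = { NULL, "int", "char" };

static const char *indexed__strings_ptr(const struct cu *cu, strings_t s)
{
	(void)cu;
	return s < 3 ? indexed_strings[s] : NULL;
}

static const struct fmt_ops flat_ops = { "flat", flat__strings_ptr };
static const struct fmt_ops indexed_ops = { "indexed", indexed__strings_ptr };

/* Set once when a loader is chosen, then used by format-agnostic code. */
static const struct fmt_ops *active_ops;

static void set_active_ops(const struct fmt_ops *ops)
{
	assert(ops != NULL && ops->strings_ptr != NULL);
	active_ops = ops;
}

static const char *cu__string(const struct cu *cu, strings_t s)
{
	return active_ops ? active_ops->strings_ptr(cu, s) : NULL;
}
```

Code that calls cu__string() does not need to know whether a reference is an
offset or an index; both backends here also map reference 0 to NULL.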

Committer notes:

Since it is available in multiple object files, add a dwarves__ prefix
namespace and add an extern for it in dwarves.h.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:43:40 -03:00
Andrii Nakryiko 7bc2dd07d5 btf_encoder: detect BTF encoding errors and exit
Don't silently swallow BTF encoding errors and continue on to the next CU. If
any CU fails to properly encode BTF, exit with an error message.
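
Illustration (not part of the patch): a short sketch of stopping at the
first CU that fails to encode and reporting which one it was, rather than
skipping it. The struct, the `broken` flag and both functions are
hypothetical stand-ins for the real encoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CU with a flag that forces an encoding failure. */
struct cu {
	const char *name;
	int broken;
};

/* Stand-in for the per-CU encoder: 0 on success, negative on error. */
static int cu__encode(const struct cu *cu)
{
	return cu->broken ? -1 : 0;
}

/*
 * Encodes CUs in order and stops at the first failure instead of skipping
 * it. Returns 0 on success, or the error with *failed set to the offender.
 */
static int encode_all(const struct cu *cus, size_t nr, const struct cu **failed)
{
	size_t i;

	assert(failed != NULL);
	*failed = NULL;
	for (i = 0; i < nr; i++) {
		int err = cu__encode(&cus[i]);

		if (err) {
			*failed = &cus[i];
			return err;
		}
	}
	return 0;
}
```

A caller would typically print the failing CU's name to stderr and exit with
a non-zero status when encode_all() returns an error.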

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:43:36 -03:00
Andrii Nakryiko c35b7fa52c libbpf: Update to latest libbpf version
Pull in BTF writer APIs.

Committer testing:

  $ rm -rf build
  $ mkdir build
  $ cd build
  $ cmake ..
  -- The C compiler identification is GNU 10.2.1
  -- Check for working C compiler: /usr/lib64/ccache/cc
  -- Check for working C compiler: /usr/lib64/ccache/cc - works
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Checking availability of DWARF and ELF development libraries
  -- Looking for dwfl_module_build_id in elf
  -- Looking for dwfl_module_build_id in elf - found
  -- Found dwarf.h header: /usr/include
  -- Found elfutils/libdw.h header: /usr/include
  -- Found libdw library: /usr/lib64/libdw.so
  -- Found libelf library: /usr/lib64/libelf.so
  -- Checking availability of DWARF and ELF development libraries - done
  -- Found ZLIB: /usr/lib64/libz.so (found version "1.2.11")
  -- Submodule update
  From https://github.com/libbpf/libbpf
   * [new branch]                        libbpf-0.1.1-xdp-bug-fix -> origin/libbpf-0.1.1-xdp-bug-fix
     583bddce6b93bafa..b6dd2f2b7df4d3bd  master     -> origin/master
   * [new tag]                           v0.1.1     -> v0.1.1
   * [new tag]                           v0.0.8     -> v0.0.8
   * [new tag]                           v0.0.9     -> v0.0.9
   * [new tag]                           v0.1.0     -> v0.1.0
  Submodule path 'lib/bpf': checked out 'ff797cc905d9c5fe9acab92d2da127342b20f80f'
  -- Submodule update - done
  -- Performing Test HAVE_REALLOCARRAY_SUPPORT
  -- Performing Test HAVE_REALLOCARRAY_SUPPORT - Success
  -- Configuring done
  -- Generating done
  -- Build files have been written to: /home/acme/git/pahole/build
  $
  $ cd ..
  $ ./btfdiff vmlinux
  $
  $ cat fullcircle
  #!/bin/bash
  # SPDX-License-Identifier: GPL-2.0-only
  # Copyright © 2019 Red Hat Inc, Arnaldo Carvalho de Melo <acme@redhat.com>
  # Use pfunct to produce compilable output from a object, then do a codiff -s
  # To see if the type information generated from source code generated
  # from type information in a file compiled from the original source code matches.

  if [ $# -eq 0 ] ; then
  	echo "Usage: fullcircle <filename_with_type_info>"
  	exit 1
  fi

  file=$1

  nr_cus=$(readelf -wi ${file} | grep DW_TAG_compile_unit | wc -l)
  if [ $nr_cus -gt 1 ]; then
  	exit 0
  fi

  c_output=$(mktemp /tmp/fullcircle.XXXXXX.c)
  o_output=$(mktemp /tmp/fullcircle.XXXXXX.o)
  pfunct_bin=${PFUNCT-"pfunct"}
  codiff_bin=${CODIFF-"codiff"}

  # See how your DW_AT_producer looks like and find the
  # right regexp to get after the GCC version string, this one
  # seems good enough for Red Hat/Fedora/CentOS that look like:
  #
  #   DW_AT_producer    : (indirect string, offset: 0x3583): GNU C89 8.2.1 20181215 (Red Hat 8.2.1-6) -mno-sse -mno-mmx
  #
  # So we need from -mno-sse onwards

  CFLAGS=$(readelf -wi $file | grep -w DW_AT_producer | sed -r      's/.*\)( -[[:alnum:]]+.*)+/\1/g')

  # Check if we managed to do the sed or if this is something like GNU AS
  [ "${CFLAGS/DW_AT_producer/}" != "${CFLAGS}" ] && exit

  ${pfunct_bin} --compile $file > $c_output
  gcc $CFLAGS -c -g $c_output -o $o_output
  ${codiff_bin} -q -s $file $o_output

  rm -f $c_output $o_output
  exit 0
  [acme@five pahole]$ cp ~/git/build/bpf-next-v5.9.0-rc8+/net/ipv4/tcp_ipv4.o  .
  [acme@five pahole]$ readelf -SW tcp_ipv4.o  | grep BTF
  [acme@five pahole]$ pahole -J tcp_ipv4.o
  [acme@five pahole]$ readelf -SW tcp_ipv4.o  | grep BTF
    [105] .BTF              PROGBITS        0000000000000000 0fcf68 03ff6e 00      0   0  1
  [acme@five pahole]$ ./fullcircle tcp_ipv4.o
  [acme@five pahole]$ pahole -F btf -C tcp_sock tcp_ipv4.o
  struct tcp_sock {
  	struct inet_connection_sock inet_conn;           /*     0  1376 */
  	/* --- cacheline 21 boundary (1344 bytes) was 32 bytes ago --- */
  	u16                        tcp_header_len;       /*  1376     2 */
  	u16                        gso_segs;             /*  1378     2 */
  	__be32                     pred_flags;           /*  1380     4 */
  	u64                        bytes_received;       /*  1384     8 */
  	u32                        segs_in;              /*  1392     4 */
  	u32                        data_segs_in;         /*  1396     4 */
  	u32                        rcv_nxt;              /*  1400     4 */
  	u32                        copied_seq;           /*  1404     4 */
  	/* --- cacheline 22 boundary (1408 bytes) --- */
  	u32                        rcv_wup;              /*  1408     4 */
  	u32                        snd_nxt;              /*  1412     4 */
  	u32                        segs_out;             /*  1416     4 */
  	u32                        data_segs_out;        /*  1420     4 */
  	u64                        bytes_sent;           /*  1424     8 */
  	u64                        bytes_acked;          /*  1432     8 */
  	u32                        dsack_dups;           /*  1440     4 */
  	u32                        snd_una;              /*  1444     4 */
  	u32                        snd_sml;              /*  1448     4 */
  	u32                        rcv_tstamp;           /*  1452     4 */
  	u32                        lsndtime;             /*  1456     4 */
  	u32                        last_oow_ack_time;    /*  1460     4 */
  	u32                        compressed_ack_rcv_nxt; /*  1464     4 */
  	u32                        tsoffset;             /*  1468     4 */
  	/* --- cacheline 23 boundary (1472 bytes) --- */
  	struct list_head           tsq_node;             /*  1472    16 */
  	struct list_head           tsorted_sent_queue;   /*  1488    16 */
  	u32                        snd_wl1;              /*  1504     4 */
  	u32                        snd_wnd;              /*  1508     4 */
  	u32                        max_window;           /*  1512     4 */
  	u32                        mss_cache;            /*  1516     4 */
  	u32                        window_clamp;         /*  1520     4 */
  	u32                        rcv_ssthresh;         /*  1524     4 */
  	struct tcp_rack            rack;                 /*  1528    24 */

  	/* XXX last struct has 2 bytes of padding */

  	/* --- cacheline 24 boundary (1536 bytes) was 16 bytes ago --- */
  	u16                        advmss;               /*  1552     2 */
  	u8                         compressed_ack;       /*  1554     1 */
  	u8                         dup_ack_counter:2;    /*  1555: 0  1 */
  	u8                         tlp_retrans:1;        /*  1555: 2  1 */
  	u8                         unused:5;             /*  1555: 3  1 */
  	u32                        chrono_start;         /*  1556     4 */
  	u32                        chrono_stat[3];       /*  1560    12 */
  	u8                         chrono_type:2;        /*  1572: 0  1 */
  	u8                         rate_app_limited:1;   /*  1572: 2  1 */
  	u8                         fastopen_connect:1;   /*  1572: 3  1 */
  	u8                         fastopen_no_cookie:1; /*  1572: 4  1 */
  	u8                         is_sack_reneg:1;      /*  1572: 5  1 */
  	u8                         fastopen_client_fail:2; /*  1572: 6  1 */
  	u8                         nonagle:4;            /*  1573: 0  1 */
  	u8                         thin_lto:1;           /*  1573: 4  1 */
  	u8                         recvmsg_inq:1;        /*  1573: 5  1 */
  	u8                         repair:1;             /*  1573: 6  1 */
  	u8                         frto:1;               /*  1573: 7  1 */
  	u8                         repair_queue;         /*  1574     1 */
  	u8                         save_syn:2;           /*  1575: 0  1 */
  	u8                         syn_data:1;           /*  1575: 2  1 */
  	u8                         syn_fastopen:1;       /*  1575: 3  1 */
  	u8                         syn_fastopen_exp:1;   /*  1575: 4  1 */
  	u8                         syn_fastopen_ch:1;    /*  1575: 5  1 */
  	u8                         syn_data_acked:1;     /*  1575: 6  1 */
  	u8                         is_cwnd_limited:1;    /*  1575: 7  1 */
  	u32                        tlp_high_seq;         /*  1576     4 */
  	u32                        tcp_tx_delay;         /*  1580     4 */
  	u64                        tcp_wstamp_ns;        /*  1584     8 */
  	u64                        tcp_clock_cache;      /*  1592     8 */
  	/* --- cacheline 25 boundary (1600 bytes) --- */
  	u64                        tcp_mstamp;           /*  1600     8 */
  	u32                        srtt_us;              /*  1608     4 */
  	u32                        mdev_us;              /*  1612     4 */
  	u32                        mdev_max_us;          /*  1616     4 */
  	u32                        rttvar_us;            /*  1620     4 */
  	u32                        rtt_seq;              /*  1624     4 */
  	struct minmax              rtt_min;              /*  1628    24 */
  	u32                        packets_out;          /*  1652     4 */
  	u32                        retrans_out;          /*  1656     4 */
  	u32                        max_packets_out;      /*  1660     4 */
  	/* --- cacheline 26 boundary (1664 bytes) --- */
  	u32                        max_packets_seq;      /*  1664     4 */
  	u16                        urg_data;             /*  1668     2 */
  	u8                         ecn_flags;            /*  1670     1 */
  	u8                         keepalive_probes;     /*  1671     1 */
  	u32                        reordering;           /*  1672     4 */
  	u32                        reord_seen;           /*  1676     4 */
  	u32                        snd_up;               /*  1680     4 */
  	struct tcp_options_received rx_opt;              /*  1684    24 */
  	u32                        snd_ssthresh;         /*  1708     4 */
  	u32                        snd_cwnd;             /*  1712     4 */
  	u32                        snd_cwnd_cnt;         /*  1716     4 */
  	u32                        snd_cwnd_clamp;       /*  1720     4 */
  	u32                        snd_cwnd_used;        /*  1724     4 */
  	/* --- cacheline 27 boundary (1728 bytes) --- */
  	u32                        snd_cwnd_stamp;       /*  1728     4 */
  	u32                        prior_cwnd;           /*  1732     4 */
  	u32                        prr_delivered;        /*  1736     4 */
  	u32                        prr_out;              /*  1740     4 */
  	u32                        delivered;            /*  1744     4 */
  	u32                        delivered_ce;         /*  1748     4 */
  	u32                        lost;                 /*  1752     4 */
  	u32                        app_limited;          /*  1756     4 */
  	u64                        first_tx_mstamp;      /*  1760     8 */
  	u64                        delivered_mstamp;     /*  1768     8 */
  	u32                        rate_delivered;       /*  1776     4 */
  	u32                        rate_interval_us;     /*  1780     4 */
  	u32                        rcv_wnd;              /*  1784     4 */
  	u32                        write_seq;            /*  1788     4 */
  	/* --- cacheline 28 boundary (1792 bytes) --- */
  	u32                        notsent_lowat;        /*  1792     4 */
  	u32                        pushed_seq;           /*  1796     4 */
  	u32                        lost_out;             /*  1800     4 */
  	u32                        sacked_out;           /*  1804     4 */
  	struct hrtimer             pacing_timer;         /*  1808    64 */

  	/* XXX last struct has 4 bytes of padding */

  	/* --- cacheline 29 boundary (1856 bytes) was 16 bytes ago --- */
  	struct hrtimer             compressed_ack_timer; /*  1872    64 */

  	/* XXX last struct has 4 bytes of padding */

  	/* --- cacheline 30 boundary (1920 bytes) was 16 bytes ago --- */
  	struct sk_buff *           lost_skb_hint;        /*  1936     8 */
  	struct sk_buff *           retransmit_skb_hint;  /*  1944     8 */
  	struct rb_root             out_of_order_queue;   /*  1952     8 */
  	struct sk_buff *           ooo_last_skb;         /*  1960     8 */
  	struct tcp_sack_block      duplicate_sack[1];    /*  1968     8 */
  	struct tcp_sack_block      selective_acks[4];    /*  1976    32 */
  	/* --- cacheline 31 boundary (1984 bytes) was 24 bytes ago --- */
  	struct tcp_sack_block      recv_sack_cache[4];   /*  2008    32 */
  	struct sk_buff *           highest_sack;         /*  2040     8 */
  	/* --- cacheline 32 boundary (2048 bytes) --- */
  	int                        lost_cnt_hint;        /*  2048     4 */
  	u32                        prior_ssthresh;       /*  2052     4 */
  	u32                        high_seq;             /*  2056     4 */
  	u32                        retrans_stamp;        /*  2060     4 */
  	u32                        undo_marker;          /*  2064     4 */
  	int                        undo_retrans;         /*  2068     4 */
  	u64                        bytes_retrans;        /*  2072     8 */
  	u32                        total_retrans;        /*  2080     4 */
  	u32                        urg_seq;              /*  2084     4 */
  	unsigned int               keepalive_time;       /*  2088     4 */
  	unsigned int               keepalive_intvl;      /*  2092     4 */
  	int                        linger2;              /*  2096     4 */
  	u8                         bpf_sock_ops_cb_flags; /*  2100     1 */

  	/* XXX 1 byte hole, try to pack */

  	u16                        timeout_rehash;       /*  2102     2 */
  	u32                        rcv_ooopack;          /*  2104     4 */
  	u32                        rcv_rtt_last_tsecr;   /*  2108     4 */
  	/* --- cacheline 33 boundary (2112 bytes) --- */
  	struct {
  		u32                rtt_us;               /*  2112     4 */
  		u32                seq;                  /*  2116     4 */
  		u64                time;                 /*  2120     8 */
  	} rcv_rtt_est;                                   /*  2112    16 */
  	struct {
  		u32                space;                /*  2128     4 */
  		u32                seq;                  /*  2132     4 */
  		u64                time;                 /*  2136     8 */
  	} rcvq_space;                                    /*  2128    16 */
  	struct {
  		u32                probe_seq_start;      /*  2144     4 */
  		u32                probe_seq_end;        /*  2148     4 */
  	} mtu_probe;                                     /*  2144     8 */
  	u32                        mtu_info;             /*  2152     4 */
  	bool                       is_mptcp;             /*  2156     1 */
  	bool                       syn_smc;              /*  2157     1 */

  	/* XXX 2 bytes hole, try to pack */

  	const struct tcp_sock_af_ops  * af_specific;     /*  2160     8 */
  	struct tcp_md5sig_info *   md5sig_info;          /*  2168     8 */
  	/* --- cacheline 34 boundary (2176 bytes) --- */
  	struct tcp_fastopen_request * fastopen_req;      /*  2176     8 */
  	struct request_sock *      fastopen_rsk;         /*  2184     8 */
  	struct saved_syn *         saved_syn;            /*  2192     8 */

  	/* size: 2200, cachelines: 35, members: 140 */
  	/* sum members: 2193, holes: 2, sum holes: 3 */
  	/* sum bitfield members: 32 bits (4 bytes) */
  	/* paddings: 3, sum paddings: 10 */
  	/* last cacheline: 24 bytes */
  };
  $ pahole -F btf -C tcp_sock tcp_ipv4.o  > tcp_sock.o.before
  $

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-08 14:36:51 -03:00
Arnaldo Carvalho de Melo ef4f971a9c dwarf_loader: Conditionally define DW_AT_alignment
As there are distros where this isn't available, such as opensuse:15.2
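
A minimal sketch of such a conditional definition, assuming the header
on those distros simply lacks the name. The value 0x88 is the code that
DWARF 5 assigns to DW_AT_alignment; the helper function is hypothetical
and only there to show a use:

```c
/* In real code this would follow #include <dwarf.h>; the include is
 * left out here (an assumption) so the sketch builds without it. */
#ifndef DW_AT_alignment
#define DW_AT_alignment 0x88 /* attribute code assigned by DWARF 5 */
#endif

/* Hypothetical helper: does this attribute code carry an alignment? */
static int attr_is_alignment(unsigned int attr)
{
	return attr == DW_AT_alignment;
}
```

With the fallback in place, code that tests for the attribute compiles
the same way on old and new headers.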

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-08 10:17:33 -03:00
Arnaldo Carvalho de Melo cc3f9dce33 pahole: Implement --packed
To show just packed structs.

For instance, here are the top packed structures in the Linux kernel,
using BTF data:

  $ pahole --packed --sizes | sort -k2 -nr | head
  e820_table		64004	0
  boot_params		 4096	0
  btrfs_super_block	 3531	0
  efi_variable		 2084	0
  ntb_info_regs		  800	0
  tboot			  568	0
  _legacy_mbr		  512	0
  disklabel		  512	0
  btrfs_root_item	  439	0
  saved_context		  317	0
  $

If you then look at:

  $ pahole e820_table
  struct e820_table {
  	__u32                      nr_entries;           /*     0     4 */
  	struct e820_entry          entries[3200];        /*     4 64000 */

  	/* size: 64004, cachelines: 1001, members: 2 */
  	/* last cacheline: 4 bytes */
  } __attribute__((__packed__));
  $

In arch/x86/include/asm/e820/types.h we have:

  /*
   * The whole array of E820 entries:
   */
  struct e820_table {
          __u32 nr_entries;
          struct e820_entry entries[E820_MAX_ENTRIES];
  };

I.e. no explicit __packed__ attributes, but if we expand this a bit:

  $ pahole -E e820_table
  struct e820_table {
  	/* typedef __u32 */ unsigned int               nr_entries;                       /*     0     4 */
  	struct e820_entry {
  		/* typedef u64 -> __u64 */ long long unsigned int addr;                  /*     4     8 */
  		/* typedef u64 -> __u64 */ long long unsigned int size;                  /*    12     8 */
  		enum e820_type     type;                                                 /*    20     4 */
  	} __attribute__((__packed__)) entries[3200]; /*     4 64000 */

  	/* size: 64004, cachelines: 1001, members: 2 */
  	/* last cacheline: 4 bytes */
  } __attribute__((__packed__));
  $

We see that it is the entries member that is packed, because:

  $ pahole e820_entry
  struct e820_entry {
  	u64                        addr;                 /*     0     8 */
  	u64                        size;                 /*     8     8 */
  	enum e820_type             type;                 /*    16     4 */

  	/* size: 20, cachelines: 1, members: 3 */
  	/* last cacheline: 20 bytes */
  } __attribute__((__packed__));
  $

In arch/x86/include/asm/e820/types.h we have:

  /*
   * A single E820 map entry, describing a memory range of [addr...addr+size-1],
   * of 'type' memory type:
   *
   * (We pack it because there can be thousands of them on large systems.)
   */
  struct e820_entry {
          u64                     addr;
          u64                     size;
          enum e820_type          type;
  } __attribute__((packed));

So yeah, it is there: BTF doesn't explicitly state that it is packed (as
DWARF does), and pahole was able to infer that correctly.
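
One plausible way to infer packing from layout data alone is sketched
below. This is an illustrative heuristic, not necessarily the check
pahole performs; struct member_layout and looks_packed() are
hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>

struct member_layout {
	size_t offset; /* byte offset of the member within the struct */
	size_t align;  /* natural alignment of the member's type */
};

/*
 * A struct looks packed if some member sits at an offset that is not a
 * multiple of its natural alignment, or if the struct size is not a
 * multiple of the largest member alignment (no tail padding was added).
 */
static bool looks_packed(const struct member_layout *m, size_t n,
			 size_t struct_size)
{
	size_t max_align = 1;

	for (size_t i = 0; i < n; i++) {
		if (m[i].offset % m[i].align != 0)
			return true;
		if (m[i].align > max_align)
			max_align = m[i].align;
	}
	return struct_size % max_align != 0;
}
```

For struct e820_entry above, the members are at offsets 0, 8 and 16 with
alignments 8, 8 and 4, and the size is 20. Since 20 is not a multiple
of 8, this heuristic flags it as packed; an unpacked layout would have
been padded to 24 bytes.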

Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-08 10:11:31 -03:00
Arnaldo Carvalho de Melo 08f49262f4 man-pages: Fix 'coimbine' typo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-08 09:10:34 -03:00
Arnaldo Carvalho de Melo fdc639188c dwarves: Prep v1.18
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-02 17:29:59 -03:00
Arnaldo Carvalho de Melo 70c3e66970 spec: Set the build type to 'Release'
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-02 17:29:21 -03:00
Zamir SUN 399376eba8 spec: Use more recent cmake rpm macros to fix build in fedora
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1863459
Signed-off-by: Zamir SUN <sztsian@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-02 17:24:03 -03:00
Arnaldo Carvalho de Melo 7f0a8484dc dwarf_loader: Ignore top level DW_TAG_dwarf_procedure tags
We also ignore it when it appears inside a DW_TAG_subprogram, so just
don't emit the warning that it is not supported when it appears in the
top level.

So far it doesn't look useful for what these tools do, need to revisit
it when the need arises.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-02 16:58:26 -03:00
Arnaldo Carvalho de Melo aee6808c47 btf_loader: Initialize function->lexblock.tags to fix segfault in pdwtags
pdwtags -F btf vmlinux

Was segfaulting when trying to iterate on the function main lexblock,
which was zeroed instead of INIT_LIST_HEAD'ed, fix it.
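
A minimal sketch of why a zeroed list head breaks iteration, using a
stand-alone copy of the kernel-style circular list. list_count() is a
hypothetical helper, not a function from the dwarves tree:

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty circular list points at itself, not at NULL. */
static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/*
 * Walk until we are back at the head. With a zeroed head, head->next is
 * NULL, which is != head, so the loop would dereference NULL.
 */
static size_t list_count(const struct list_head *head)
{
	size_t n = 0;

	for (const struct list_head *pos = head->next; pos != head;
	     pos = pos->next)
		n++;
	return n;
}
```

Calling list_count() on a head that was only memset to zero crashes on
the first pos->next, which matches the kind of segfault described above.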

This also made pfunct misbehave when used with BTF. pahole is unaffected,
as it doesn't try to go through functions in most cases.

Fixes: ccf3eebfcd ("btf_loader: Add support for BTF_KIND_FUNC")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-02 16:23:24 -03:00
Hao Luo c815d26689 btf_encoder: Handle DW_TAG_variable that has DW_AT_specification
With gcc 8.2, global percpu variables generate the following DWARF
entries in the CU where the variable is defined[1].

Take the global variable "bpf_prog_active" defined in
kernel/bpf/syscall.c as an example. The debug info for syscall.c has two
dwarf entries for "bpf_prog_active".

 > readelf -wi kernel/bpf/syscall.o

0x00013534:   DW_TAG_variable
                 DW_AT_name      ("bpf_prog_active")
                 DW_AT_decl_file
("/data/users/yhs/work/net-next/include/linux/bpf.h")
                 DW_AT_decl_line (1074)
                 DW_AT_decl_column       (0x01)
                 DW_AT_type      (0x000000d6 "int")
                 DW_AT_external  (true)
                 DW_AT_declaration       (true)

0x00021a25:   DW_TAG_variable
                 DW_AT_specification     (0x00013534 "bpf_prog_active")
                 DW_AT_decl_file
("/data/users/yhs/work/net-next/kernel/bpf/syscall.c")
                 DW_AT_decl_line (43)
                 DW_AT_location  (DW_OP_addr 0x0)

Note that the second DW_TAG_variable entry contains a specification that
points to the first entry. This causes a problem for btf_encoder when
encoding global variables. The tag generated for the second entry
doesn't have the type and scope info. Therefore the BTF VARs encoded
using this tag have an incorrect type_id and scope.

As a fix, when creating a variable, examine the DWARF entry. If it has
a DW_AT_specification, store the referred struct variable in a 'spec'
field. When encoding VARs, check this 'spec'; if it's non-empty, follow
the pointer to use the referred var.
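
The lookup can be sketched as follows. The struct layout and the
variable__resolve() name are assumptions made for illustration; only
the 'spec' field comes from the description above:

```c
#include <stddef.h>

struct variable {
	const char *name;      /* NULL when the DIE carries no DW_AT_name */
	unsigned int type_id;  /* 0 when the DIE carries no DW_AT_type */
	int external;          /* from DW_AT_external */
	struct variable *spec; /* target of DW_AT_specification, or NULL */
};

/* Use the referred declaration when there is one, else the var itself. */
static const struct variable *variable__resolve(const struct variable *var)
{
	return var->spec ? var->spec : var;
}
```

An encoder would then read the name, type and scope from
variable__resolve(var) instead of from var directly, so the defining
entry picks up the information held by its declaration.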

 [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg348144.html

Tested: Tested using gcc 4.9 and gcc 8.2. The types and scopes of global
 vars are now generated correctly.

 [21] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
 [21102] VAR 'bpf_prog_active' type_id=21, linkage=global-alloc

Signed-off-by: Hao Luo <haoluo@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-01 15:22:41 -03:00
Arnaldo Carvalho de Melo b8068e7373 pahole: Only try using a single file name as a type name if not encoding BTF or CTF
Otherwise we end up trying to encode without any debug info and this
causes a segfault:

Before:

  $ pahole -J vmlinuz-5.9.0-rc6+
  tag__check_id_drift: subroutine_type id drift, core_id: 1145, btf_type_id: 1143, type_id_off: 0
  pahole: type 'vmlinuz-5.9.0-rc6+' not found
  libbpf: Unsupported BTF_KIND:0
  btf_elf__encode: btf__new failed!
  free(): double free detected in tcache 2
  Aborted (core dumped)
  $

The vmlinuz file doesn't contain any debugging info. With the fix we get:

  $ pahole -J vmlinuz-5.9.0-rc6+
  pahole: vmlinuz-5.9.0-rc6+: No debugging information found
  $

If debugging info is available, it all works as before:

Using /sys/kernel/btf/vmlinux

$ ls -la /sys/kernel/btf/vmlinux
-r--r--r--. 1 root root 3393761 Oct  1 09:50 /sys/kernel/btf/vmlinux

  $ pahole -E fw_cache_entry
  struct fw_cache_entry {
  	struct list_head {
  		struct list_head * next;          /*     0     8 */
  		struct list_head * prev;          /*     8     8 */
  	} list; /*     0    16 */
  	const char  *              name;          /*    16     8 */

  	/* size: 24, cachelines: 1, members: 2 */
  	/* last cacheline: 24 bytes */
  };
  $

Or explicitly asking for DWARF, where it will find the appropriate
vmlinux according to its buildid in /sys/kernel/notes:

  $ pahole -F dwarf pm_clock_entry
  struct pm_clock_entry {
  	struct list_head           node;          /*     0    16 */
  	char *                     con_id;        /*    16     8 */
  	struct clk *               clk;           /*    24     8 */
  	enum pce_status            status;        /*    32     4 */

  	/* size: 40, cachelines: 1, members: 4 */
  	/* padding: 4 */
  	/* last cacheline: 40 bytes */
  };
  $ pahole -F dwarf --expand_types pm_clock_entry
  struct pm_clock_entry {
  	struct list_head {
  		struct list_head * next;          /*     0     8 */
  		struct list_head * prev;          /*     8     8 */
  	} node; /*     0    16 */
  	char *                     con_id;        /*    16     8 */
  	struct clk *               clk;           /*    24     8 */
  	enum pce_status            status;        /*    32     4 */

  	/* size: 40, cachelines: 1, members: 4 */
  	/* padding: 4 */
  	/* last cacheline: 40 bytes */
  };
  $

Reported-by: Kevin Sheldrake <Kevin.Sheldrake@microsoft.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-01 12:03:58 -03:00
Arnaldo Carvalho de Melo 8b1c632831 libctf: Make can't get header message to appear only in verbose mode
This usually means we're trying each of the type loaders (DWARF, BTF,
CTF) on some invalid file, so no need to show that message, use verbose
mode to get it, so that we show that all loaders are being tried.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-01 10:08:25 -03:00
Arnaldo Carvalho de Melo 63e11400e8 libbtf: Make can't get header message to appear only in verbose mode
This usually means we're trying each of the type loaders (DWARF, BTF,
CTF) on some invalid file, so no need to show that message, use verbose
mode to get it, so that we show that all loaders are being tried.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-01 10:06:18 -03:00
Arnaldo Carvalho de Melo fc2b317db0 dwarf_loader: Check for unsupported_tag return in last two missing places
We need to check for this sentinel return everywhere we use
die__process_tag(), fix the last two places where this wasn't being
done: when processing DW_TAG_namespace and DW_TAG_subroutine_type.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-23 08:59:14 -03:00
Arnaldo Carvalho de Melo 2b5f4895e8 dwarf_loader: Warn user about unsupported TAGs
When die__process_tag() gets a tag it has no support for, it returns
the special zeroed 'unsupported_tag' struct tag pointer. Now that all
callers check for that, warn the user everywhere it happens.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-23 08:56:16 -03:00
Arnaldo Carvalho de Melo 010a71e181 dwarf_loader: Handle unsupported_tag return in die__process_class()
When die__process_tag() gets a tag it has no support for, it returns
the special zeroed 'unsupported_tag' struct tag pointer, but
die__process_class() wasn't handling that: since the pointer isn't NULL,
it assumed it was a valid 'struct tag' pointer and proceeded to use its
->priv area, b00m.

So catch that and print an "unsupported tag FOO" message for that case.
This makes the code far more robust, as any unsupported tag will no
longer cause a segfault, just a warning.
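
The sentinel pattern can be sketched like this. process_tag() is a
hypothetical stand-in for die__process_tag() that treats only
DW_TAG_structure_type (0x13) as supported, and check_tag() is likewise
an illustrative name:

```c
#include <stddef.h>

struct tag {
	unsigned int dw_tag; /* DWARF tag code */
	void *priv;          /* loader-private data, only set for real tags */
};

/* Zeroed sentinel: it is non-NULL, so a plain NULL check misses it. */
static struct tag unsupported_tag;

enum tag_status { TAG_OK, TAG_UNSUPPORTED, TAG_ERROR };

/* Hypothetical stand-in: fill 'storage' for supported tags only. */
static struct tag *process_tag(unsigned int dw_tag, struct tag *storage)
{
	if (dw_tag != 0x13) /* anything but DW_TAG_structure_type */
		return &unsupported_tag;
	storage->dw_tag = dw_tag;
	storage->priv = NULL; /* a real loader would attach its data here */
	return storage;
}

/* Callers must compare against the sentinel before touching ->priv. */
static enum tag_status check_tag(const struct tag *tag)
{
	if (tag == NULL)
		return TAG_ERROR;
	if (tag == &unsupported_tag)
		return TAG_UNSUPPORTED;
	return TAG_OK;
}
```

A caller that gets TAG_UNSUPPORTED can print a warning and skip the
entry, while one that only tests for NULL would go on to use the
sentinel as if it were a real tag.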

So, for the case in the Bugtracker tag below, we no longer segfault;
instead we just show that DW_TAG_variant_part isn't supported when found
inside a DW_TAG_structure_type (or DW_TAG_class_type), as is the case
with Ada 95.

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/9#issuecomment-697250246
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-23 08:50:49 -03:00
Arnaldo Carvalho de Melo 3d616609ee dwarf_loader: Add minimal handling of DW_TAG_subrange_type
This was found in an Ada object, part of gdb's test suite. For now just
make the code a bit more robust when no type is found for some struct
member, etc. This avoids segfaults and produces output from Ada
objects, but there are other problems to solve: as this is a _type tag,
better support is needed so that type resolution works.

  [foo.debug.gz](https://github.com/acmel/dwarves/files/5257332/foo.debug.gz)

  Foo.debug, an ada exec:
  ```
  $ ~/dwarves/build/pahole foo.debug
  die__process_unit: DW_TAG_subrange_type (0x21) @ <0x10b> not handled!
  die__process_unit: DW_TAG_subrange_type (0x21) @ <0x134> not handled!
  die__process_unit: DW_TAG_subrange_type (0x21) @ <0x148> not handled!
  die__process_class: DW_TAG_subrange_type (0x21) @ <0x201> not handled!
  Segmentation fault (core dumped)
  $
  ```

The segfaults are fixed; the warnings continue to be produced.

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/9#issuecomment-696282005
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-22 09:51:37 -03:00
Arnaldo Carvalho de Melo 2e8cd6a435 dwarf_loader: Ignore DW_TAG_variant_part for now to fix a segfault
[simple.debug.gz](https://github.com/acmel/dwarves/files/5257290/simple.debug.gz)

Simple.debug, a rust executable:
```
$ ~/dwarves/build/pahole simple.debug
die__process_function: tag not supported 0x2f (template_type_parameter)!
die__process_class: DW_TAG_variant_part (0x33) @ <0x220> not handled!
Segmentation fault (core dumped)
```

Added an XXX comment to look into DW_TAG_variant_part later. With that:

  $ pahole examples/rust/simple.debug
  die__process_function: tag not supported 0x2f (template_type_parameter)!
  die__process_class: tag not supported 0x33 (variant_part)!
  die__create_new_enumeration: DW_TAG_subprogram (0x2e) @ <0x2da3> not handled!
  struct (core::ptr::non_null::NonNull<u8>, core::alloc::layout::Layout) {
  	struct NonNull<u8>         __0 __attribute__((__aligned__(8))); /*     0     8 */
  	struct Layout              __1 __attribute__((__aligned__(8))); /*     8    16 */

  	/* size: 24, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 24 bytes */
  } __attribute__((__aligned__(8)));
  struct vtable {

  	/* size: 0, cachelines: 0, members: 0 */
  } __attribute__((__aligned__(8)));
  struct (&usize, &usize) {
  	usize *                    __0 __attribute__((__aligned__(8))); /*     0     8 */
  	usize *                    __1 __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct &str {
  	u8 *                       data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct *const [u8] {
  	u8 *                       data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct *const [i32] {
  	i32 *                      data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct *mut [u8] {
  	u8 *                       data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct &[u8] {
  	u8 *                       data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct (i32, f64) {
  	i32                        __0 __attribute__((__aligned__(4))); /*     0     4 */

  	/* XXX 4 bytes hole, try to pack */

  	f64                        __1 __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* sum members: 12, holes: 1, sum holes: 4 */
  	/* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct &[i32] {
  	i32 *                      data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct &mut [u8] {
  	u8 *                       data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct &[&str] {
  	struct &str *              data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct &[core::fmt::rt::v1::Argument] {
  	struct Argument *          data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct &[core::fmt::ArgumentV1] {
  	struct ArgumentV1 *        data_ptr __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      length __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct Opaque {

  	/* size: 0, cachelines: 0, members: 0 */
  } __attribute__((__aligned__(1)));
  struct (usize, usize) {
  	usize                      __0 __attribute__((__aligned__(8))); /*     0     8 */
  	usize                      __1 __attribute__((__aligned__(8))); /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct (core::alloc::layout::Layout, usize) {
  	struct Layout              __0 __attribute__((__aligned__(8))); /*     0    16 */
  	usize                      __1 __attribute__((__aligned__(8))); /*    16     8 */

  	/* size: 24, cachelines: 1, members: 2 */
  	/* forced alignments: 2 */
  	/* last cacheline: 24 bytes */
  } __attribute__((__aligned__(8)));
  struct (usize, bool) {
  	usize                      __0 __attribute__((__aligned__(8))); /*     0     8 */
  	bool                       __1 __attribute__((__aligned__(1))); /*     8     1 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* padding: 7 */
  	/* forced alignments: 2 */
  	/* last cacheline: 16 bytes */
  } __attribute__((__aligned__(8)));
  struct backtrace_state {
  	const char  *              filename;             /*     0     8 */
  	int                        threaded;             /*     8     4 */

  	/* XXX 4 bytes hole, try to pack */

  	void *                     lock;                 /*    16     8 */
  	fileline                   fileline_fn;          /*    24     8 */
  	void *                     fileline_data;        /*    32     8 */
  	syminfo                    syminfo_fn;           /*    40     8 */
  	void *                     syminfo_data;         /*    48     8 */
  	int                        fileline_initialization_failed; /*    56     4 */
  	int                        lock_alloc;           /*    60     4 */
  	/* --- cacheline 1 boundary (64 bytes) --- */
  	struct backtrace_freelist_struct * freelist;     /*    64     8 */

  	/* size: 72, cachelines: 2, members: 10 */
  	/* sum members: 68, holes: 1, sum holes: 4 */
  	/* last cacheline: 8 bytes */
  };
  struct timespec {
  	__time_t                   tv_sec;               /*     0     8 */
  	__syscall_slong_t          tv_nsec;              /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  struct stat {
  	__dev_t                    st_dev;               /*     0     8 */
  	__ino_t                    st_ino;               /*     8     8 */
  	__nlink_t                  st_nlink;             /*    16     8 */
  	__mode_t                   st_mode;              /*    24     4 */
  	__uid_t                    st_uid;               /*    28     4 */
  	__gid_t                    st_gid;               /*    32     4 */
  	int                        __pad0;               /*    36     4 */
  	__dev_t                    st_rdev;              /*    40     8 */
  	__off_t                    st_size;              /*    48     8 */
  	__blksize_t                st_blksize;           /*    56     8 */
  	/* --- cacheline 1 boundary (64 bytes) --- */
  	__blkcnt_t                 st_blocks;            /*    64     8 */
  	struct timespec            st_atim;              /*    72    16 */
  	struct timespec            st_mtim;              /*    88    16 */
  	struct timespec            st_ctim;              /*   104    16 */
  	__syscall_slong_t          __glibc_reserved[3];  /*   120    24 */

  	/* size: 144, cachelines: 3, members: 15 */
  	/* last cacheline: 16 bytes */
  };
  struct dl_phdr_info {
  	Elf64_Addr                 dlpi_addr;            /*     0     8 */
  	const char  *              dlpi_name;            /*     8     8 */
  	const Elf64_Phdr  *        dlpi_phdr;            /*    16     8 */
  	Elf64_Half                 dlpi_phnum;           /*    24     2 */

  	/* XXX 6 bytes hole, try to pack */

  	long long unsigned int     dlpi_adds;            /*    32     8 */
  	long long unsigned int     dlpi_subs;            /*    40     8 */
  	size_t                     dlpi_tls_modid;       /*    48     8 */
  	void *                     dlpi_tls_data;        /*    56     8 */

  	/* size: 64, cachelines: 1, members: 8 */
  	/* sum members: 58, holes: 1, sum holes: 6 */
  };
  struct backtrace_view {
  	const void  *              data;                 /*     0     8 */
  	void *                     base;                 /*     8     8 */
  	size_t                     len;                  /*    16     8 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* last cacheline: 24 bytes */
  };
  struct dwarf_sections {
  	const unsigned char  *     data[9];              /*     0    72 */
  	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
  	size_t                     size[9];              /*    72    72 */

  	/* size: 144, cachelines: 3, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  struct debug_section_info {
  	off_t                      offset;               /*     0     8 */
  	size_t                     size;                 /*     8     8 */
  	const unsigned char  *     data;                 /*    16     8 */
  	int                        compressed;           /*    24     4 */

  	/* size: 32, cachelines: 1, members: 4 */
  	/* padding: 4 */
  	/* last cacheline: 32 bytes */
  };
  struct elf_symbol {
  	const char  *              name;                 /*     0     8 */
  	uintptr_t                  address;              /*     8     8 */
  	size_t                     size;                 /*    16     8 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* last cacheline: 24 bytes */
  };
  struct elf_syminfo_data {
  	struct elf_syminfo_data *  next;                 /*     0     8 */
  	struct elf_symbol *        symbols;              /*     8     8 */
  	size_t                     count;                /*    16     8 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* last cacheline: 24 bytes */
  };
  struct elf_ppc64_opd_data {
  	b_elf_addr                 addr;                 /*     0     8 */
  	const char  *              data;                 /*     8     8 */
  	size_t                     size;                 /*    16     8 */
  	struct backtrace_view      view;                 /*    24    24 */

  	/* size: 48, cachelines: 1, members: 4 */
  	/* last cacheline: 48 bytes */
  };
  struct phdr_data {
  	struct backtrace_state *   state;                /*     0     8 */
  	backtrace_error_callback   error_callback;       /*     8     8 */
  	void *                     data;                 /*    16     8 */
  	fileline *                 fileline_fn;          /*    24     8 */
  	int *                      found_sym;            /*    32     8 */
  	int *                      found_dwarf;          /*    40     8 */
  	const char  *              exe_filename;         /*    48     8 */
  	int                        exe_descriptor;       /*    56     4 */

  	/* size: 64, cachelines: 1, members: 8 */
  	/* padding: 4 */
  };
  struct backtrace_vector {
  	void *                     base;                 /*     0     8 */
  	size_t                     size;                 /*     8     8 */
  	size_t                     alc;                  /*    16     8 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* last cacheline: 24 bytes */
  };
  struct dwarf_buf {
  	const char  *              name;                 /*     0     8 */
  	const unsigned char  *     start;                /*     8     8 */
  	const unsigned char  *     buf;                  /*    16     8 */
  	size_t                     left;                 /*    24     8 */
  	int                        is_bigendian;         /*    32     4 */

  	/* XXX 4 bytes hole, try to pack */

  	backtrace_error_callback   error_callback;       /*    40     8 */
  	void *                     data;                 /*    48     8 */
  	int                        reported_underflow;   /*    56     4 */

  	/* size: 64, cachelines: 1, members: 8 */
  	/* sum members: 56, holes: 1, sum holes: 4 */
  	/* padding: 4 */
  };
  struct attr {
  	enum dwarf_attribute       name;                 /*     0     4 */
  	enum dwarf_form            form;                 /*     4     4 */
  	int64_t                    val;                  /*     8     8 */

  	/* size: 16, cachelines: 1, members: 3 */
  	/* last cacheline: 16 bytes */
  };
  struct abbrev {
  	uint64_t                   code;                 /*     0     8 */
  	enum dwarf_tag             tag;                  /*     8     4 */
  	int                        has_children;         /*    12     4 */
  	size_t                     num_attrs;            /*    16     8 */
  	struct attr *              attrs;                /*    24     8 */

  	/* size: 32, cachelines: 1, members: 5 */
  	/* last cacheline: 32 bytes */
  };
  struct abbrevs {
  	size_t                     num_abbrevs;          /*     0     8 */
  	struct abbrev *            abbrevs;              /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  struct attr_val {
  	enum attr_val_encoding     encoding;             /*     0     4 */

  	/* XXX 4 bytes hole, try to pack */

  	union {
  		uint64_t           uint;                 /*     8     8 */
  		int64_t            sint;                 /*     8     8 */
  		const char  *      string;               /*     8     8 */
  	} u;                                             /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* sum members: 12, holes: 1, sum holes: 4 */
  	/* last cacheline: 16 bytes */
  };
  struct line_header {
  	int                        version;              /*     0     4 */
  	int                        addrsize;             /*     4     4 */
  	unsigned int               min_insn_len;         /*     8     4 */
  	unsigned int               max_ops_per_insn;     /*    12     4 */
  	int                        line_base;            /*    16     4 */
  	unsigned int               line_range;           /*    20     4 */
  	unsigned int               opcode_base;          /*    24     4 */

  	/* XXX 4 bytes hole, try to pack */

  	const unsigned char  *     opcode_lengths;       /*    32     8 */
  	size_t                     dirs_count;           /*    40     8 */
  	const char  * *            dirs;                 /*    48     8 */
  	size_t                     filenames_count;      /*    56     8 */
  	/* --- cacheline 1 boundary (64 bytes) --- */
  	const char  * *            filenames;            /*    64     8 */

  	/* size: 72, cachelines: 2, members: 12 */
  	/* sum members: 68, holes: 1, sum holes: 4 */
  	/* last cacheline: 8 bytes */
  };
  struct line_header_format {
  	int                        lnct;                 /*     0     4 */
  	enum dwarf_form            form;                 /*     4     4 */

  	/* size: 8, cachelines: 1, members: 2 */
  	/* last cacheline: 8 bytes */
  };
  struct line {
  	uintptr_t                  pc;                   /*     0     8 */
  	const char  *              filename;             /*     8     8 */
  	int                        lineno;               /*    16     4 */
  	int                        idx;                  /*    20     4 */

  	/* size: 24, cachelines: 1, members: 4 */
  	/* last cacheline: 24 bytes */
  };
  struct line_vector {
  	struct backtrace_vector    vec;                  /*     0    24 */
  	size_t                     count;                /*    24     8 */

  	/* size: 32, cachelines: 1, members: 2 */
  	/* last cacheline: 32 bytes */
  };
  struct function {
  	const char  *              name;                 /*     0     8 */
  	const char  *              caller_filename;      /*     8     8 */
  	int                        caller_lineno;        /*    16     4 */

  	/* XXX 4 bytes hole, try to pack */

  	struct function_addrs *    function_addrs;       /*    24     8 */
  	size_t                     function_addrs_count; /*    32     8 */

  	/* size: 40, cachelines: 1, members: 5 */
  	/* sum members: 36, holes: 1, sum holes: 4 */
  	/* last cacheline: 40 bytes */
  };
  struct function_addrs {
  	uint64_t                   low;                  /*     0     8 */
  	uint64_t                   high;                 /*     8     8 */
  	struct function *          function;             /*    16     8 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* last cacheline: 24 bytes */
  };
  struct function_vector {
  	struct backtrace_vector    vec;                  /*     0    24 */
  	size_t                     count;                /*    24     8 */

  	/* size: 32, cachelines: 1, members: 2 */
  	/* last cacheline: 32 bytes */
  };
  struct unit {
  	const unsigned char  *     unit_data;            /*     0     8 */
  	size_t                     unit_data_len;        /*     8     8 */
  	size_t                     unit_data_offset;     /*    16     8 */
  	size_t                     low_offset;           /*    24     8 */
  	size_t                     high_offset;          /*    32     8 */
  	int                        version;              /*    40     4 */
  	int                        is_dwarf64;           /*    44     4 */
  	int                        addrsize;             /*    48     4 */

  	/* XXX 4 bytes hole, try to pack */

  	off_t                      lineoff;              /*    56     8 */
  	/* --- cacheline 1 boundary (64 bytes) --- */
  	uint64_t                   str_offsets_base;     /*    64     8 */
  	uint64_t                   addr_base;            /*    72     8 */
  	uint64_t                   rnglists_base;        /*    80     8 */
  	const char  *              filename;             /*    88     8 */
  	const char  *              comp_dir;             /*    96     8 */
  	const char  *              abs_filename;         /*   104     8 */
  	struct abbrevs             abbrevs;              /*   112    16 */
  	/* --- cacheline 2 boundary (128 bytes) --- */
  	struct line *              lines;                /*   128     8 */
  	size_t                     lines_count;          /*   136     8 */
  	struct function_addrs *    function_addrs;       /*   144     8 */
  	size_t                     function_addrs_count; /*   152     8 */

  	/* size: 160, cachelines: 3, members: 20 */
  	/* sum members: 156, holes: 1, sum holes: 4 */
  	/* last cacheline: 32 bytes */
  };
  struct unit_addrs {
  	uint64_t                   low;                  /*     0     8 */
  	uint64_t                   high;                 /*     8     8 */
  	struct unit *              u;                    /*    16     8 */

  	/* size: 24, cachelines: 1, members: 3 */
  	/* last cacheline: 24 bytes */
  };
  struct unit_addrs_vector {
  	struct backtrace_vector    vec;                  /*     0    24 */
  	size_t                     count;                /*    24     8 */

  	/* size: 32, cachelines: 1, members: 2 */
  	/* last cacheline: 32 bytes */
  };
  struct unit_vector {
  	struct backtrace_vector    vec;                  /*     0    24 */
  	size_t                     count;                /*    24     8 */

  	/* size: 32, cachelines: 1, members: 2 */
  	/* last cacheline: 32 bytes */
  };
  struct dwarf_data {
  	struct dwarf_data *        next;                 /*     0     8 */
  	struct dwarf_data *        altlink;              /*     8     8 */
  	uintptr_t                  base_address;         /*    16     8 */
  	struct unit_addrs *        addrs;                /*    24     8 */
  	size_t                     addrs_count;          /*    32     8 */
  	struct unit * *            units;                /*    40     8 */
  	size_t                     units_count;          /*    48     8 */
  	struct dwarf_sections      dwarf_sections;       /*    56   144 */
  	/* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
  	int                        is_bigendian;         /*   200     4 */

  	/* XXX 4 bytes hole, try to pack */

  	struct function_vector     fvec;                 /*   208    32 */

  	/* size: 240, cachelines: 4, members: 10 */
  	/* sum members: 236, holes: 1, sum holes: 4 */
  	/* last cacheline: 48 bytes */
  };
  struct pcrange {
  	uint64_t                   lowpc;                /*     0     8 */
  	int                        have_lowpc;           /*     8     4 */
  	int                        lowpc_is_addr_index;  /*    12     4 */
  	uint64_t                   highpc;               /*    16     8 */
  	int                        have_highpc;          /*    24     4 */
  	int                        highpc_is_relative;   /*    28     4 */
  	int                        highpc_is_addr_index; /*    32     4 */

  	/* XXX 4 bytes hole, try to pack */

  	uint64_t                   ranges;               /*    40     8 */
  	int                        have_ranges;          /*    48     4 */
  	int                        ranges_is_index;      /*    52     4 */

  	/* size: 56, cachelines: 1, members: 10 */
  	/* sum members: 52, holes: 1, sum holes: 4 */
  	/* last cacheline: 56 bytes */
  };
  struct stat64 {
  	__dev_t                    st_dev;               /*     0     8 */
  	__ino64_t                  st_ino;               /*     8     8 */
  	__nlink_t                  st_nlink;             /*    16     8 */
  	__mode_t                   st_mode;              /*    24     4 */
  	__uid_t                    st_uid;               /*    28     4 */
  	__gid_t                    st_gid;               /*    32     4 */
  	int                        __pad0;               /*    36     4 */
  	__dev_t                    st_rdev;              /*    40     8 */
  	__off_t                    st_size;              /*    48     8 */
  	__blksize_t                st_blksize;           /*    56     8 */
  	/* --- cacheline 1 boundary (64 bytes) --- */
  	__blkcnt64_t               st_blocks;            /*    64     8 */
  	struct timespec            st_atim;              /*    72    16 */
  	struct timespec            st_mtim;              /*    88    16 */
  	struct timespec            st_ctim;              /*   104    16 */
  	__syscall_slong_t          __glibc_reserved[3];  /*   120    24 */

  	/* size: 144, cachelines: 3, members: 15 */
  	/* last cacheline: 16 bytes */
  };
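
The "XXX 4 bytes hole, try to pack" notes in the listing above can be reproduced with a small sketch. It assumes an LP64 target such as x86-64 (4-byte int, 8-byte pointers); the struct and function names are hypothetical and only mimic the shape of struct dwarf_buf:

```c
#include <stddef.h>

/* Hypothetical layout mimicking part of 'struct dwarf_buf' above:
 * a 4-byte int followed by an 8-byte-aligned pointer leaves a hole,
 * and a trailing int leaves padding at the end. */
struct buf_before {
	int   is_bigendian;        /* offset  0, size 4 */
	/* 4 bytes hole */
	void *data;                /* offset  8, size 8 */
	int   reported_underflow;  /* offset 16, size 4 */
	/* 4 bytes padding */
};

/* Same members, reordered so the two ints share one 8-byte slot. */
struct buf_after {
	int   is_bigendian;        /* offset 0, size 4 */
	int   reported_underflow;  /* offset 4, size 4 */
	void *data;                /* offset 8, size 8 */
};

/* Bytes left unused between the end of one member and the start of the next. */
size_t hole_bytes(size_t prev_offset, size_t prev_size, size_t next_offset)
{
	return next_offset - (prev_offset + prev_size);
}
```

Under those assumptions the first struct takes 24 bytes and the reordered one 16, which is the kind of saving the "try to pack" hint points at.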

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/9#issuecomment-696277814
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-22 09:35:39 -03:00
Arnaldo Carvalho de Melo e9e6285fd0 dwarf_loader: Skip empty CUs
Some parts of this file are corrupt and most of its CUs are empty; the
empty CUs were causing the segfault. Now pahole doesn't crash, and
pdwtags, a debug utility that prints most DWARF tags, produces some
output.

  [acme@quaco pahole]$ readelf -wi examples/asm/dw2-error.debug  | grep -i corrupt
  readelf: examples/asm/dw2-error.debug: Warning: Invalid pointer size (0) in compunit header, using 4 instead
  readelf: examples/asm/dw2-error.debug: Warning: CU at offset 90 contains corrupt or unsupported version number: 153.
  [acme@quaco pahole]$ pahole examples/asm/dw2-error.debug
  [acme@quaco pahole]$ pdwtags examples/asm/dw2-error.debug
  /* Types: */

  /* Functions: */

  /* Variables: */

  /* Types: */

  /* 1 */
  int
   /* size: 4 */

  /* 2 */
  /* tag__fprintf: const_type tag not supported! */; /* size: 4 */

  /* Functions: */

  /* Variables: */

  const int                  _IO_stdin_used; /* size: 4 */

  /* Types: */

  /* Functions: */

  /* Variables: */

  [acme@quaco pahole]$

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/9#issuecomment-696284602
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-22 09:22:58 -03:00
Hao Luo 1abc001417 btf_encoder: Introduce option '--btf_encode_force'
Commit f3d9054ba8 ("btf_encoder: Teach pahole to store percpu
variables in vmlinux BTF.") introduced an option '-j' that makes a
best effort at emitting VAR entries in BTF. Since no one is using this
flag yet, replace the one-letter option '-j' with a full flag name,
'--btf_encode_force', to save '-j' for future uses.

Committer notes:

Added missing man page entry.

Signed-off-by: Hao Luo <haoluo@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-21 17:02:57 -03:00
Hao Luo da4ad2f650 btf_encoder: Allow disabling BTF var encoding.
A new feature was introduced in commit f3d9054ba8 ("btf_encoder: Teach
pahole to store percpu variables in vmlinux BTF.") which encodes kernel
percpu variables into BTF. Add a flag --skip_encoding_btf_vars to allow
users to toggle this feature off, so that the rollout of pahole v1.18
is protected from potential bugs in this feature.

Committer notes:

Added missing man page entry.

Signed-off-by: Hao Luo <haoluo@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-21 17:00:16 -03:00
Arnaldo Carvalho de Melo f5847773d9 fprintf: Support DW_TAG_string_type
We don't really reconstruct source code for FORTRAN, we just print it as
if it was C:

  $ pahole examples/fortran95/derived-type.debug
  struct bar {
  	integer(kind=4)            c;                    /*     0     4 */
  	real(kind=4)               d;                    /*     4     4 */

  	/* size: 8, cachelines: 1, members: 2 */
  	/* last cacheline: 8 bytes */
  };
  struct foo {
  	real(kind=4)               a;                    /*     0     4 */
  	struct bar                 x;                    /*     4     8 */
  	string                     b[7];                 /*    12     7 */

  	/* size: 20, cachelines: 1, members: 3 */
  	/* padding: 1 */
  	/* last cacheline: 20 bytes */
  };
  $

This comes from GCC build tests:

  $ readelf -wi examples/fortran95/derived-type.debug | grep Fortran -A2
      <9c>   DW_AT_producer    : (indirect string, offset: 0x1fb): GNU Fortran2008 10.2.1 20200728 [revision c0438ced53bcf57e4ebb1c38c226e41571aca892] -mtune=generic -march=x86-64 -g -fno-stack-protector -J /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.fortran/derived-type -fintrinsic-modules-path /usr/lib64/gcc/x86_64-suse-linux/10/finclude -fpre-include=/usr/include/finclude/math-vector-fortran.h
      <a0>   DW_AT_language    : 14     (Fortran 95)
      <a1>   DW_AT_identifier_case: 2   (down_case)
      <a2>   DW_AT_name        : (indirect string, offset: 0x365): /home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.fortran/derived-type.f90
  [acme@five pahole]$ readelf -wi examples/fortran95/derived-type.debug | grep DW_TAG_string_type -A2
   <1><122>: Abbrev Number: 6 (DW_TAG_string_type)
      <123>   DW_AT_byte_size   : 7
  $

Now let's see what else in there is segfaulting pahole. For now I don't
think I have any segfaults left, so just wait a bit for Hao to submit
the patch to selectively encode the per-cpu variables in BTF and then
cut v1.18.

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/9
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-18 18:19:22 -03:00
Arnaldo Carvalho de Melo 8b00a5cd67 dwarf_loader: Support DW_TAG_string_type
FORTRAN 95 stuff:

  $ readelf -wi examples/fortran95/derived-type.debug | grep Fortran -A2
      <9c>   DW_AT_producer    : (indirect string, offset: 0x1fb): GNU Fortran2008 10.2.1 20200728 [revision c0438ced53bcf57e4ebb1c38c226e41571aca892] -mtune=generic -march=x86-64 -g -fno-stack-protector -J /home/vries/gdb_versions/devel/build/gdb/testsuite/outputs/gdb.fortran/derived-type -fintrinsic-modules-path /usr/lib64/gcc/x86_64-suse-linux/10/finclude -fpre-include=/usr/include/finclude/math-vector-fortran.h
      <a0>   DW_AT_language    : 14	(Fortran 95)
      <a1>   DW_AT_identifier_case: 2	(down_case)
      <a2>   DW_AT_name        : (indirect string, offset: 0x365): /home/vries/gdb_versions/devel/src/gdb/testsuite/gdb.fortran/derived-type.f90
  [acme@five pahole]$ readelf -wi examples/fortran95/derived-type.debug | grep DW_TAG_string_type -A2
   <1><122>: Abbrev Number: 6 (DW_TAG_string_type)
      <123>   DW_AT_byte_size   : 7
  $

Kinda like an array, but the number of entries is given by
DW_AT_byte_size; if that attribute isn't there, the size is 1 byte.
DW_TAG_array_type can have zero size.

Next patch will pretty print it.
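
A minimal sketch of that sizing rule, assuming the attribute has already been read from the DIE. The struct and function names here are hypothetical stand-ins, not the dwarf_loader code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what was read from a DW_TAG_string_type DIE. */
struct string_type_die {
	bool   has_byte_size;  /* was DW_AT_byte_size present? */
	size_t byte_size;      /* its value, only meaningful when present */
};

/* Number of one-byte entries: DW_AT_byte_size when given, else 1. */
size_t string_type_nr_entries(const struct string_type_die *die)
{
	return die->has_byte_size ? die->byte_size : 1;
}
```

For the DIE at <0x122> in the readelf output above, DW_AT_byte_size is 7, so this would yield 7 entries.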

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/9
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-18 18:18:30 -03:00
Arnaldo Carvalho de Melo 0d9c3c9835 dwarves: Check if a member type wasn't found and avoid a NULL deref
This dodges a SEGFAULT at type__check_structs_at_unnatural_alignments()
so that we can finish processing, give the warnings and produce as much
as we can:

  $ pahole examples/fortran95/derived-type.debug
  die__process_unit: DW_TAG_string_type (0x12) @ <0x122> not handled!
  namespace__recode_dwarf_types: couldn't find 0x122 type for 0x116 (member)!
  struct bar {
  	integer(kind=4)            c;                    /*     0     4 */
  	real(kind=4)               d;                    /*     4     4 */

  	/* size: 8, cachelines: 1, members: 2 */
  	/* last cacheline: 8 bytes */
  };
  struct foo {
  	real(kind=4)               a;                    /*     0     4 */
  	struct bar                 x;                    /*     4     8 */
  	<ERROR(__class__fprintf:1519): 0 not found!>

  	/* size: 20, cachelines: 1, members: 3 */
  	/* padding: 8 */
  	/* last cacheline: 20 bytes */
  };
  $

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/9
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-18 17:29:24 -03:00
Arnaldo Carvalho de Melo 2ecc308518 dwarf_loader: Bail out at DW_TAG_imported_unit tags
We need to support these in a future version; for now, just bail out to
avoid segfaults later on.

  $ pahole examples/sles/vmlinux-5.3.18-109.g8ff6392-default.debug
  WARNING: DW_TAG_partial_unit used, some types will not be considered!
           Probably this was optimized using a tool like 'dwz'
           A future version of pahole will take support this.
  $

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/10
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-18 17:28:52 -03:00
Arnaldo Carvalho de Melo 8c92fd2981 dwarf_loader: Ignore entries in a DW_TAG_partial_unit, for now
We will have to keep all CUs in memory and do lookups in imported units;
for now, just don't segfault.

Reported-by: Tom de Vries
Bugtracker: https://github.com/acmel/dwarves/issues/10
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-18 17:28:12 -03:00
Arnaldo Carvalho de Melo 4cfd420f7e README: Add instructions to do a cross build
This was on an ubuntu:18.04 system, using their cross build packages and
elfutils and zlib cross built from sources.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-17 09:52:02 -03:00
Arnaldo Carvalho de Melo 9e495f68c6 dwarf_loader: Move vaddr to conditional where it is used
To avoid build failures in architectures where HAVE_DWFL_MODULE_BUILD_ID
isn't defined.

Noticed while cross building for s390x.
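
The general pattern can be sketched as follows. The commit doesn't show the exact error, so this is only an illustration: fake_build_id() is a hypothetical stand-in for the library call that is available only when HAVE_DWFL_MODULE_BUILD_ID is defined, and the variable is declared inside the same conditional that uses it:

```c
#include <stddef.h>

#ifdef HAVE_DWFL_MODULE_BUILD_ID
/* Hypothetical stand-in for a call that only exists on some builds. */
static int fake_build_id(const unsigned char **idp, unsigned long *vaddr)
{
	static const unsigned char id[] = { 0xde, 0xad, 0xbe, 0xef };

	*idp = id;
	*vaddr = 0x1000;
	return (int)sizeof(id);
}
#endif

/* Returns the build-id length, or 0 when the feature is unavailable. */
int build_id_len(void)
{
	int len = 0;
#ifdef HAVE_DWFL_MODULE_BUILD_ID
	/* Declared here, not at function scope: nothing outside this
	 * conditional refers to these variables, so builds without the
	 * feature never see them. */
	const unsigned char *id;
	unsigned long vaddr;

	len = fake_build_id(&id, &vaddr);
#endif
	return len;
}
```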

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-17 09:40:37 -03:00
Arnaldo Carvalho de Melo 69fce76207 pahole: Use "%s" in a snprintf call
To address this clang 11 build error:

[ 86%] Building C object CMakeFiles/pahole.dir/pahole.c.o
/home/acme/git/pahole/pahole.c:1626:33: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
                        snprintf(name, sizeof(name), enumerator__name(enumerator, cu_enumerator));
                                                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/acme/git/pahole/pahole.c:1626:33: note: treat the string as an argument to avoid this
                        snprintf(name, sizeof(name), enumerator__name(enumerator, cu_enumerator));
                                                     ^
                                                     "%s",
1 error generated.
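
A small sketch of the fixed form, with a hypothetical helper name; the point is only that the string being copied is passed as an argument to "%s" and never interpreted as a format:

```c
#include <stddef.h>
#include <stdio.h>

/* Copies 'untrusted' into 'buf'. Passing it as the format argument
 * instead would make snprintf() interpret any '%' it contains. */
int copy_name(char *buf, size_t len, const char *untrusted)
{
	return snprintf(buf, len, "%s", untrusted);
}
```

With this form a name such as "100%s_done" is copied verbatim, '%' included.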

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-12 16:33:12 -03:00
Arnaldo Carvalho de Melo 22f93766cf pahole: Support multiple types for pretty printing
For now one has to specify them in the order they appear in the file,
i.e. for perf.data files where we have:

  $ pahole --hex ~/bin/perf --header=perf_file_header < perf.data
  {
  	.magic = 0x32454c4946524550,
  	.size = 0x68,
  	.attr_size = 0x88,
  	.attrs = {
  		.offset = 0x168,
  		.size = 0x220,
  	},
  	.data = {
  		.offset = 0x388,
  		.size = 0x306698,
  	},
  	.event_types = {
  		.offset = 0,
  		.size = 0,
  	},
  	.adds_features = { 0x16717ffc, 0, 0, 0 },
  },
  $

We need to ask for pretty printing the attrs then the data sections, as:

  $ pahole ~/bin/perf --header=perf_file_header \
     -C 'perf_file_attr(range=attrs),perf_event_header(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data

Notice that each type has a range= setting naming the header member that
says where instances of that type are found. This perf.data file has
these events:

  $ perf evlist
  instructions
  cycles
  cache-misses
  dummy:HG
  $

Those events have these attributes, which we'll match in the pahole
output for the header 'attrs' range:

  $ perf evlist -v
  instructions: size: 120, config: 0x1, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
  cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
  cache-misses: size: 120, config: 0x3, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1, exclude_guest: 1
  dummy:HG: type: 1, size: 120, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ID|CPU|PERIOD, read_format: ID, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  $

To make it more compact let's remove zeroed fields using grep:

  $ pahole ~/bin/perf --header=perf_file_header -C 'perf_file_attr(range=attrs),perf_event_header(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data | grep -v '= 0,'
  {
  	.attr = {
  		.size = 120,
  		.config = 1,
  		.sample_period = 4000,
  		.sample_freq = 4000,
  		.sample_type = 455,
  		.read_format = 4,
  		.disabled = 1,
  		.inherit = 1,
  		.freq = 1,
  		.sample_id_all = 1,
  		.exclude_guest = 1,
  	},
  	.ids = {
  		.offset = 104,
  		.size = 64,
  	},
  },
  {
  	.attr = {
  		.size = 120,
  		.sample_period = 4000,
  		.sample_freq = 4000,
  		.sample_type = 455,
  		.read_format = 4,
  		.disabled = 1,
  		.inherit = 1,
  		.freq = 1,
  		.sample_id_all = 1,
  		.exclude_guest = 1,
  	},
  	.ids = {
  		.offset = 168,
  		.size = 64,
  	},
  },
  {
  	.attr = {
  		.size = 120,
  		.config = 3,
  		.sample_period = 4000,
  		.sample_freq = 4000,
  		.sample_type = 455,
  		.read_format = 4,
  		.disabled = 1,
  		.inherit = 1,
  		.freq = 1,
  		.sample_id_all = 1,
  		.exclude_guest = 1,
  	},
  	.ids = {
  		.offset = 232,
  		.size = 64,
  	},
  },
  {
  	.attr = {
  		.type = 1,
  		.size = 120,
  		.config = 9,
  		.sample_period = 4000,
  		.sample_freq = 4000,
  		.sample_type = 455,
  		.read_format = 4,
  		.inherit = 1,
  		.mmap = 1,
  		.comm = 1,
  		.freq = 1,
  		.task = 1,
  		.sample_id_all = 1,
  		.mmap2 = 1,
  		.comm_exec = 1,
  		.ksymbol = 1,
  		.bpf_event = 1,
  	},
  	.ids = {
  		.offset = 296,
  		.size = 64,
  	},
  },
  {
  	.header = {
  		.type = PERF_RECORD_TIME_CONV,
  		.size = 32,
  	},
  	.time_shift = 31,
  	.time_mult = 1016798081,
  	.time_zero = 670877213069232,
  },
  {
  	.header = {
  		.type = PERF_RECORD_MMAP,
  		.misc = 1,
  		.size = 96,
  	},
  	.pid = -1,
  	.start = -1929379840,
  	.len = 14683553,
  	.pgoff = -1929379840,
  	.filename = "[kernel.kallsyms]_text",
  },
  {
  	.header = {
  		.type = PERF_RECORD_MMAP,
  		.misc = 1,
  		.size = 136,
  	},
  	.pid = -1,
  	.start = -1072852992,
  	.len = 139264,
  	.filename = "/lib/modules/5.7.8-200.fc32.x86_64/kernel/fs/fuse/fuse.ko.xz",
  },
<SNIP>
  {
  	.header = {
  		.type = PERF_RECORD_SAMPLE,
  		.misc = 1,
  		.size = 56,
  	},
  	.array = { -1927972873, 14267881360602, 671090228720656, 8685165, 7, 93802 },
  },
  {
  	.header = {
  		.type = PERF_RECORD_SAMPLE,
  		.misc = 1,
  		.size = 56,
  	},
  	.array = { -1928098583, 0, 671090229714951, 8685165, 7, 79438 },
  },
  {
  	.type = PERF_RECORD_FINISHED_ROUND,
  	.size = 8,
  },
  $

Validation is done all around:

  $ pahole ~/bin/perf --header=paerf_file_header -C 'perf_file_attr(range=attrs),perf_event_header(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data | grep -v '= 0,'
  pahole: --header_type=paerf_file_header not found
  $

  $ pahole ~/bin/perf --header=perf_file_header -C 'perf_file_atatr(range=attrs),perf_event_header(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data | grep -v '= 0,'
  pahole: type 'perf_file_atatr' not found
  $

  $ pahole ~/bin/perf --header=perf_file_header -C 'perf_file_attr(range=atrs),perf_event_header(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data | grep -v '= 0,'
  pahole: couldn't read the 'atrs.offset' member of 'perf_file_header' for evaluating range=atrs
  $
  $ pahole ~/bin/perf --header=perf_file_header -C 'perf_file_attr(range=attrs),perf_event_hader(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data | grep -v '= 0,'
  pahole: type 'perf_event_hader' not found
  $
  $ pahole ~/bin/perf --header=perf_file_header -C 'perf_file_attr(range=attrs),perf_event_header(range=daata,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data | grep -v '= 0,' > /dev/null
  pahole: couldn't read the 'daata.offset' member of 'perf_file_header' for evaluating range=daata
  $
  $ pahole ~/bin/perf --header=perf_file_header -C 'perf_file_attr(range=attrs),perf_event_header(range=data,sizeof=fads,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data | grep -v '= 0,' > /dev/null
  pahole: the sizeof member 'fads' wasn't found in the 'perf_event_header' type
  pahole: type 'perf_event_header' not found or attributes not validated
  $

The algorithm to find the types was improved to not fall back at the
end, but instead to keep the types that were found as it goes, which
increases the chance of resolving all of them. At the end we just need
to check whether everything needed was found, printing relevant
messages when it wasn't.
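
The idea can be sketched roughly as below. This is not pahole's implementation: a "CU" is reduced to a NULL-terminated array of type names, and struct wanted, resolve_in_cu() and count_missing() are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One type requested on the command line, and whether it was resolved. */
struct wanted {
	const char *name;
	bool        found;
};

/* Marks every still-missing wanted type that this CU provides.
 * Types found in earlier CUs are kept, never looked up again. */
size_t resolve_in_cu(struct wanted *w, size_t nr, const char *const *cu_types)
{
	size_t newly_found = 0;

	for (size_t i = 0; i < nr; i++) {
		if (w[i].found)
			continue;
		for (const char *const *t = cu_types; *t != NULL; t++) {
			if (strcmp(*t, w[i].name) == 0) {
				w[i].found = true;
				newly_found++;
				break;
			}
		}
	}
	return newly_found;
}

/* Final check, done once after all CUs were visited. */
size_t count_missing(const struct wanted *w, size_t nr)
{
	size_t missing = 0;

	for (size_t i = 0; i < nr; i++)
		if (!w[i].found)
			missing++;
	return missing;
}
```

A caller would run resolve_in_cu() for each CU in turn and, when count_missing() is non-zero at the end, print a "type '...' not found" message for each entry still unresolved.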

More to do, like pretty printing flags such as sample_type, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 17:58:09 -03:00
Arnaldo Carvalho de Melo 78f2177d90 pahole: Print the evaluated range= per class
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 17:32:49 -03:00
Arnaldo Carvalho de Melo 5c32b9d5c7 pahole: Count the total number of bytes read from stdin
Another prep patch towards having multiple pretty printing types: we now
need to resolve all the types needed for all the classes, to then allow
for multiple types.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo e3e5a4626c pahole: Make sure the header is read only once
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo 208bcd9873 pahole: Introduce 'range=member' as a class argument for pretty printing
I.e. this:

  pahole ~/bin/perf --header=perf_file_header \
		    -C 'perf_event_header(range=data,sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data

Is equivalent to:

  pahole ~/bin/perf --header=perf_file_header --range=data \
		    -C 'perf_event_header(sizeof,type,type_enum=perf_event_type+perf_user_event_type)' < perf.data

This is prep work for pretty printing multiple types of records, doing
it just for that per-type range.
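
A rough sketch of how such a class argument could be split into its
parts, in Python rather than pahole's C. parse_class_spec() and
effective_range() are hypothetical names, the sketch handles one class
at a time, and letting the per-class range= take precedence over the
global --range is an assumption, not something the commit states.

```python
import re


def parse_class_spec(spec):
    """Split 'name(k=v,flag,...)' into (name, {key: value-or-True})."""
    m = re.fullmatch(r"(\w+)(?:\((.*)\))?", spec.strip())
    if m is None:
        raise ValueError(f"malformed class spec: {spec!r}")
    name, arglist = m.group(1), m.group(2)
    args = {}
    for item in filter(None, (arglist or "").split(",")):
        key, sep, value = item.partition("=")
        # 'sizeof' and 'type' carry no value here, so store True for them.
        args[key.strip()] = value.strip() if sep else True
    return name, args


def effective_range(args, global_range=None):
    """Assumption: a per-class range= wins over the global --range."""
    return args.get("range", global_range)


enums = "type_enum=perf_event_type+perf_user_event_type"
per_class = parse_class_spec(f"perf_event_header(range=data,sizeof,type,{enums})")
with_global = parse_class_spec(f"perf_event_header(sizeof,type,{enums})")

print(effective_range(per_class[1]))
print(effective_range(with_global[1], global_range="data"))
```

Both calls resolve to the same range member, 'data', which is the
equivalence the two command lines above express.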

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo b9e4063119 pahole: Cache the type_enum lookups into struct enumerator
I.e. when we get the type= field value, look it up in the type_enum=,
get the struct enumerator and if it is the first time that we're looking
up the string representation of that enumerator, do it and then cache
the results, so that next time we reuse it.
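
The caching described above amounts to memoizing per enumerator. A
minimal Python sketch, with a hypothetical slow_lookup() standing in for
the real search, and assuming the result being cached is the lowercased
enumerator name:

```python
class Enumerator:
    """One enum entry plus a slot for the cached lookup result."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.cached = None  # filled in on the first lookup


def slow_lookup(enumerator, stats):
    """Stand-in for the expensive search; counts how often it runs."""
    stats["lookups"] += 1
    return enumerator.name.lower()


def type_for(enumerator, stats):
    """Return the cached result, doing the real lookup only once."""
    if enumerator.cached is None:
        enumerator.cached = slow_lookup(enumerator, stats)
    return enumerator.cached


stats = {"lookups": 0}
time_conv = Enumerator("PERF_RECORD_TIME_CONV", 79)
names = [type_for(time_conv, stats) for _ in range(3)]
print(names[0], stats["lookups"])
```

Three records of the same type trigger a single real lookup; the other
two reuse what is stored in the enumerator.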

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo fda1825f0b dwarves: Introduce tag_cu_node, so that we can have the leaner tag_cu
With just tag + cu pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo 47d4dd4c8a pahole: Optimize --header processing by keeping the first successful instance
Since we store both the tag and the cu when we find it, defer deleting
it till we're completely sure we don't need it anymore.

Sometimes this is the only reason for us to fall back to looking at all
the CUs to get all the needed types, so keeping it may allow the DWARF
"stealer" to, after finding the header type in one CU, find the other
needed types in another, and we end up not having to process all the CUs
in a DWARF based session, speeding up the process.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo fdfc64ec44 pahole: Introduce --range
For a data structure like:

  $ pahole --hex ~/bin/perf --header perf_file_header < perf.data
  {
  	.magic = 0x32454c4946524550,
  	.size = 0x68,
  	.attr_size = 0x88,
  	.attrs = {
  		.offset = 0xa8,
  		.size = 0x88,
  	},
  	.data = {
  		.offset = 0x130,
  		.size = 0x588,
  	},
  	.event_types = {
  		.offset = 0,
  		.size = 0,
  	},
  	.adds_features = { 0x16717ffc, 0, 0, 0 },
  },
  $

These are now equivalent:

  $ pahole ~/bin/perf --header=perf_file_header --seek_bytes '$header.data.offset' --size_bytes='$header.data.size' -C 'perf_event_header(sizeof,type,type_enum=perf_event_type+perf_user_event_type)' --count 1 --hex < perf.data
  {
  	.header = {
  		.type = PERF_RECORD_TIME_CONV,
  		.misc = 0,
  		.size = 0x20,
  	},
  	.time_shift = 0x1f,
  	.time_mult = 0x3c9b3031,
  	.time_zero = 0x18c520cf8532e,
  },
  $ pahole ~/bin/perf --header=perf_file_header --range=data -C 'perf_event_header(sizeof,type,type_enum=perf_event_type+perf_user_event_type)' --count 1 --hex < perf.data
  {
  	.header = {
  		.type = PERF_RECORD_TIME_CONV,
  		.misc = 0,
  		.size = 0x20,
  	},
  	.time_shift = 0x1f,
  	.time_mult = 0x3c9b3031,
  	.time_zero = 0x18c520cf8532e,
  },
  $
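
What --range=data resolves to can be sketched in Python with the values
from the header dump above. The layout (three u64 fields, three
{offset, size} pairs and four u64 feature words, little-endian) is
inferred from that dump; parse_header() and resolve_range() are
hypothetical helpers, not pahole functions.

```python
import struct

# 3 x u64, then attrs/data/event_types as {offset, size}, then 4 x u64.
HEADER_FMT = "<3Q2Q2Q2Q4Q"
SECTIONS = ("attrs", "data", "event_types")


def parse_header(buf):
    v = struct.unpack_from(HEADER_FMT, buf)
    header = {"magic": v[0], "size": v[1], "attr_size": v[2]}
    for i, name in enumerate(SECTIONS):
        header[name] = {"offset": v[3 + 2 * i], "size": v[4 + 2 * i]}
    return header


def resolve_range(header, member):
    """Turn range=member into the (seek_bytes, size_bytes) pair."""
    section = header.get(member)
    if not isinstance(section, dict):
        raise KeyError(f"couldn't read the '{member}.offset' member")
    return section["offset"], section["size"]


# Synthetic header built from the values printed above.
raw = struct.pack(HEADER_FMT, 0x32454C4946524550, 0x68, 0x88,
                  0xA8, 0x88, 0x130, 0x588, 0, 0,
                  0x16717FFC, 0, 0, 0)
seek_bytes, size_bytes = resolve_range(parse_header(raw), "data")
print(hex(seek_bytes), hex(size_bytes))
```

The pair printed is the same one that '$header.data.offset' and
'$header.data.size' name explicitly in the longer command line.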

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo f63d3677e3 pahole: Support multiple enums in type_enum=
See the pahole man page:

Some of the records are not found in 'type_enum=perf_event_type' so some of the
records don't get converted to a type that fully shows its contents. For perf
we know that those are in another enumeration, 'enum perf_user_event_type', so,
for these cases, we can create a 'virtual enum', i.e. the sum of two enums,
and then get all those entries decoded and properly cast. First few records
with just 'enum perf_event_type':

  $ pahole ~/bin/perf --header=perf_file_header --seek_bytes '$header.data.offset' --size_bytes='$header.data.size' -C 'perf_event_header(sizeof,type,type_enum=perf_event_type)' --count 4 < perf.data
  {
       .type = 79,
       .misc = 0,
       .size = 32,
  },
  {
       .type = 73,
       .misc = 0,
       .size = 40,
  },
  {
       .type = 74,
       .misc = 0,
       .size = 32,
  },
  {
       .header = {
            .type = PERF_RECORD_CGROUP,
            .misc = 0,
            .size = 40,
       },
       .id = 1,
       .path = "/",
  },
  $

Now with both enumerations, i.e. with 'type_enum=perf_event_type+perf_user_event_type':

  $ pahole ~/bin/perf --header=perf_file_header --seek_bytes '$header.data.offset' --size_bytes='$header.data.size' -C 'perf_event_header(sizeof,type,type_enum=perf_event_type+perf_user_event_type)' --count 5 < perf.data
  {
       .header = {
            .type = PERF_RECORD_TIME_CONV,
            .misc = 0,
            .size = 32,
       },
       .time_shift = 31,
       .time_mult = 1016803377,
       .time_zero = 435759009518382,
  },
  {
       .header = {
            .type = PERF_RECORD_THREAD_MAP,
            .misc = 0,
            .size = 40,
       },
       .nr = 1,
       .entries = 0x50 0x7e 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00,
  },
  {
       .header = {
            .type = PERF_RECORD_CPU_MAP,
            .misc = 0,
            .size = 32,
       },
       .data = {
            .type = 1,
            .data = "",
       },
  },
  {
       .header = {
            .type = PERF_RECORD_CGROUP,
            .misc = 0,
            .size = 40,
       },
       .id = 1,
       .path = "/",
  },
  {
       .header = {
            .type = PERF_RECORD_CGROUP,
            .misc = 0,
            .size = 48,
       },
       .id = 1553,
       .path = "/system.slice",
  },
  $
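
The 'virtual enum' can be pictured as the union of two value-to-name
tables. A small Python sketch with hypothetical helpers: the values 73,
74 and 79 are the ones visible in the two dumps above, while 19 for
PERF_RECORD_CGROUP is an assumption that does not appear in the output
and should be checked against the kernel UAPI header.

```python
def virtual_enum(*enums):
    """Merge several name -> value tables into one value -> name table."""
    merged = {}
    for enum in enums:
        for name, value in enum.items():
            merged.setdefault(value, name)  # first enum listed wins on clashes
    return merged


def decode(types, table):
    """Map raw type values to names, None when the enum doesn't know them."""
    return [table.get(t) for t in types]


perf_event_type = {"PERF_RECORD_CGROUP": 19}  # 19 is an assumption
perf_user_event_type = {
    "PERF_RECORD_THREAD_MAP": 73,
    "PERF_RECORD_CPU_MAP": 74,
    "PERF_RECORD_TIME_CONV": 79,
}

raw_types = [79, 73, 74, 19]
only_kernel = decode(raw_types, virtual_enum(perf_event_type))
both = decode(raw_types, virtual_enum(perf_event_type, perf_user_event_type))
print(only_kernel)
print(both)
```

With just the first enum the first three records stay undecoded, as in
the first session above; with both enums every record gets a name.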

And since the fun never ends, this needs to be properly supported (arrays of
structs):

  $ pahole ~/bin/perf -C perf_record_thread_map_entry
  struct perf_record_thread_map_entry {
  	__u64                      pid;                  /*     0     8 */
  	char                       comm[16];             /*     8    16 */

  	/* size: 24, cachelines: 1, members: 2 */
  	/* last cacheline: 24 bytes */
  };
  $

that 'nr' field:

  $ pahole ~/bin/perf -C perf_record_thread_map
  struct perf_record_thread_map {
  	struct perf_event_header   header;               /*     0     8 */
  	__u64                      nr;                   /*     8     8 */
  	struct perf_record_thread_map_entry entries[];   /*    16     0 */

  	/* size: 16, cachelines: 1, members: 3 */
  	/* last cacheline: 16 bytes */
  };
  $

So we probably need something like a file with types and their pretty printing
details, with one entry for 'struct perf_record_thread_map':

perf_record_thread_map(entries.nr_entries=$nr)

Meaning: the number of elements in the perf_record_thread_map 'entries'
array, each sizeof(typeof entries[0]) bytes long, is given by the
perf_record_thread_map 'nr' member. Everything starting with a '$' needs
to be evaluated, like the '$header.*' we already have; since there is no
'namespace selector' ('$header.' in the header case), this means: the
current record.

So, when pretty printing a record that has such type (perf_record_thread_map)
this has to be taken into account, and validated by the containing 'struct
perf_event_header->size' field.
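
A sketch of what honoring 'entries.nr_entries=$nr' could look like, in
Python and assuming little-endian data. The struct layouts follow the
pahole output above, plus the assumption that perf_event_header is
{u32 type, u16 misc, u16 size}; decode_thread_map() is a hypothetical
helper, since this is not implemented in pahole at this point.

```python
import struct

HEADER = struct.Struct("<IHH")  # perf_event_header: type, misc, size
ENTRY = struct.Struct("<Q16s")  # perf_record_thread_map_entry: pid, comm


def decode_thread_map(record):
    rec_type, misc, size = HEADER.unpack_from(record, 0)
    (nr,) = struct.unpack_from("<Q", record, HEADER.size)
    start = HEADER.size + 8  # 'entries' begins right after 'nr'
    # Validate $nr against the size recorded in the containing header.
    if size > len(record) or start + nr * ENTRY.size > size:
        raise ValueError("nr entries do not fit in header.size")
    entries = []
    for i in range(nr):
        pid, comm = ENTRY.unpack_from(record, start + i * ENTRY.size)
        entries.append((pid, comm.rstrip(b"\0").decode()))
    return nr, entries


# Rebuild the 40-byte PERF_RECORD_THREAD_MAP record shown earlier:
# type 73, size 40, nr 1, entry bytes 0x50 0x7e followed by zeros.
record = struct.pack("<IHHQ", 73, 0, 40, 1) + ENTRY.pack(0x7E50, b"")
print(decode_thread_map(record))
```

The 24 bytes printed raw as '.entries' in the earlier output decode to a
single entry with pid 0x7e50 and an empty comm.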

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00
Arnaldo Carvalho de Melo c50b6d37e9 pahole: Add infrastructure to have multiple concatenated type_enum
Sometimes we have multiple enums to represent some struct member, as
with perf_event_header->type, which has 'enum perf_event_type' in Linux's
UAPI and 'enum perf_user_event_type' for purely userspace types, such as
the ones synthesized for Intel PT, e.g. PERF_RECORD_AUXTRACE.

This patch just transforms type->type_enum into a list, the support for
multiple types comes next.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 15:16:19 -03:00