15 Commits

Author SHA1 Message Date
Andrii Nakryiko ace05ba941 btf: Add support for split BTF loading and encoding
Add support for generating split BTF, in which there is a designated base
BTF, containing a base set of types, and a split BTF, which extends the base
BTF with extra types that can reference types and strings from the base BTF.

This is going to be used to generate compact BTFs for kernel modules, with
vmlinux BTF being the base BTF, which all kernel modules are based off of.
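
As a rough illustration of the base/split relationship, the sketch below
resolves which BTF object owns a given type ID. It assumes that split type IDs
continue right after the base IDs, which is how libbpf numbers split BTF; the
struct and function names are hypothetical and are not pahole's or libbpf's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one BTF object; this is not libbpf's struct btf. */
struct btf_view {
	const struct btf_view *base;     /* NULL for the base BTF itself */
	uint32_t               start_id; /* first type ID owned by this object */
	uint32_t               nr_types; /* number of types it owns */
};

/*
 * Walk from the split BTF down to its base and return the object that
 * owns type_id, or NULL if no object owns it (type ID 0, "void", is
 * owned by nobody in this sketch).
 */
static const struct btf_view *
btf_view_owner(const struct btf_view *btf, uint32_t type_id)
{
	for (; btf != NULL; btf = btf->base) {
		if (type_id >= btf->start_id &&
		    type_id - btf->start_id < btf->nr_types)
			return btf;
	}
	return NULL;
}
```

With a base that owns IDs 1..100 and a module's split BTF that owns 101..110,
ID 50 resolves to the base and ID 105 to the split object, so the module only
has to carry its own ten types.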

These changes rely on patch set [0] to be present in libbpf submodule.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=377859&state=*

Committer notes:

Fixed up wrt ARGP_numeric_version and added a man page entry.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-11 09:12:44 -03:00
Andrii Nakryiko 2e719cca66 btf_encoder: revamp how per-CPU variables are encoded
Right now, to encode per-CPU variables in BTF, pahole iterates over the
complete vmlinux symbol table for each CU. There are 2500 CUs for a typical
kernel image. Overall, to encode 287 per-CPU variables pahole spends more than
10% of its CPU budget, which is incredibly wasteful.

This patch revamps how this is done. Now it pre-processes the symbol table
once, before any per-CU processing starts. It remembers each per-CPU variable
symbol, including its address, size, and name. Then, while processing each CU,
binary search is used to correlate each DWARF variable with the per-CPU
symbols and figure out whether it belongs to the per-CPU data section. If the match is found,
BTF_KIND_VAR is emitted and var_secinfo is recorded, just like before. At the
very end, after all CUs are processed, BTF_KIND_DATASEC is emitted with sorted
variables.
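
The lookup scheme described above can be sketched as follows. This is only an
illustration under stated assumptions: struct percpu_sym and both helpers are
hypothetical names, and the match is by exact address, which may differ from
what pahole actually does.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one per-CPU symbol; not pahole's actual struct. */
struct percpu_sym {
	uint64_t    addr;
	uint32_t    size;
	const char *name;
};

/* qsort() comparator: order symbols by ascending address. */
static int percpu_sym_cmp(const void *a, const void *b)
{
	const struct percpu_sym *x = a, *y = b;

	if (x->addr < y->addr)
		return -1;
	return x->addr > y->addr;
}

/* One-time pre-processing step, done before any CU is looked at. */
static void percpu_syms_sort(struct percpu_sym *syms, size_t nr)
{
	qsort(syms, nr, sizeof(*syms), percpu_sym_cmp);
}

/*
 * Per-variable step: binary search by address. Returns the matching
 * symbol, or NULL when the variable is not a per-CPU one.
 */
static const struct percpu_sym *
percpu_syms_find(const struct percpu_sym *syms, size_t nr, uint64_t addr)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr == addr)
			return &syms[mid];
		if (syms[mid].addr < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

Each lookup costs O(log n) in the number of per-CPU symbols, instead of a full
walk of the symbol table for every CU.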

This change makes the overhead of generating per-CPU variables negligible and
wins back about 10% of CPU usage.

Performance counter stats for './pahole -J /home/andriin/linux-build/default/vmlinux':

BEFORE:
      19.160149000 seconds user
       1.304873000 seconds sys

         24,114.05 msec task-clock                #    0.999 CPUs utilized
                83      context-switches          #    0.003 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           622,417      page-faults               #    0.026 M/sec
    72,897,315,125      cycles                    #    3.023 GHz                      (25.02%)
   127,807,316,959      instructions              #    1.75  insn per cycle           (25.01%)
    29,087,179,117      branches                  # 1206.234 M/sec                    (25.01%)
       464,105,921      branch-misses             #    1.60% of all branches          (25.01%)
    30,252,119,368      L1-dcache-loads           # 1254.543 M/sec                    (25.01%)
     1,156,336,207      L1-dcache-load-misses     #    3.82% of all L1-dcache hits    (25.05%)
       343,373,503      LLC-loads                 #   14.240 M/sec                    (25.02%)
        12,044,977      LLC-load-misses           #    3.51% of all LL-cache hits     (25.01%)

      24.136198321 seconds time elapsed

      22.729693000 seconds user
       1.384859000 seconds sys

AFTER:
      16.781455000 seconds user
       1.343956000 seconds sys

         23,398.77 msec task-clock                #    1.000 CPUs utilized
                86      context-switches          #    0.004 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           622,420      page-faults               #    0.027 M/sec
    68,395,641,468      cycles                    #    2.923 GHz                      (25.05%)
   114,241,327,034      instructions              #    1.67  insn per cycle           (25.01%)
    26,330,711,718      branches                  # 1125.303 M/sec                    (25.01%)
       465,926,869      branch-misses             #    1.77% of all branches          (25.00%)
    24,662,984,772      L1-dcache-loads           # 1054.029 M/sec                    (25.00%)
     1,054,052,064      L1-dcache-load-misses     #    4.27% of all L1-dcache hits    (25.00%)
       340,970,622      LLC-loads                 #   14.572 M/sec                    (25.00%)
        16,032,297      LLC-load-misses           #    4.70% of all LL-cache hits     (25.03%)

      23.402259654 seconds time elapsed

      21.916437000 seconds user
       1.482826000 seconds sys

Committer testing:

  $ grep 'model name' -m1 /proc/cpuinfo
  model name	: AMD Ryzen 9 3900X 12-Core Processor
  $

Before:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            9,730.28 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.54% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,854      page-faults:u             #    0.036 M/sec                    ( +-  0.00% )
      39,721,726,459      cycles:u                  #    4.082 GHz                      ( +-  0.07% )  (83.33%)
         626,010,654      stalled-cycles-frontend:u #    1.58% frontend cycles idle     ( +-  0.91% )  (83.33%)
       7,518,333,691      stalled-cycles-backend:u  #   18.93% backend cycles idle      ( +-  0.56% )  (83.33%)
      85,477,123,093      instructions:u            #    2.15  insn per cycle
                                                    #    0.09  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      19,346,085,683      branches:u                # 1988.235 M/sec                    ( +-  0.03% )  (83.34%)
         237,291,787      branch-misses:u           #    1.23% of all branches          ( +-  0.15% )  (83.33%)

              9.7465 +- 0.0524 seconds time elapsed  ( +-  0.54% )

  $

After:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            8,953.80 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,855      page-faults:u             #    0.040 M/sec                    ( +-  0.00% )
      35,775,730,539      cycles:u                  #    3.996 GHz                      ( +-  0.07% )  (83.33%)
         579,534,836      stalled-cycles-frontend:u #    1.62% frontend cycles idle     ( +-  2.21% )  (83.33%)
       5,719,840,144      stalled-cycles-backend:u  #   15.99% backend cycles idle      ( +-  0.93% )  (83.33%)
      73,035,744,786      instructions:u            #    2.04  insn per cycle
                                                    #    0.08  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      16,798,017,844      branches:u                # 1876.077 M/sec                    ( +-  0.05% )  (83.33%)
         237,777,143      branch-misses:u           #    1.42% of all branches          ( +-  0.15% )  (83.34%)

             8.97077 +- 0.00803 seconds time elapsed  ( +-  0.09% )

  $

Indeed, about 10% shaved, not bad.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: Oleg Rombakh <olegrom@google.com>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:57:35 -03:00
Andrii Nakryiko 48efa92933 btf_encoder: Use libbpf APIs to encode BTF type info
Switch to use libbpf's BTF writing APIs to encode BTF. This reconciles
btf_elf's use of internal struct btf from libbpf for both loading and
encoding BTF type info. This change also saves a considerable amount of
memory used for DWARF to BTF conversion due to avoiding extra memory
copy between gobuffers and libbpf's struct btf. Now that pahole uses
libbpf's struct btf, it's possible to further utilize libbpf's features
and APIs, e.g., for handling endianness conversion, for dumping raw BTF
type info during encoding. These features might be implemented in the
follow up patches.

Committer notes:

Built with 'cmake -DCMAKE_BUILD_TYPE=Release'

Before:

  $ cp ~/git/build/bpf-next-v5.9.0-rc8+/vmlinux .
  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

           10,065.20 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.68% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             514,596      page-faults:u             #    0.051 M/sec                    ( +-  0.00% )
      40,098,447,225      cycles:u                  #    3.984 GHz                      ( +-  0.26% )  (83.33%)
         547,247,149      stalled-cycles-frontend:u #    1.36% frontend cycles idle     ( +-  2.00% )  (83.33%)
       6,493,462,167      stalled-cycles-backend:u  #   16.19% backend cycles idle      ( +-  1.53% )  (83.33%)
      86,338,929,286      instructions:u            #    2.15  insn per cycle
                                                    #    0.08  stalled cycles per insn  ( +-  0.01% )  (83.34%)
      19,859,060,127      branches:u                # 1973.043 M/sec                    ( +-  0.02% )  (83.33%)
         288,389,742      branch-misses:u           #    1.45% of all branches          ( +-  0.13% )  (83.33%)

             10.0831 +- 0.0683 seconds time elapsed  ( +-  0.68% )

  $

After:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

           10,043.94 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.69% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             412,035      page-faults:u             #    0.041 M/sec                    ( +-  0.00% )
      39,985,610,202      cycles:u                  #    3.981 GHz                      ( +-  0.18% )  (83.33%)
         657,352,766      stalled-cycles-frontend:u #    1.64% frontend cycles idle     ( +-  2.79% )  (83.33%)
       7,387,740,861      stalled-cycles-backend:u  #   18.48% backend cycles idle      ( +-  1.65% )  (83.33%)
      85,926,053,845      instructions:u            #    2.15  insn per cycle
                                                    #    0.09  stalled cycles per insn  ( +-  0.04% )  (83.34%)
      19,428,047,875      branches:u                # 1934.305 M/sec                    ( +-  0.05% )  (83.33%)
         240,156,838      branch-misses:u           #    1.24% of all branches          ( +-  0.14% )  (83.34%)

             10.0609 +- 0.0696 seconds time elapsed  ( +-  0.69% )

  $

  $ ./btfdiff vmlinux
  $

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:49:37 -03:00
Andrii Nakryiko 5d863aa7ce btf_loader: Use libbpf to load BTF
Switch BTF loading to completely use libbpf's own struct btf and related
APIs.

BTF encoding is still happening with pahole's own code, so these two
code paths are not sharing anything now. String fetching is happening
based on whether btfe->strings were set to non-NULL pointer by
btf_encoder.

Committer testing:

  $ cp ~/git/build/bpf-next-v5.9.0-rc8+/vmlinux .
  $ readelf -SW vmlinux  | grep BTF
    [24] .BTF      PROGBITS  ffffffff82494ac0 1694ac0 340207 00   A  0  0  1
    [25] .BTF_ids  PROGBITS  ffffffff827d4cc8 19d4cc8 0000a4 00   A  0  0  1
  $ ./btfdiff vmlinux
  $

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-09 12:43:42 -03:00
Hao Luo f3d9054ba8 btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
On SMP systems, the global percpu variables are placed in a special
'.data..percpu' section, which is stored in a segment whose initial
address is set to 0. The addresses of per-CPU variables are therefore
positive addresses relative to the start of that segment [1].

This patch extracts these variables from vmlinux and places them with
their type information in BTF. More specifically, when BTF is encoded,
we find the index of the '.data..percpu' section and then traverse the
symbol table to find those global objects which are in this section.
For each of these objects, we push a BTF_KIND_VAR into the types buffer,
and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
CUs have finished processing, we push a BTF_KIND_DATASEC into the
btfe->types buffer, followed by the percpu_secinfo's content.

In a v5.8-rc3 Linux kernel, I was able to extract 288 such variables.
Both the build time overhead and the space overhead are small.
See the testing below.

A found variable can be invalid in two ways:

 - Its name found in elf_sym__name is invalid.
 - Its size identified by elf_sym__size is 0.

In either case, the BTF containing such symbols will be rejected by the
BTF verifier. Normally we should not see such symbols. But if one is
seen during BTF encoding, the encoder will exit with an error. A new flag,
'-j' (or '--force'), is implemented to help testing: it skips the
invalid symbols and forces a BTF to be emitted.
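
The policy just described can be sketched as a small decision function. The
names below are hypothetical, and the name rule (an identifier that may also
contain '.') is an assumption made for illustration; the commit only says that
a name can be invalid, not what the exact rule is.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sym_verdict { SYM_OK, SYM_SKIP, SYM_ERROR };

/* Assumed rule: non-empty, [A-Za-z_.] first, then [A-Za-z0-9_.]. */
static bool sym_name_valid(const char *name)
{
	if (name == NULL || *name == '\0')
		return false;
	if (!isalpha((unsigned char)*name) && *name != '_' && *name != '.')
		return false;
	for (const char *p = name + 1; *p != '\0'; p++) {
		if (!isalnum((unsigned char)*p) && *p != '_' && *p != '.')
			return false;
	}
	return true;
}

/*
 * A symbol is usable when its name is valid and its size is non-zero.
 * Otherwise it is skipped when 'force' is set and is an error when not.
 */
static enum sym_verdict sym_check(const char *name, uint64_t size, bool force)
{
	if (sym_name_valid(name) && size != 0)
		return SYM_OK;
	return force ? SYM_SKIP : SYM_ERROR;
}
```

A caller would stop encoding on SYM_ERROR and simply move on to the next
symbol on SYM_SKIP, optionally printing a warning in verbose mode.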

Testing:

- vmlinux size has increased by ~12kb.
  Before:
   $ readelf -SW vmlinux | grep BTF
   [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d2bf8 00
  After:
   $ pahole -J vmlinux
   $ readelf -SW vmlinux  | grep BTF
   [25] .BTF              PROGBITS        ffffffff821a905c 13a905c 2d5bca 00

- Common global percpu VARs and DATASEC are found in BTF section.
  $ bpftool btf dump file vmlinux | grep runqueues
  [14152] VAR 'runqueues' type_id=13778, linkage=global-alloc

  $ bpftool btf dump file vmlinux | grep 'cpu_stopper'
  [17582] STRUCT 'cpu_stopper' size=72 vlen=5
  [17601] VAR 'cpu_stopper' type_id=17582, linkage=static

  $ bpftool btf dump file vmlinux | grep ' DATASEC '
  [63652] DATASEC '.data..percpu' size=179288 vlen=288

- Tested bpf selftests.

- pahole exits with error if an invalid symbol is seen during encoding,
  make -f Makefile -j 36 -s
  PAHOLE: Error: Found symbol of zero size when encoding btf (sym: 'yyy', cu: 'xxx.c').
  PAHOLE: Error: Use '-j' or '--force_emit' to ignore such symbols and force emit the btf.
  scripts/link-vmlinux.sh: line 137: 2475712 Segmentation fault      LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1}

- With the flag '-j' or '--force', the invalid symbols are ignored.

- Further in verbose mode and with '-j' or '--force' set, a warning is generated:
  PAHOLE: Warning: Found symbol of zero size when encoding btf, ignored (sym: 'yyy', cu: 'xxx.c').
  PAHOLE: Warning: Found symbol of invalid name when encoding btf, ignored (sym: 'zzz', cu: 'sss.c').

References:
 [1] https://lwn.net/Articles/531148/

Signed-off-by: Hao Luo <haoluo@google.com>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Oleg Rombakh <olegrom@google.com>
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-05 14:51:17 -03:00
Arnaldo Carvalho de Melo cdd5e1399b btf loader: Support raw BTF as available in /sys/kernel/btf/vmlinux
Be it automatically when no -F option is passed and
/sys/kernel/btf/vmlinux is available, or when /sys/kernel/btf/vmlinux is
passed as the filename to the tool, i.e.:

  $ pahole -C list_head
  struct list_head {
  	struct list_head *         next;                 /*     0     8 */
  	struct list_head *         prev;                 /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  $ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/
  openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
  $
  $ pahole -C list_head /sys/kernel/btf/vmlinux
  struct list_head {
  	struct list_head *         next;                 /*     0     8 */
  	struct list_head *         prev;                 /*     8     8 */

  	/* size: 16, cachelines: 1, members: 2 */
  	/* last cacheline: 16 bytes */
  };
  $

If one wants to grab the matching vmlinux to use its DWARF info instead,
which is useful to compare the results with what we have from BTF, for
instance, it's just a matter of using '-F dwarf'.

This in turn shows something that at first came as a surprise, but then
has a simple explanation:

For very common data structures, that will probably appear in all of the
DWARF CUs (Compilation Units), like 'struct list_head', using '-F dwarf'
is faster:

  [acme@quaco pahole]$ perf stat -e cycles pahole -F btf -C list_head > /dev/null

   Performance counter stats for 'pahole -F btf -C list_head':

          45,722,518      cycles:u

         0.023717300 seconds time elapsed

         0.016474000 seconds user
         0.007212000 seconds sys

  [acme@quaco pahole]$ perf stat -e cycles pahole -F dwarf -C list_head > /dev/null

   Performance counter stats for 'pahole -F dwarf -C list_head':

          14,170,321      cycles:u

         0.006668904 seconds time elapsed

         0.005562000 seconds user
         0.001109000 seconds sys

  [acme@quaco pahole]$

But for something that is more specific to a subsystem, the DWARF loader
will have to process way more stuff till it gets to that struct:

  $ perf stat -e cycles pahole -F dwarf -C tcp_sock > /dev/null

   Performance counter stats for 'pahole -F dwarf -C tcp_sock':

      31,579,795,238      cycles:u

         8.332272930 seconds time elapsed

         8.032124000 seconds user
         0.286537000 seconds sys

  $

While using the BTF loader the time should be constant, as it loads
everything from /sys/kernel/btf/vmlinux:

  $ perf stat -e cycles pahole -F btf -C tcp_sock > /dev/null

   Performance counter stats for 'pahole -F btf -C tcp_sock':

          48,823,488      cycles:u

         0.024102760 seconds time elapsed

         0.012035000 seconds user
         0.012046000 seconds sys

  $

Above I used '-F btf' just to show that it can be used, but it's not
really needed, i.e. those are equivalent:

  $ strace -e openat pahole -F btf -C list_head |& grep /sys/kernel/btf/vmlinux
  openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
  $ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/vmlinux
  openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
  $

The btf_raw__load() function that ends up being grafted into the
preexisting btf_elf routines was based on libbpf's btf_load_raw().
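
For readers unfamiliar with the raw format, the sketch below shows the kind of
sanity check a raw BTF loader has to make on the header. It is not the actual
btf_raw__load(); the struct mirrors the layout of struct btf_header from the
kernel's <linux/btf.h>, while the raw_btf_* names are hypothetical and only
native-endian data is handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAW_BTF_MAGIC 0xeB9F

/* Same field layout as struct btf_header in <linux/btf.h>. */
struct raw_btf_header {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t type_off; /* offset of the type section, from end of header */
	uint32_t type_len;
	uint32_t str_off;  /* offset of the string section, from end of header */
	uint32_t str_len;
};

/* Returns true if buf looks like a complete, native-endian raw BTF blob. */
static bool raw_btf_looks_valid(const void *buf, size_t len)
{
	struct raw_btf_header hdr;
	uint64_t avail;

	if (buf == NULL || len < sizeof(hdr))
		return false;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.magic != RAW_BTF_MAGIC)
		return false;
	if (hdr.hdr_len < sizeof(hdr) || hdr.hdr_len > len)
		return false;

	/* Both sections must fit in what follows the header. */
	avail = len - hdr.hdr_len;
	if ((uint64_t)hdr.type_off + hdr.type_len > avail)
		return false;
	if ((uint64_t)hdr.str_off + hdr.str_len > avail)
		return false;
	return true;
}
```

A file such as /sys/kernel/btf/vmlinux starts directly with this header, with
no ELF container around it, which is why it needs a separate loading path.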

Acked-by: Alexei Starovoitov <ast@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-01-06 13:09:16 -03:00
Arnaldo Carvalho de Melo fe4e1f799c btf_elf: Rename btf_elf__free() to btf_elf__delete()
That is the naming idiom: 'delete' frees the object's members and then the
object itself, while 'free' just frees its members.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14 17:06:40 -03:00
Arnaldo Carvalho de Melo 6780c4334d btf: Rename 'struct btf' to 'struct btf_elf'
So that we don't clash with libbpf's 'struct btf', in time more internal
state now in 'struct btf_elf' will refer to the equivalent internal
state in libbpf's 'struct btf', as they have lots in common.

Requested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Acked-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14 17:06:23 -03:00
Martin Lau a58c746c4c Fixup copyright notices for BTF files authored by Facebook engineers
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Domenico Andreoli <domenico.andreoli@linux.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin Lau <kafai@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-18 20:34:05 -03:00
Domenico Andreoli e714d2eaa1 Adopt SPDX-License-Identifier
Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-18 15:41:48 -03:00
Arnaldo Carvalho de Melo 472256d3c5 btf_loader: Introduce a loader for the BTF format
Show 'struct list_head' from DWARF info:

  $ pahole -C list_head ~/git/build/v4.20-rc5+/net/ipv4/tcp.o
  struct list_head {
	  struct list_head *         next;                 /*     0     8 */
	  struct list_head *         prev;                 /*     8     8 */

	  /* size: 16, cachelines: 1, members: 2 */
	  /* last cacheline: 16 bytes */
  };

Try to show it from BTF, on a file without it:

  $ pahole -F btf -C list_head ~/git/build/v4.20-rc5+/net/ipv4/tcp.o
  pahole: /home/acme/git/build/v4.20-rc5+/net/ipv4/tcp.o: No debugging information found

Encode BTF from the DWARF info:

  $ pahole -J ~/git/build/v4.20-rc5+/net/ipv4/tcp.o

Check that it is there:
  $ readelf -SW ~/git/build/v4.20-rc5+/net/ipv4/tcp.o  | grep BTF
  readelf: /home/acme/git/build/v4.20-rc5+/net/ipv4/tcp.o: Warning: possibly corrupt ELF header - it has a non-zero program header offset, but no program headers
    [136] .BTF              PROGBITS        0000000000000000 101d0e 042edf 00      0   0  1

Now try again printing 'struct list_head' from the BTF info just
encoded:

  $ pahole -F btf -C list_head ~/git/build/v4.20-rc5+/net/ipv4/tcp.o  2> /dev/null
  struct list_head {
	  struct list_head *         next;                 /*     0     8 */
	  struct list_head *         prev;                 /*     8     8 */

	  /* size: 16, cachelines: 1, members: 2 */
	  /* last cacheline: 16 bytes */
  };
  $

There is the bitfield case: BTF doesn't have the bit_size info for
bitfield members, which makes the output from DWARF differ from
the one from BTF:

  $ pahole -F btf -C sk_buff ~/git/build/v4.20-rc5+/net/ipv4/tcp.o > /tmp/sk_buff.btf
  $ pahole -F dwarf -C sk_buff ~/git/build/v4.20-rc5+/net/ipv4/tcp.o > /tmp/sk_buff.dwarf
  $ diff -u /tmp/sk_buff.dwarf /tmp/sk_buff.btf
  --- /tmp/sk_buff.dwarf	2018-12-20 14:50:51.428653046 -0300
  +++ /tmp/sk_buff.btf	2018-12-20 14:50:46.302601516 -0300
  @@ -38,45 +38,45 @@
   	__u16                      hdr_len;              /*   138     2 */
   	__u16                      queue_mapping;        /*   140     2 */
   	__u8                       __cloned_offset[0];   /*   142     0 */
  -	__u8                       cloned:1;             /*   142: 7  1 */
  -	__u8                       nohdr:1;              /*   142: 6  1 */
  -	__u8                       fclone:2;             /*   142: 4  1 */
  -	__u8                       peeked:1;             /*   142: 3  1 */
  -	__u8                       head_frag:1;          /*   142: 2  1 */
  -	__u8                       xmit_more:1;          /*   142: 1  1 */
  -	__u8                       pfmemalloc:1;         /*   142: 0  1 */
  +	__u8                       cloned;               /*   142     1 */
  +	__u8                       nohdr;                /*   142     1 */
  +	__u8                       fclone;               /*   142     1 */
  +	__u8                       peeked;               /*   142     1 */
  +	__u8                       head_frag;            /*   142     1 */
  +	__u8                       xmit_more;            /*   142     1 */
  +	__u8                       pfmemalloc;           /*   142     1 */

   	/* XXX 1 byte hole, try to pack */

   	__u32                      headers_start[0];     /*   144     0 */
   	__u8                       __pkt_type_offset[0]; /*   144     0 */
  -	__u8                       pkt_type:3;           /*   144: 5  1 */
  -	__u8                       ignore_df:1;          /*   144: 4  1 */
  -	__u8                       nf_trace:1;           /*   144: 3  1 */
  -	__u8                       ip_summed:2;          /*   144: 1  1 */
  -	__u8                       ooo_okay:1;           /*   144: 0  1 */
  -	__u8                       l4_hash:1;            /*   145: 7  1 */
  -	__u8                       sw_hash:1;            /*   145: 6  1 */
  -	__u8                       wifi_acked_valid:1;   /*   145: 5  1 */
  -	__u8                       wifi_acked:1;         /*   145: 4  1 */
  -	__u8                       no_fcs:1;             /*   145: 3  1 */
  -	__u8                       encapsulation:1;      /*   145: 2  1 */
  -	__u8                       encap_hdr_csum:1;     /*   145: 1  1 */
  -	__u8                       csum_valid:1;         /*   145: 0  1 */
  -	__u8                       csum_complete_sw:1;   /*   146: 7  1 */
  -	__u8                       csum_level:2;         /*   146: 5  1 */
  -	__u8                       csum_not_inet:1;      /*   146: 4  1 */
  -	__u8                       dst_pending_confirm:1; /*   146: 3  1 */
  -	__u8                       ndisc_nodetype:2;     /*   146: 1  1 */
  -	__u8                       ipvs_property:1;      /*   146: 0  1 */
  -	__u8                       inner_protocol_type:1; /*   147: 7  1 */
  -	__u8                       remcsum_offload:1;    /*   147: 6  1 */
  -	__u8                       offload_fwd_mark:1;   /*   147: 5  1 */
  -	__u8                       offload_mr_fwd_mark:1; /*   147: 4  1 */
  -	__u8                       tc_skip_classify:1;   /*   147: 3  1 */
  -	__u8                       tc_at_ingress:1;      /*   147: 2  1 */
  -	__u8                       tc_redirected:1;      /*   147: 1  1 */
  -	__u8                       tc_from_ingress:1;    /*   147: 0  1 */
  +	__u8                       pkt_type;             /*   144     1 */
  +	__u8                       ignore_df;            /*   144     1 */
  +	__u8                       nf_trace;             /*   144     1 */
  +	__u8                       ip_summed;            /*   144     1 */
  +	__u8                       ooo_okay;             /*   144     1 */
  +	__u8                       l4_hash;              /*   145     1 */
  +	__u8                       sw_hash;              /*   145     1 */
  +	__u8                       wifi_acked_valid;     /*   145     1 */
  +	__u8                       wifi_acked;           /*   145     1 */
  +	__u8                       no_fcs;               /*   145     1 */
  +	__u8                       encapsulation;        /*   145     1 */
  +	__u8                       encap_hdr_csum;       /*   145     1 */
  +	__u8                       csum_valid;           /*   145     1 */
  +	__u8                       csum_complete_sw;     /*   146     1 */
  +	__u8                       csum_level;           /*   146     1 */
  +	__u8                       csum_not_inet;        /*   146     1 */
  +	__u8                       dst_pending_confirm;  /*   146     1 */
  +	__u8                       ndisc_nodetype;       /*   146     1 */
  +	__u8                       ipvs_property;        /*   146     1 */
  +	__u8                       inner_protocol_type;  /*   147     1 */
  +	__u8                       remcsum_offload;      /*   147     1 */
  +	__u8                       offload_fwd_mark;     /*   147     1 */
  +	__u8                       offload_mr_fwd_mark;  /*   147     1 */
  +	__u8                       tc_skip_classify;     /*   147     1 */
  +	__u8                       tc_at_ingress;        /*   147     1 */
  +	__u8                       tc_redirected;        /*   147     1 */
  +	__u8                       tc_from_ingress;      /*   147     1 */
   	__u16                      tc_index;             /*   148     2 */

   	/* XXX 2 bytes hole, try to pack */
  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-20 15:23:35 -03:00
Yonghong Song 3aa3fd506e btf: add func_proto support
Two new btf kinds, BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO,
have been added in kernel since
  https://patchwork.ozlabs.org/cover/1000176/
to support better func introspection.

Currently, for a DW_TAG_subroutine_type dwarf type,
a simple "void *" is generated instead of real subroutine type.

This patch teaches pahole to generate BTF_KIND_FUNC_PROTO
properly. After this patch, pahole should have complete
type coverage, for the C frontend, of the types a bpf program cares about.

For example,
  $ cat t1.c
  typedef int __int32;
  struct t1 {
    int a1;
    int (*f1)(char p1, __int32 p2);
  } g1;
  $ cat t2.c
  typedef int __int32;
  struct t2 {
    int a2;
    int (*f2)(char q1, __int32 q2, ...);
    int (*f3)();
  } g2;
  int main() { return 0; }
  $ gcc -O2 -o t1 -g t1.c t2.c
  $ pahole -JV t1
  File t1:
  [1] TYPEDEF __int32 type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] STRUCT t1 kind_flag=0 size=16 vlen=2
        a1 type_id=2 bits_offset=0
        f1 type_id=6 bits_offset=64
  [4] FUNC_PROTO (anon) return=2 args=(5 (anon), 1 (anon))
  [5] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
  [6] PTR (anon) type_id=4
  [7] TYPEDEF __int32 type_id=8
  [8] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [9] STRUCT t2 kind_flag=0 size=24 vlen=3
        a2 type_id=8 bits_offset=0
        f2 type_id=12 bits_offset=64
        f3 type_id=14 bits_offset=128
  [10] FUNC_PROTO (anon) return=8 args=(11 (anon), 7 (anon), vararg)
  [11] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
  [12] PTR (anon) type_id=10
  [13] FUNC_PROTO (anon) return=8 args=(vararg)
  [14] PTR (anon) type_id=13
  $

In the above example, type [4], [10] and [13] represent the
func_proto types.
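
As a small illustration of how the 'vararg' entries above are represented, the
sketch below checks whether a FUNC_PROTO is variadic. It relies on the BTF
convention that a variadic prototype ends with a parameter whose name offset
and type are both 0; struct proto_param and the helper are hypothetical names,
not pahole or kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same two fields as struct btf_param in <linux/btf.h>. */
struct proto_param {
	uint32_t name_off;
	uint32_t type;
};

/*
 * A FUNC_PROTO is variadic when its last parameter has both
 * name_off == 0 and type == 0. Anonymous parameters also have
 * name_off == 0, but carry a non-zero type ID.
 */
static bool func_proto_is_vararg(const struct proto_param *params,
				 uint16_t vlen)
{
	if (vlen == 0)
		return false;
	return params[vlen - 1].name_off == 0 && params[vlen - 1].type == 0;
}
```

For type [10] above the parameters would be (11, 7, vararg), so the helper
returns true, while for type [4] with parameters (5, 1) it returns false.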

BTF_KIND_FUNC, which represents a real subprogram, is not generated in
this patch and will be considered later.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-20 11:33:35 -03:00
Yonghong Song 8630ce4042 btf: fix struct/union/fwd types with kind_flag
This patch fixed two issues with BTF. One is related to struct/union
bitfield encoding and the other is related to forward type.

Issue #1 and solution:
======================

The current BTF encoding of bitfields follows what pahole generates.
For each bitfield, pahole duplicates the type chain and
puts the bitfield size at the final int or enum type.
Since the BTF enum type cannot encode bit size,
commit b18354f64c ("btf: Generate correct struct bitfield
member types") works around the issue by generating
an int type whenever the enum bit size is not 32.

The above workaround is not ideal as we lose the original type
in BTF. Another undesirable fact is the type duplication,
as pahole duplicates the type chain.

To fix this issue, this patch implemented a compatible
change for BTF struct type encoding:
  . the bit 31 of type->info, previously reserved,
    now is used to indicate whether bitfield_size is
    encoded in btf_member or not.
  . if bit 31 of struct_type->info is set,
    btf_member->offset will encode like:
      bit 0 - 23: bit offset
      bit 24 - 31: bitfield size
    if bit 31 is not set, the old behavior is preserved:
      bit 0 - 31: bit offset

So if the struct contains a bitfield, the maximum bit offset
is reduced to (2^24 - 1) instead of MAX_UINT. The maximum
bitfield size is 255, which is enough for today, as the largest
bitfield in a compiler can be 128 bits, where the int128 type is supported.
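
The member encoding described above can be sketched like this. The helper
names are hypothetical; only the bit layout (bit 31 of type->info as the flag,
bits 0-23 for the bit offset, bits 24-31 for the bitfield size) comes from the
text.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEMBER_BIT_OFFSET_MAX    ((1u << 24) - 1)
#define MEMBER_BITFIELD_SIZE_MAX 255u

/* Pack a member offset for a struct whose info has bit 31 set. */
static bool member_offset_encode(uint32_t bit_offset, uint32_t bitfield_size,
				 uint32_t *out)
{
	if (bit_offset > MEMBER_BIT_OFFSET_MAX ||
	    bitfield_size > MEMBER_BITFIELD_SIZE_MAX)
		return false;
	*out = (bitfield_size << 24) | bit_offset;
	return true;
}

/* Bit offset of a member, honouring bit 31 of the struct's info word. */
static uint32_t member_bit_offset(uint32_t struct_info, uint32_t offset)
{
	return (struct_info >> 31) ? (offset & 0xffffff) : offset;
}

/* Bitfield size of a member; 0 means "not a bitfield". */
static uint32_t member_bitfield_size(uint32_t struct_info, uint32_t offset)
{
	return (struct_info >> 31) ? (offset >> 24) : 0;
}
```

For a 4-bit member at bit offset 160, such as member 'b' in the example
further down, the packed value is (4 << 24) | 160 = 0x040000a0.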

A new global, no_bitfield_type_recode, is introduced. It is
set to true when BTF encoding is enabled and prevents pahole
from duplicating the bitfield types, which avoids
type duplication in BTF.

Issue #2 and solution:
======================

Current forward type in BTF does not specify whether the original
type is struct or union. This will not work for type pretty print
and BTF-to-header-file conversion as struct/union must be specified.

To fix this issue, similar to issue #1, type->info bit 31
is used. If the bit is set, it is a union type. Otherwise, it is
a struct type.
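
A reader of the forward type could test the flag as in the sketch
below. The helper name is hypothetical, and the precondition assumes the
kind sits in bits 24 - 27 of type->info with BTF_KIND_FWD equal to 7, as
in the kernel's BTF header of that period; only the meaning of bit 31
comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, not taken from this patch. */
#define SKETCH_BTF_KIND_FWD	7
#define SKETCH_BTF_INFO_KIND(info)	(((info) >> 24) & 0x0f)

/* Hypothetical helper: true if a FWD type refers to a union. */
static inline bool fwd_is_union(uint32_t info)
{
	/* Only meaningful for forward types. */
	assert(SKETCH_BTF_INFO_KIND(info) == SKETCH_BTF_KIND_FWD);
	return (info >> 31) & 1;
}
```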

Examples:
=========

  -bash-4.4$ cat t.c
  struct s;
  union u;
  typedef int ___int;
  enum A { A1, A2, A3 };
  struct t {
	  int a[5];
	  ___int b:4;
	  volatile enum A c:4;
	  struct s *p1;
	  union u *p2;
  } g;
  -bash-4.4$ gcc -c -O2 -g t.c

Without this patch:

  $ pahole -JV t.o
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t size=40 vlen=5
        a type_id=5 bits_offset=0
        b type_id=13 bits_offset=160
        c type_id=15 bits_offset=164
        p1 type_id=9 bits_offset=192
        p2 type_id=11 bits_offset=256
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3
  [8] FWD s type_id=0
  [9] PTR (anon) type_id=8
  [10] FWD u type_id=0
  [11] PTR (anon) type_id=10
  [12] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
  [13] TYPEDEF ___int type_id=12
  [14] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
  [15] VOLATILE (anon) type_id=14

With this patch:

  $ pahole -JV t.o
  File t.o:
  [1] TYPEDEF ___int type_id=2
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] ENUM A size=4 vlen=3
        A1 val=0
        A2 val=1
        A3 val=2
  [4] STRUCT t kind_flag=1 size=40 vlen=5
        a type_id=5 bitfield_size=0 bits_offset=0
        b type_id=1 bitfield_size=4 bits_offset=160
        c type_id=7 bitfield_size=4 bits_offset=164
        p1 type_id=9 bitfield_size=0 bits_offset=192
        p2 type_id=11 bitfield_size=0 bits_offset=256
  [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
  [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
  [7] VOLATILE (anon) type_id=3
  [8] FWD s struct
  [9] PTR (anon) type_id=8
  [10] FWD u union
  [11] PTR (anon) type_id=10

The fix removed the type duplication, preserved the enum type for the
bitfield, and provides correct struct/union information for the forward
type.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-20 11:27:20 -03:00
Yonghong Song 0d2511fd1d btf: Fix bitfield encoding
The btf bitfield encoding is broken.

For the following example:

  -bash-4.2$ cat t.c
  struct t {
     int a:2;
     int b:1;
     int :3;
     int c:1;
     int d;
     char e:1;
     char f:1;
     int g;
  };
  void test(struct t *t) {
     return;
  }
  -bash-4.2$ clang -S -g -emit-llvm t.c

The results of the pahole dwarf2btf conversion for the little endian and
big endian bpf targets:

  -bash-4.2$ llc -march=bpfel -mattr=dwarfris -filetype=obj t.ll
  -bash-4.2$ pahole -JV t.o
  [1] PTR (anon) type_id=2
  [2] STRUCT t size=16 vlen=7
        a type_id=5 bits_offset=30
        b type_id=6 bits_offset=29
        c type_id=6 bits_offset=25
        d type_id=3 bits_offset=32
        e type_id=7 bits_offset=71
        f type_id=7 bits_offset=70
        g type_id=3 bits_offset=96
  [3] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [4] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
  [5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
  [6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
  [7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)
  -bash-4.2$ llc -march=bpfeb -mattr=dwarfris -filetype=obj t.ll
  -bash-4.2$ pahole -JV t.o
  [1] PTR (anon) type_id=2
  [2] STRUCT t size=16 vlen=7
        a type_id=5 bits_offset=0
        b type_id=6 bits_offset=2
        c type_id=6 bits_offset=6
        d type_id=3 bits_offset=32
        e type_id=7 bits_offset=64
        f type_id=7 bits_offset=65
        g type_id=3 bits_offset=96
  [3] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [4] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
  [5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
  [6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
  [7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)

The BTF struct member bits_offset counts bits from the beginning of the
containing entity regardless of endianness, similar to what
DW_AT_bit_offset from DWARF4 does. Such counting is equivalent to the
big endian conversion in the above.

But the little endian conversion is not correct since dwarf generates
DW_AT_bit_offset based on actual bit position in the little endian
architecture.  For example, for the above struct member "a", the dwarf
would generate DW_AT_bit_offset=30 for little endian, and
DW_AT_bit_offset=0 for big endian.

This patch fixed the little endian structure member bits_offset problem
with proper calculation based on dwarf attributes.
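
One way to express such a calculation is sketched below. It is not the
code from the patch: the helper name and parameters are hypothetical, and
it assumes DW_AT_bit_offset counts from the most significant bit of a
storage unit of DW_AT_byte_size bytes located at the member's byte
offset. The storage sizes used in the usage note (4 for int, 1 for char)
and the DWARF offsets for "e" and "f" are inferred from the output above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a little endian DWARF bitfield
 * description into a bit offset counted from the start of the struct.
 */
static inline uint32_t le_member_bits_offset(uint32_t byte_offset,
					     uint32_t byte_size,
					     uint32_t dw_bit_offset,
					     uint32_t bit_size)
{
	uint32_t storage_bits = byte_size * 8;

	/* The bitfield must fit inside its storage unit. */
	assert(dw_bit_offset + bit_size <= storage_bits);
	return byte_offset * 8 + storage_bits - dw_bit_offset - bit_size;
}
```

Under these assumptions, member "a" (byte offset 0, byte size 4,
DW_AT_bit_offset 30, 2 bits) maps to bits_offset 0, and member "e"
(byte offset 8, byte size 1, DW_AT_bit_offset 7, 1 bit) maps to 64.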

With the fix, we get:

  -bash-4.2$ llc -march=bpfel -mattr=dwarfris -filetype=obj t.ll
  -bash-4.2$ pahole -JV t.o
  [1] STRUCT t size=16 vlen=7
        a type_id=5 bits_offset=0
        b type_id=6 bits_offset=2
        c type_id=6 bits_offset=6
        d type_id=2 bits_offset=32
        e type_id=7 bits_offset=64
        f type_id=7 bits_offset=65
        g type_id=2 bits_offset=96
  [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [3] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
  [4] PTR (anon) type_id=1
  [5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
  [6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
  [7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)
  -bash-4.2$ llc -march=bpfeb -mattr=dwarfris -filetype=obj t.ll
  -bash-4.2$ pahole -JV t.o
  [1] PTR (anon) type_id=2
  [2] STRUCT t size=16 vlen=7
        a type_id=5 bits_offset=0
        b type_id=6 bits_offset=2
        c type_id=6 bits_offset=6
        d type_id=3 bits_offset=32
        e type_id=7 bits_offset=64
        f type_id=7 bits_offset=65
        g type_id=3 bits_offset=96
  [3] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
  [4] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
  [5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
  [6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
  [7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)
  -bash-4.2$

For both little endian and big endian, we now get the same, correct
bits_offset for struct members.

We could fix pos->bit_offset, but then its meaning would be inconsistent
with pos->bitfield_offset, which is used to print out the pahole data
structure:

  -bash-4.2$ llc -march=bpfel -mattr=dwarfris -filetype=obj t.ll
  -bash-4.2$ /bin/pahole t.o
  struct t {
        int                        a:2;                  /*     0:30  4 */
        int                        b:1;                  /*     0:29  4 */
        int                        c:1;                  /*     0:25  4 */
  .....

So this patch only makes the change in the BTF-specific routines.

Signed-off-by: Yonghong Song <yhs@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-17 11:44:58 -03:00
Martin KaFai Lau 68645f7fac btf: Add BTF support
This patch introduces BPF Type Format (BTF).

BTF (BPF Type Format) is the metadata format which describes
the data types of a BPF program/map.  Hence, it basically focuses
on the C programming language, which modern BPF primarily
uses.  The first use case is to provide a generic pretty print
capability for a BPF map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-25 14:42:06 -03:00