Add support for generating split BTF, in which there is a designated base
BTF, containing a base set of types, and a split BTF, which extends the base
BTF with extra types that can reference types and strings from the base BTF.
This is going to be used to generate compact BTFs for kernel modules, with
vmlinux BTF serving as the base BTF on which all kernel module BTFs build.
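To illustrate how a split BTF can refer to its base, here is a minimal C sketch under stated assumptions: `struct mini_btf`, `mini_btf_type_name()` and `mini_btf_str()` are hypothetical and are neither libbpf's nor pahole's API. The sketch assumes the split BTF's type IDs continue right after the last base type ID, and its string offsets continue right after the end of the base string section, so any ID or offset below the split BTF's starting point is resolved in the base BTF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified BTF object: only what is needed to show how
 * type IDs and string offsets are shared between base and split BTF. */
struct mini_btf {
	const struct mini_btf *base;    /* NULL for the base (e.g. vmlinux) BTF */
	uint32_t start_id;              /* first type ID owned by this BTF */
	uint32_t nr_types;              /* number of types owned by this BTF */
	const char *const *type_names;  /* name of each owned type */
	uint32_t start_str_off;         /* first string offset owned by this BTF */
	const char *strs;               /* this BTF's own string section */
	uint32_t strs_len;
};

/* Resolve a type ID to its name, falling through to the base BTF when the
 * ID belongs to it. ID 0 is 'void' and is never stored. */
static const char *mini_btf_type_name(const struct mini_btf *btf, uint32_t id)
{
	if (id == 0)
		return "void";
	if (id < btf->start_id)
		return btf->base ? mini_btf_type_name(btf->base, id) : NULL;
	if (id - btf->start_id >= btf->nr_types)
		return NULL;
	return btf->type_names[id - btf->start_id];
}

/* Resolve a string offset the same way: offsets below start_str_off
 * belong to the base BTF's string section. */
static const char *mini_btf_str(const struct mini_btf *btf, uint32_t off)
{
	if (off < btf->start_str_off)
		return btf->base ? mini_btf_str(btf->base, off) : NULL;
	if (off - btf->start_str_off >= btf->strs_len)
		return NULL;
	return btf->strs + (off - btf->start_str_off);
}
```

For example, with a base BTF holding types 1-2 ("int", "list_head") and a split BTF starting at ID 3, looking up ID 2 through the split BTF falls through to the base.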
These changes require patch set [0] to be present in the libbpf submodule.
[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=377859&state=*
Committer notes:
Fixed up wrt ARGP_numeric_version and added a man page entry.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Switch BTF loading to use libbpf's own struct btf and related APIs
exclusively.
BTF encoding is still done by pahole's own code, so the two code paths
no longer share anything. Which string table is used for string fetching
depends on whether btf_encoder set btfe->strings to a non-NULL pointer.
Committer testing:
$ cp ~/git/build/bpf-next-v5.9.0-rc8+/vmlinux .
$ readelf -SW vmlinux | grep BTF
[24] .BTF PROGBITS ffffffff82494ac0 1694ac0 340207 00 A 0 0 1
[25] .BTF_ids PROGBITS ffffffff827d4cc8 19d4cc8 0000a4 00 A 0 0 1
$ ./btfdiff vmlinux
$
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On SMP systems, the global per-CPU variables are placed in a special
'.data..percpu' section, which is stored in a segment whose initial
address is set to 0, so the addresses of per-CPU variables are positive
offsets relative to the start of that segment [1].
This patch extracts these variables from vmlinux and places them with
their type information in BTF. More specifically, when BTF is encoded,
we find the index of the '.data..percpu' section and then traverse the
symbol table to find those global objects which are in this section.
For each of these objects, we push a BTF_KIND_VAR into the types buffer,
and a BTF_VAR_SECINFO into another buffer, percpu_secinfo. When all the
CUs have finished processing, we push a BTF_KIND_DATASEC into the
btfe->types buffer, followed by the percpu_secinfo's content.
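The collection step described above can be sketched in C as follows. This is an illustration, not pahole's code: `struct sym` is a hypothetical, trimmed-down ELF symbol, the index of '.data..percpu' is assumed to have been looked up already, and "pushing" a BTF_KIND_VAR is reduced to allocating its type ID. `struct var_secinfo` mirrors the fields (type, offset, size) of the kernel's `struct btf_var_secinfo`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ELF symbol: only the fields this sketch needs. */
struct sym {
	const char *name;
	uint16_t shndx;   /* index of the section the symbol lives in */
	uint64_t value;   /* address; inside .data..percpu this is an offset */
	uint64_t size;
	int is_object;    /* nonzero for data objects (STT_OBJECT-like) */
};

/* Same fields as the kernel's struct btf_var_secinfo. */
struct var_secinfo {
	uint32_t type;    /* ID of the BTF_KIND_VAR describing the variable */
	uint32_t offset;  /* offset of the variable inside the section */
	uint32_t size;
};

/* Walk the symbol table and record one secinfo entry per object symbol
 * located in the per-CPU section. Returns the number of entries written,
 * which becomes the vlen of the .data..percpu DATASEC. */
static size_t collect_percpu_vars(const struct sym *syms, size_t nr_syms,
				  uint16_t percpu_shndx, uint32_t *next_type_id,
				  struct var_secinfo *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_syms && n < max_out; i++) {
		const struct sym *s = &syms[i];

		/* Only objects living in the per-CPU section qualify. */
		if (!s->is_object || s->shndx != percpu_shndx)
			continue;

		/* "Push" a BTF_KIND_VAR: here we only allocate its type ID. */
		uint32_t var_id = (*next_type_id)++;

		/* Record where the variable lives inside the section. */
		out[n].type = var_id;
		out[n].offset = (uint32_t)s->value;
		out[n].size = (uint32_t)s->size;
		n++;
	}
	return n;
}
```

Keeping the secinfo entries in a separate buffer lets the DATASEC, which must list them contiguously after its own header, be emitted once all CUs are done.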
In a v5.8-rc3 Linux kernel, I was able to extract 288 such variables.
Both the build-time and the space overhead are small; see the testing
below.
A found variable can be invalid in two ways:
- Its name, as returned by elf_sym__name, is invalid.
- Its size, as returned by elf_sym__size, is 0.
In either case, the BTF verifier will reject BTF containing such symbols.
Such symbols should not normally occur, but if one is seen during BTF
encoding, the encoder exits with an error. A new flag, '-j' (or
'--force'), is added to help with testing; it skips the invalid symbols
and forces BTF to be emitted.
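A sketch of the validity check and of the effect of the force flag follows. It assumes that "invalid name" means "not a valid C identifier", which is the kind of name the BTF verifier accepts for variables; `check_percpu_sym()` and `is_c_identifier()` are hypothetical helpers, and the messages are illustrative rather than pahole's actual output.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum sym_check { SYM_OK, SYM_SKIP, SYM_ERROR };

/* Assumed name rule: [A-Za-z_][A-Za-z0-9_]* */
static bool is_c_identifier(const char *name)
{
	if (name == NULL || !(isalpha((unsigned char)*name) || *name == '_'))
		return false;
	for (const char *p = name + 1; *p; p++)
		if (!(isalnum((unsigned char)*p) || *p == '_'))
			return false;
	return true;
}

/* Decide what to do with one per-CPU symbol: keep it, skip it (only
 * allowed when forcing), or abort the encoding. */
static enum sym_check check_percpu_sym(const char *name, uint64_t size,
				       bool force, bool verbose)
{
	const char *problem = NULL;

	if (!is_c_identifier(name))
		problem = "an invalid name";
	else if (size == 0)
		problem = "zero size";

	if (problem == NULL)
		return SYM_OK;
	if (!force) {
		fprintf(stderr, "error: symbol '%s' has %s\n",
			name ? name : "(null)", problem);
		return SYM_ERROR;
	}
	if (verbose)
		fprintf(stderr, "warning: symbol '%s' has %s, ignored\n",
			name ? name : "(null)", problem);
	return SYM_SKIP;
}
```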
Testing:
- The .BTF section has grown by ~12 KB (0x2d5bca - 0x2d2bf8 = 0x2fd2 = 12242 bytes).
Before:
$ readelf -SW vmlinux | grep BTF
[25] .BTF PROGBITS ffffffff821a905c 13a905c 2d2bf8 00
After:
$ pahole -J vmlinux
$ readelf -SW vmlinux | grep BTF
[25] .BTF PROGBITS ffffffff821a905c 13a905c 2d5bca 00
- Common global percpu VARs and DATASEC are found in BTF section.
$ bpftool btf dump file vmlinux | grep runqueues
[14152] VAR 'runqueues' type_id=13778, linkage=global-alloc
$ bpftool btf dump file vmlinux | grep 'cpu_stopper'
[17582] STRUCT 'cpu_stopper' size=72 vlen=5
[17601] VAR 'cpu_stopper' type_id=17582, linkage=static
$ bpftool btf dump file vmlinux | grep ' DATASEC '
[63652] DATASEC '.data..percpu' size=179288 vlen=288
- Ran the bpf selftests.
- pahole exits with an error if an invalid symbol is seen during encoding:
make -f Makefile -j 36 -s
PAHOLE: Error: Found symbol of zero size when encoding btf (sym: 'yyy', cu: 'xxx.c').
PAHOLE: Error: Use '-j' or '--force_emit' to ignore such symbols and force emit the btf.
scripts/link-vmlinux.sh: line 137: 2475712 Segmentation fault LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1}
- With the flag '-j' or '--force', the invalid symbols are ignored.
- Furthermore, in verbose mode with '-j' or '--force' set, a warning is printed for each ignored symbol:
PAHOLE: Warning: Found symbol of zero size when encoding btf, ignored (sym: 'yyy', cu: 'xxx.c').
PAHOLE: Warning: Found symbol of invalid name when encoding btf, ignored (sym: 'zzz', cu: 'sss.c').
References:
[1] https://lwn.net/Articles/531148/
Signed-off-by: Hao Luo <haoluo@google.com>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Oleg Rombakh <olegrom@google.com>
Cc: dwarves@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pahole now uses /sys/kernel/btf/vmlinux, either automatically, when no -F
option is passed and /sys/kernel/btf/vmlinux is available, or when
/sys/kernel/btf/vmlinux is passed as the filename to the tool, e.g.:
$ pahole -C list_head
struct list_head {
struct list_head * next; /* 0 8 */
struct list_head * prev; /* 8 8 */
/* size: 16, cachelines: 1, members: 2 */
/* last cacheline: 16 bytes */
};
$ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/
openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
$
$ pahole -C list_head /sys/kernel/btf/vmlinux
struct list_head {
struct list_head * next; /* 0 8 */
struct list_head * prev; /* 8 8 */
/* size: 16, cachelines: 1, members: 2 */
/* last cacheline: 16 bytes */
};
$
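The loader selection described above boils down to a small decision, sketched below with hypothetical names (`choose_loader()`, `struct load_choice`); it is not pahole's code. It assumes that an explicit '-F' other than btf disables the automatic use of /sys/kernel/btf/vmlinux, and that a NULL path means "locate the matching vmlinux as before".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SYSFS_VMLINUX_BTF "/sys/kernel/btf/vmlinux"

struct load_choice {
	const char *loader; /* "btf", "dwarf", ... or NULL: default order */
	const char *path;   /* NULL: locate the matching vmlinux file */
};

static struct load_choice choose_loader(const char *format,
					const char *filename,
					bool sysfs_btf_available)
{
	struct load_choice c = { format, filename };

	/* The sysfs file is raw BTF, so it always goes to the BTF loader. */
	if (filename != NULL && strcmp(filename, SYSFS_VMLINUX_BTF) == 0) {
		c.loader = "btf";
		return c;
	}
	/* No file given: prefer the running kernel's BTF when present,
	 * unless another format was explicitly requested with -F. */
	if (filename == NULL && sysfs_btf_available &&
	    (format == NULL || strcmp(format, "btf") == 0)) {
		c.loader = "btf";
		c.path = SYSFS_VMLINUX_BTF;
	}
	return c;
}
```

In a real tool, `sysfs_btf_available` could come from `access("/sys/kernel/btf/vmlinux", R_OK) == 0`.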
To use the DWARF info from the matching vmlinux instead, which is useful,
for instance, to compare the results with what we get from BTF, just use
'-F dwarf'.
This in turn shows something that at first came as a surprise but has a
simple explanation: for very common data structures that probably appear
in all of the DWARF CUs (Compilation Units), like 'struct list_head',
using '-F dwarf' is faster, since the DWARF loader finds them early,
without having to process much of the file:
[acme@quaco pahole]$ perf stat -e cycles pahole -F btf -C list_head > /dev/null
Performance counter stats for 'pahole -F btf -C list_head':
45,722,518 cycles:u
0.023717300 seconds time elapsed
0.016474000 seconds user
0.007212000 seconds sys
[acme@quaco pahole]$ perf stat -e cycles pahole -F dwarf -C list_head > /dev/null
Performance counter stats for 'pahole -F dwarf -C list_head':
14,170,321 cycles:u
0.006668904 seconds time elapsed
0.005562000 seconds user
0.001109000 seconds sys
[acme@quaco pahole]$
But for something that is more specific to a subsystem, the DWARF loader
has to process many more CUs before it gets to that struct:
$ perf stat -e cycles pahole -F dwarf -C tcp_sock > /dev/null
Performance counter stats for 'pahole -F dwarf -C tcp_sock':
31,579,795,238 cycles:u
8.332272930 seconds time elapsed
8.032124000 seconds user
0.286537000 seconds sys
$
With the BTF loader, by contrast, the time should be roughly constant, as
it always loads everything from /sys/kernel/btf/vmlinux:
$ perf stat -e cycles pahole -F btf -C tcp_sock > /dev/null
Performance counter stats for 'pahole -F btf -C tcp_sock':
48,823,488 cycles:u
0.024102760 seconds time elapsed
0.012035000 seconds user
0.012046000 seconds sys
$
Above I used '-F btf' just to show that it can be used, but it's not
really needed, i.e. these are equivalent:
$ strace -e openat pahole -F btf -C list_head |& grep /sys/kernel/btf/vmlinux
openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
$ strace -e openat pahole -C list_head |& grep /sys/kernel/btf/vmlinux
openat(AT_FDCWD, "/sys/kernel/btf/vmlinux", O_RDONLY) = 3
$
The btf_raw__load() function that ends up being grafted into the
preexisting btf_elf routines was based on libbpf's btf_load_raw().
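Loading raw BTF, as opposed to BTF embedded in an ELF section, starts with validating the BTF header. The sketch below is neither btf_raw__load() nor libbpf's btf_load_raw(); it only checks a blob against the header layout from the kernel's uapi <linux/btf.h> (magic 0xeB9F, version 1, section offsets relative to the end of the header), assuming the blob is in host byte order. `raw_btf_check()` and `struct raw_btf_header` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAW_BTF_MAGIC   0xeB9F
#define RAW_BTF_VERSION 1

/* Same layout as struct btf_header in <linux/btf.h>. */
struct raw_btf_header {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint32_t hdr_len;
	uint32_t type_off; /* offsets are relative to the end of the header */
	uint32_t type_len;
	uint32_t str_off;
	uint32_t str_len;
};

/* Return 0 if data looks like a well-formed raw BTF blob, -1 otherwise. */
static int raw_btf_check(const void *data, size_t size,
			 struct raw_btf_header *hdr)
{
	if (size < sizeof(*hdr))
		return -1;
	memcpy(hdr, data, sizeof(*hdr)); /* avoids unaligned access */
	if (hdr->magic != RAW_BTF_MAGIC || hdr->version != RAW_BTF_VERSION)
		return -1;
	if (hdr->hdr_len < sizeof(*hdr) || hdr->hdr_len > size)
		return -1;

	/* Both sections must fit in what follows the header. */
	size_t body = size - hdr->hdr_len;
	if ((uint64_t)hdr->type_off + hdr->type_len > body ||
	    (uint64_t)hdr->str_off + hdr->str_len > body)
		return -1;
	return 0;
}
```

A byte-swapped magic (0x9FeB) would indicate BTF of the opposite endianness, which this sketch simply rejects.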
Acked-by: Alexei Starovoitov <ast@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That is the idiom for freeing an object's members and then the object
itself; 'free' is meant only for freeing its members.
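The two destructor flavours can be illustrated with a hypothetical `struct widget`: `widget__free()` releases only the members, while `widget__delete()` (a name assumed here for the free-members-then-self routine) also releases the object. This is a sketch of the idiom, not pahole code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object used only to illustrate the idiom. */
struct widget {
	char *name;
	int *values;
};

/* 'free': release only the members. The object itself stays usable,
 * e.g. when it is embedded in another struct or lives on the stack. */
static void widget__free(struct widget *w)
{
	free(w->name);
	w->name = NULL;
	free(w->values);
	w->values = NULL;
}

/* Free the members, then the object itself. */
static void widget__delete(struct widget *w)
{
	if (w == NULL)
		return;
	widget__free(w);
	free(w);
}

static struct widget *widget__new(const char *name, size_t nr_values)
{
	struct widget *w = malloc(sizeof(*w));
	size_t len = strlen(name) + 1;

	if (w == NULL)
		return NULL;
	w->name = malloc(len);
	w->values = calloc(nr_values, sizeof(*w->values));
	if (w->name == NULL || w->values == NULL) {
		widget__delete(w); /* free(NULL) is a no-op */
		return NULL;
	}
	memcpy(w->name, name, len);
	return w;
}
```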
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This avoids a clash with libbpf's 'struct btf'. Over time, more of the
internal state now in 'struct btf_elf' will refer to the equivalent
internal state in libbpf's 'struct btf', as the two have a lot in common.
Requested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Acked-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Two new BTF kinds, BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO,
have been added to the kernel by
https://patchwork.ozlabs.org/cover/1000176/
to better support function introspection.
Currently, for a DW_TAG_subroutine_type DWARF type,
a simple "void *" is generated instead of the real subroutine type.
This patch teaches pahole to generate BTF_KIND_FUNC_PROTO
properly. After this patch, pahole should fully cover the
C types that a BPF program cares about.
For example,
$ cat t1.c
typedef int __int32;
struct t1 {
int a1;
int (*f1)(char p1, __int32 p2);
} g1;
$ cat t2.c
typedef int __int32;
struct t2 {
int a2;
int (*f2)(char q1, __int32 q2, ...);
int (*f3)();
} g2;
int main() { return 0; }
$ gcc -O2 -o t1 -g t1.c t2.c
$ pahole -JV t1
File t1:
[1] TYPEDEF __int32 type_id=2
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] STRUCT t1 kind_flag=0 size=16 vlen=2
a1 type_id=2 bits_offset=0
f1 type_id=6 bits_offset=64
[4] FUNC_PROTO (anon) return=2 args=(5 (anon), 1 (anon))
[5] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
[6] PTR (anon) type_id=4
[7] TYPEDEF __int32 type_id=8
[8] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[9] STRUCT t2 kind_flag=0 size=24 vlen=3
a2 type_id=8 bits_offset=0
f2 type_id=12 bits_offset=64
f3 type_id=14 bits_offset=128
[10] FUNC_PROTO (anon) return=8 args=(11 (anon), 7 (anon), vararg)
[11] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
[12] PTR (anon) type_id=10
[13] FUNC_PROTO (anon) return=8 args=(vararg)
[14] PTR (anon) type_id=13
$
In the above example, types [4], [10] and [13] represent the
func_proto types.
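A FUNC_PROTO is a btf_type whose `type` field holds the return type ID and whose vlen counts the btf_param entries that follow it; a variadic prototype carries one extra, trailing parameter with both name_off and type set to 0, which is how [10] and [13] end up with "vararg" above. The sketch below encodes one such entry. `struct bt_type`, `struct bt_param` and `encode_func_proto()` are hypothetical stand-ins mirroring the kernel's `struct btf_type`/`struct btf_param`, and argument names are not recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KIND_FUNC_PROTO 13 /* value of BTF_KIND_FUNC_PROTO in <linux/btf.h> */

/* Same layout as struct btf_type (name_off, info, size/type). */
struct bt_type {
	uint32_t name_off;
	uint32_t info;   /* kind in bits 24+, vlen in bits 0-15 */
	uint32_t type;   /* for FUNC_PROTO: the return type ID */
};

/* Same layout as struct btf_param. */
struct bt_param {
	uint32_t name_off;
	uint32_t type;
};

/* Encode an anonymous FUNC_PROTO followed by its parameters.
 * Returns the number of params written (the vlen). */
static size_t encode_func_proto(uint32_t ret_type_id,
				const uint32_t *arg_type_ids, size_t nr_args,
				int vararg, struct bt_type *t,
				struct bt_param *params)
{
	size_t vlen = nr_args + (vararg ? 1 : 0);

	t->name_off = 0; /* anonymous, as in the dump above */
	t->info = ((uint32_t)KIND_FUNC_PROTO << 24) | ((uint32_t)vlen & 0xffff);
	t->type = ret_type_id;

	for (size_t i = 0; i < nr_args; i++) {
		params[i].name_off = 0; /* argument names not recorded here */
		params[i].type = arg_type_ids[i];
	}
	if (vararg) {
		/* "..." is a trailing param with no name and type 0 (void). */
		params[nr_args].name_off = 0;
		params[nr_args].type = 0;
	}
	return vlen;
}
```

Encoding `f2` from t2.c with the IDs shown above (return 8, args 11 and 7, variadic) yields vlen 3 with a zero-typed last parameter.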
BTF_KIND_FUNC, which represents a real subprogram, is not generated in
this patch and will be considered later.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch fixes two issues with BTF: one related to struct/union
bitfield encoding and the other related to forward types.
Issue #1 and solution:
======================
The current BTF encoding of bitfields follows what pahole generates.
For each bitfield, pahole duplicates the type chain and
puts the bitfield size in the final int or enum type.
Since the BTF enum type cannot encode a bit size,
commit b18354f64c ("btf: Generate correct struct bitfield
member types") works around the issue by generating
an int type whenever the enum bit size is not 32.
This workaround is not ideal, as we lose the original type
in BTF. Another undesirable effect is the type duplication
caused by pahole duplicating the type chain.
To fix this issue, this patch implements a compatible
change to the BTF struct type encoding:
. bit 31 of type->info, previously reserved,
  is now used to indicate whether bitfield_size is
  encoded in btf_member or not.
. if bit 31 of struct_type->info is set,
  btf_member->offset is encoded as:
    bit 0 - 23: bit offset
    bit 24 - 31: bitfield size
  if bit 31 is not set, the old behavior is preserved:
    bit 0 - 31: bit offset
So if the struct contains a bitfield, the maximum bit offset
is reduced to (2^24 - 1) instead of MAX_UINT. The maximum
bitfield size is 255, which is enough for today, as the largest
bitfield a compiler supports is 128 bits, on targets where the
int128 type is supported.
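The new member offset layout can be checked with a few lines of C. The macros below are local equivalents of BTF_INFO_KFLAG(), BTF_MEMBER_BIT_OFFSET() and BTF_MEMBER_BITFIELD_SIZE() from the kernel's <linux/btf.h>; `encode_member_offset()` and `decode_member_offset()` are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local equivalents of the <linux/btf.h> accessor macros. */
#define INFO_KFLAG(info)          ((info) >> 31)
#define MEMBER_BIT_OFFSET(val)    ((val) & 0xffffff)
#define MEMBER_BITFIELD_SIZE(val) ((val) >> 24)

/* Encode btf_member->offset for a struct whose kind_flag is set.
 * Returns 0 and stores the value, or -1 if either field does not fit. */
static int encode_member_offset(uint32_t bit_offset, uint32_t bitfield_size,
				uint32_t *out)
{
	if (bit_offset > 0xffffff || bitfield_size > 0xff)
		return -1;
	*out = (bitfield_size << 24) | bit_offset;
	return 0;
}

/* Decode btf_member->offset given the owning struct's info word. */
static void decode_member_offset(uint32_t struct_info, uint32_t val,
				 uint32_t *bit_offset, uint32_t *bitfield_size)
{
	if (INFO_KFLAG(struct_info)) {
		*bit_offset = MEMBER_BIT_OFFSET(val);
		*bitfield_size = MEMBER_BITFIELD_SIZE(val);
	} else {
		*bit_offset = val; /* old layout: all 32 bits are the offset */
		*bitfield_size = 0;
	}
}
```

For member 'b' in the example below (bitfield_size=4, bits_offset=160), the encoded value is (4 << 24) | 160.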
A new global, no_bitfield_type_recode, is introduced; it is
set to true when BTF encoding is enabled and prevents pahole
from duplicating the bitfield types, avoiding type duplication
in BTF.
Issue #2 and solution:
======================
The current forward type in BTF does not specify whether the original
type is a struct or a union. This does not work for type pretty-printing
or BTF-to-header-file conversion, where struct/union must be specified.
To fix this issue, as with issue #1, bit 31 of type->info
is used: if the bit is set, the type is a union; otherwise, it is
a struct.
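For forward types, the same bit only selects the keyword. A minimal sketch with hypothetical helpers `make_fwd_info()` and `fwd_keyword()`; the kind value 7 is BTF_KIND_FWD from <linux/btf.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KIND_FWD 7 /* value of BTF_KIND_FWD in <linux/btf.h> */

/* Build the info word of a FWD type: kind in bits 24+, kind_flag in bit 31. */
static uint32_t make_fwd_info(int is_union)
{
	return ((uint32_t)(is_union ? 1 : 0) << 31) | ((uint32_t)KIND_FWD << 24);
}

/* Keyword to print for a forward declaration, e.g. in a generated header. */
static const char *fwd_keyword(uint32_t info)
{
	return (info >> 31) ? "union" : "struct";
}
```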
Examples:
=========
-bash-4.4$ cat t.c
struct s;
union u;
typedef int ___int;
enum A { A1, A2, A3 };
struct t {
int a[5];
___int b:4;
volatile enum A c:4;
struct s *p1;
union u *p2;
} g;
-bash-4.4$ gcc -c -O2 -g t.c
Without this patch:
$ pahole -JV t.o
[1] TYPEDEF ___int type_id=2
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] ENUM A size=4 vlen=3
A1 val=0
A2 val=1
A3 val=2
[4] STRUCT t size=40 vlen=5
a type_id=5 bits_offset=0
b type_id=13 bits_offset=160
c type_id=15 bits_offset=164
p1 type_id=9 bits_offset=192
p2 type_id=11 bits_offset=256
[5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
[6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
[7] VOLATILE (anon) type_id=3
[8] FWD s type_id=0
[9] PTR (anon) type_id=8
[10] FWD u type_id=0
[11] PTR (anon) type_id=10
[12] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
[13] TYPEDEF ___int type_id=12
[14] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
[15] VOLATILE (anon) type_id=14
With this patch:
$ pahole -JV t.o
File t.o:
[1] TYPEDEF ___int type_id=2
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] ENUM A size=4 vlen=3
A1 val=0
A2 val=1
A3 val=2
[4] STRUCT t kind_flag=1 size=40 vlen=5
a type_id=5 bitfield_size=0 bits_offset=0
b type_id=1 bitfield_size=4 bits_offset=160
c type_id=7 bitfield_size=4 bits_offset=164
p1 type_id=9 bitfield_size=0 bits_offset=192
p2 type_id=11 bitfield_size=0 bits_offset=256
[5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
[6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
[7] VOLATILE (anon) type_id=3
[8] FWD s struct
[9] PTR (anon) type_id=8
[10] FWD u union
[11] PTR (anon) type_id=10
The fix removes the type duplication, preserves the enum type for the
bitfield, and records correct struct/union information for the forward
types.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The BTF bitfield encoding is broken on little-endian targets.
Take the following example:
-bash-4.2$ cat t.c
struct t {
int a:2;
int b:1;
int :3;
int c:1;
int d;
char e:1;
char f:1;
int g;
};
void test(struct t *t) {
return;
}
-bash-4.2$ clang -S -g -emit-llvm t.c
The pahole DWARF-to-BTF conversion produces the following output for
the BPF little-endian and big-endian targets:
-bash-4.2$ llc -march=bpfel -mattr=dwarfris -filetype=obj t.ll
-bash-4.2$ pahole -JV t.o
[1] PTR (anon) type_id=2
[2] STRUCT t size=16 vlen=7
a type_id=5 bits_offset=30
b type_id=6 bits_offset=29
c type_id=6 bits_offset=25
d type_id=3 bits_offset=32
e type_id=7 bits_offset=71
f type_id=7 bits_offset=70
g type_id=3 bits_offset=96
[3] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[4] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
[5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
[6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
[7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)
-bash-4.2$ llc -march=bpfeb -mattr=dwarfris -filetype=obj t.ll
-bash-4.2$ pahole -JV t.o
[1] PTR (anon) type_id=2
[2] STRUCT t size=16 vlen=7
a type_id=5 bits_offset=0
b type_id=6 bits_offset=2
c type_id=6 bits_offset=6
d type_id=3 bits_offset=32
e type_id=7 bits_offset=64
f type_id=7 bits_offset=65
g type_id=3 bits_offset=96
[3] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[4] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
[5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
[6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
[7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)
The BTF struct member bits_offset counts bits from the beginning of the
containing entity regardless of endianness, similar to what
DW_AT_data_bit_offset from DWARF4 does. Such counting matches the
big-endian conversion above.
But the little-endian conversion is not correct, since DWARF generates
DW_AT_bit_offset based on the actual bit position on the little-endian
architecture. For example, for the above struct member "a", DWARF
generates DW_AT_bit_offset=30 for little endian and
DW_AT_bit_offset=0 for big endian.
This patch fixes the little-endian struct member bits_offset problem
by computing it properly from the DWARF attributes.
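The corrected calculation can be sketched as follows, assuming DWARF2/3-style attributes where DW_AT_bit_offset counts from the most significant bit of a storage unit of DW_AT_byte_size bytes located at DW_AT_data_member_location. This is an illustration of the arithmetic, not the patch's code; `struct dw_bitfield` and `btf_bits_offset()` are hypothetical. For member "a" on little endian: 0*8 + 4*8 - 30 - 2 = 0.

```c
#include <assert.h>
#include <stdint.h>

/* DWARF2/3-style description of one bitfield member. */
struct dw_bitfield {
	uint32_t byte_offset; /* DW_AT_data_member_location of the storage unit */
	uint32_t byte_size;   /* DW_AT_byte_size of the storage unit */
	uint32_t bit_offset;  /* DW_AT_bit_offset: counted from the unit's MSB */
	uint32_t bit_size;    /* DW_AT_bit_size */
};

/* BTF bits_offset: counted from the start of the struct, same for
 * both endiannesses. */
static uint32_t btf_bits_offset(const struct dw_bitfield *m, int little_endian)
{
	uint32_t base = m->byte_offset * 8;

	if (!little_endian)
		return base + m->bit_offset; /* MSB-first already matches */

	/* On little endian the MSB is the unit's highest bit, so measure
	 * from the other end of the storage unit. */
	return base + m->byte_size * 8 - m->bit_offset - m->bit_size;
}
```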
With the fix, we get:
-bash-4.2$ llc -march=bpfel -mattr=dwarfris -filetype=obj t.ll
-bash-4.2$ pahole -JV t.o
[1] STRUCT t size=16 vlen=7
a type_id=5 bits_offset=0
b type_id=6 bits_offset=2
c type_id=6 bits_offset=6
d type_id=2 bits_offset=32
e type_id=7 bits_offset=64
f type_id=7 bits_offset=65
g type_id=2 bits_offset=96
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
[4] PTR (anon) type_id=1
[5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
[6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
[7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)
-bash-4.2$ llc -march=bpfeb -mattr=dwarfris -filetype=obj t.ll
-bash-4.2$ pahole -JV t.o
[1] PTR (anon) type_id=2
[2] STRUCT t size=16 vlen=7
a type_id=5 bits_offset=0
b type_id=6 bits_offset=2
c type_id=6 bits_offset=6
d type_id=3 bits_offset=32
e type_id=7 bits_offset=64
f type_id=7 bits_offset=65
g type_id=3 bits_offset=96
[3] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[4] INT char size=1 bit_offset=0 nr_bits=8 encoding=(none)
[5] INT int size=1 bit_offset=0 nr_bits=2 encoding=(none)
[6] INT int size=1 bit_offset=0 nr_bits=1 encoding=(none)
[7] INT char size=1 bit_offset=0 nr_bits=1 encoding=(none)
-bash-4.2$
For both little endian and big endian, the struct members now have the
same, correct bits_offset values.
We could have fixed pos->bit_offset instead, but its meaning would then
be inconsistent with pos->bitfield_offset, which is used to print out
the pahole data structure:
-bash-4.2$ llc -march=bpfel -mattr=dwarfris -filetype=obj t.ll
-bash-4.2$ /bin/pahole t.o
struct t {
int a:2; /* 0:30 4 */
int b:1; /* 0:29 4 */
int c:1; /* 0:25 4 */
.....
So this patch only changes the BTF-specific routines.
Signed-off-by: Yonghong Song <yhs@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch introduces BPF Type Format (BTF).
BTF is the metadata format that describes the data types of a
BPF program/map. Hence, it focuses mainly on the C programming
language, which modern BPF primarily uses. The first use case is
to provide a generic pretty-print capability for BPF maps.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>