22 Commits

Author SHA1 Message Date
Andrii Nakryiko 29fce8dc85 strings: use BTF's string APIs for strings management
Switch strings container to using struct btf and its
btf__add_str()/btf__find_str() APIs, which do equivalent internal string
deduplication. This turns out to be very significantly faster than using
tsearch functions. To satisfy the CTF encoding use case, a somewhat hacky way of
fetching the string section size is used, as libbpf doesn't provide a direct API
to get the total string section size or to copy over just the strings data section.

BEFORE:
         22,624.28 msec task-clock                #    1.000 CPUs utilized
                85      context-switches          #    0.004 K/sec
                 3      cpu-migrations            #    0.000 K/sec
           622,545      page-faults               #    0.028 M/sec
    68,177,206,387      cycles                    #    3.013 GHz                      (24.99%)
   114,370,031,619      instructions              #    1.68  insn per cycle           (25.01%)
    26,125,001,179      branches                  # 1154.733 M/sec                    (25.01%)
       458,861,243      branch-misses             #    1.76% of all branches          (25.00%)
    24,533,455,967      L1-dcache-loads           # 1084.386 M/sec                    (25.02%)
       973,500,214      L1-dcache-load-misses     #    3.97% of all L1-dcache hits    (25.05%)
       338,773,561      LLC-loads                 #   14.974 M/sec                    (25.02%)
        12,651,196      LLC-load-misses           #    3.73% of all LL-cache hits     (25.00%)

      22.628910615 seconds time elapsed

      21.341063000 seconds user
       1.283763000 seconds sys

AFTER:
         18,362.97 msec task-clock                #    1.000 CPUs utilized
                37      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           626,281      page-faults               #    0.034 M/sec
    52,480,619,000      cycles                    #    2.858 GHz                      (25.00%)
   104,736,434,384      instructions              #    2.00  insn per cycle           (25.01%)
    23,878,428,465      branches                  # 1300.358 M/sec                    (25.01%)
       252,669,685      branch-misses             #    1.06% of all branches          (25.03%)
    21,829,390,952      L1-dcache-loads           # 1188.772 M/sec                    (25.04%)
       638,086,339      L1-dcache-load-misses     #    2.92% of all L1-dcache hits    (25.02%)
       212,327,435      LLC-loads                 #   11.563 M/sec                    (25.00%)
        14,578,117      LLC-load-misses           #    6.87% of all LL-cache hits     (25.00%)

      18.364427347 seconds time elapsed

      16.985494000 seconds user
       1.377959000 seconds sys

Committer testing:

Before:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            8,735.92 msec task-clock:u              #    0.998 CPUs utilized            ( +-  0.34% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             353,978      page-faults:u             #    0.041 M/sec                    ( +-  0.00% )
      34,722,167,335      cycles:u                  #    3.975 GHz                      ( +-  0.12% )  (83.33%)
         555,981,118      stalled-cycles-frontend:u #    1.60% frontend cycles idle     ( +-  1.53% )  (83.33%)
       5,215,370,531      stalled-cycles-backend:u  #   15.02% backend cycles idle      ( +-  1.31% )  (83.33%)
      72,615,773,119      instructions:u            #    2.09  insn per cycle
                                                    #    0.07  stalled cycles per insn  ( +-  0.02% )  (83.34%)
      16,624,959,121      branches:u                # 1903.057 M/sec                    ( +-  0.01% )  (83.33%)
         229,962,327      branch-misses:u           #    1.38% of all branches          ( +-  0.07% )  (83.33%)

              8.7503 +- 0.0301 seconds time elapsed  ( +-  0.34% )

  $

After:

  $ perf stat -r5 pahole -J vmlinux

   Performance counter stats for 'pahole -J vmlinux' (5 runs):

            7,302.31 msec task-clock:u              #    0.998 CPUs utilized            ( +-  1.16% )
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
             355,884      page-faults:u             #    0.049 M/sec                    ( +-  0.00% )
      29,150,861,078      cycles:u                  #    3.992 GHz                      ( +-  0.35% )  (83.33%)
         478,705,326      stalled-cycles-frontend:u #    1.64% frontend cycles idle     ( +-  2.70% )  (83.33%)
       5,351,001,796      stalled-cycles-backend:u  #   18.36% backend cycles idle      ( +-  1.20% )  (83.33%)
      65,835,888,022      instructions:u            #    2.26  insn per cycle
                                                    #    0.08  stalled cycles per insn  ( +-  0.03% )  (83.33%)
      15,025,195,460      branches:u                # 2057.594 M/sec                    ( +-  0.05% )  (83.34%)
         141,209,214      branch-misses:u           #    0.94% of all branches          ( +-  0.15% )  (83.33%)

              7.3140 +- 0.0851 seconds time elapsed  ( +-  1.16% )

  $

16.04% fewer cycles, keep the patches coming! :-)
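The quoted reduction can be checked against the cycles:u averages from the two runs above:

```latex
\frac{34{,}722{,}167{,}335 - 29{,}150{,}861{,}078}{34{,}722{,}167{,}335}
  = \frac{5{,}571{,}306{,}257}{34{,}722{,}167{,}335} \approx 0.1605
```

i.e. roughly 16% fewer cycles. The elapsed time (8.7503 s to 7.3140 s) drops by about 16.4%.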

Had to add this patch tho:

  +++ b/dwarf_loader.c
  @@ -2159,7 +2159,7 @@ static unsigned long long dwarf_tag__orig_id(const struct tag *tag,
   static const char *dwarf__strings_ptr(const struct cu *cu __unused,
   				      strings_t s)
   {
  -	return strings__ptr(strings, s);
  +	return s ? strings__ptr(strings, s) : NULL;
   }

To keep the preexisting behaviour and to match what the BTF-specific
strings_ptr method does:

  static const char *btf_elf__strings_ptr(const struct cu *cu, strings_t s)
  {
          return btf_elf__string(cu->priv, s);
  }

  const char *btf_elf__string(struct btf_elf *btfe, uint32_t ref)
  {
          const char *s = btf__str_by_offset(btfe->btf, ref);

          return s && s[0] == '\0' ? NULL : s;
  }

With these adjustments, btfdiff on a vmlinux with BTF and DWARF is again
clean, i.e. pretty printing from BTF matches what we get when using
DWARF.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: dwarves@vger.kernel.org
Cc: kernel-team@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-20 17:17:51 -03:00
Arnaldo Carvalho de Melo f601f67258 libctf: The type_ids returned are uint32_t fixup where it was uint16_t
To help in the tree-wide conversion to uint32_t for representing type IDs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11 11:44:53 -03:00
Domenico Andreoli e714d2eaa1 Adopt SPDX-License-Identifier
Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-18 15:41:48 -03:00
Arnaldo Carvalho de Melo c65f2cf436 dwarves: Rename variable->location to ->scope
We'll use 'location' in the DWARF sense, i.e. location lists, etc.: where
is this variable? In a register? On the stack?

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-26 16:45:25 -03:00
Cody P Schafer 1e461ec7e0 dwarves_fprintf: Fix printf types on 64bit linux
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-03-20 15:56:38 -03:00
Arnaldo Carvalho de Melo a54515fa6e dwarves: Stop using 'self'
As Thomas Gleixner wisely pointed out, using 'self' is stupid: it
doesn't convey useful information, so use sensible names.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-17 18:47:15 -03:00
Arnaldo Carvalho de Melo 01a7fb50d4 ctf_encoder: Allow specifying a verbose level for cu__encode_ctf
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-12-06 14:17:29 -02:00
Arnaldo Carvalho de Melo a8b3f74b90 ctf: structure_type__encode should encode only DW_TAG_member
C++ is not properly supported in CTF anyway... And this was
causing a bug, so don't encode DW_TAG_inheritance entries.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-09-11 14:11:22 -03:00
Arnaldo Carvalho de Melo d9b4badca2 ctf: Handle dwfl_module_getsymtab errors
That can happen, for instance, when the symtabs are NOBITS. When that
happened we ended up in an infinite loop. Call it earlier and check the
result.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-08-23 11:47:03 -03:00
Arnaldo Carvalho de Melo 7c6603189e dwarves: Make all the tags that have an IP to be derived from ip_tag
Next we'll add a new kind of tag, DW_TAG_perf_counter, that will come
from perf.data generated by 'perf report'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-06-04 17:30:06 -03:00
Arnaldo Carvalho de Melo 4d619ac4cb core: Only DWARF uses the global strings table, so move it there
There is still the problem of handing the strings table to the CTF encoder, but
that will be fixed another day.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-04-02 18:46:54 -03:00
Arnaldo Carvalho de Melo 495c70ae14 ctf_encoder: Add void entries for variables not found on DWARF
Temporary hack till I figure out how to do more filtering on the variables on
the symtab that aren't in the DWARF info.

The problem is that if we don't put something in the table at encode time, we
won't find it at decode time, and then there is no DWARF around to tell us that
it is missing simply because it isn't in DWARF.

We then discard it at load time, as "void foo;" doesn't make sense.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-04-02 12:41:01 -03:00
Arnaldo Carvalho de Melo b911a0aa7a ctf_encoder: Add void (void) signature for functions not found on DWARF
Temporary hack till I figure out how to do more filtering on the functions on
the symtab that aren't in the DWARF info.

The problem is that if we don't put something in the table at encode time, we
won't find it at decode time, and then there is no DWARF around to tell us that
it is missing simply because it isn't in DWARF.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-04-01 14:11:51 -03:00
Arnaldo Carvalho de Melo d7d419f6ab ctf_encoder: Create objects section (data/variables)
Encode all the non-UNDEF OBJECT entries in the symtab. Some must be filtered
out in upcoming patches, but at least for kernel/sched.o it works just fine.

To test it I used DaveM's ctfdump and also pdwtags on an object that was CTF
encoded with pahole -Z and then stripped with --strip-debug.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-31 16:12:00 -03:00
Arnaldo Carvalho de Melo f0aec4a0f4 ctf_encoder: Rename hashaddr__find to hashaddr__find_function
As we will also hash variables by addr.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-31 16:06:22 -03:00
Arnaldo Carvalho de Melo 879f483daf core: Introduce cu__cache_symtab
We need it to be able to call cu__for_each_cached_symtab_entry more
than once in the same function.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-31 15:15:08 -03:00
Arnaldo Carvalho de Melo e97d952744 ctf_encoder: Convert DWARF functions to CTF
Finally we can use the Elf file already opened in dwarf_load and call
cu__for_each_cached_symtab_entry to iterate over the symtab entries. This
iterator first calls dwfl_module_getsymtab, which does the relocation that
allows us to go from the symtab address to the one in the DW_AT_low_pc
attribute of the DWARF DW_TAG_subprogram tag.

And voila, for a relatively complex single unit Linux kernel object
file, kernel/sched.o, we go from:

Just DWARF (gcc -g):

$ ls -la kernel/sched.o
1979011 kernel/sched.o

Then we run this to encode the CTF section:

$ pahole -Z kernel/sched.o

And get a file with both DWARF and CTF ELF sections:

$ ls -la kernel/sched.o
2019848 kernel/sched.o

We still need to encode the "OBJECTS", i.e. variables, but this
gets us from 1979011 (just DWARF) to:

$ strip --strip-debug kernel/sched.o
$ ls -la kernel/sched.o
-rw-rw-r-- 1 acme acme 507008 2009-03-30 23:01 kernel/sched.o

Roughly 25% of the original size (507008 / 1979011 ≈ 25.6%).

Of course we don't have inline expansion information, parameter names,
goto labels, etc., but this should be good enough for most use cases.

See: without DWARF data, if we ask it to use DWARF, nothing is printed; if
we don't specify the format, it first tries DWARF, doesn't find anything, and
then tries CTF:

$ pahole -F dwarf kernel/sched.o
$ pahole -C seq_operations kernel/sched.o
struct seq_operations {
	void *  (*start)(struct seq_file *, loff_t *);         /*   0  8 */
	void    (*stop)(struct seq_file *, void *);            /*   8  8 */
	void *  (*next)(struct seq_file *, void *, loff_t *);  /*  16  8 */
	int     (*show)(struct seq_file *, void *);            /*  24  8 */

	/* size: 32, cachelines: 1, members: 4 */
	/* last cacheline: 32 bytes */
};
$ pfunct -Vi -f schedule kernel/sched.o
void schedule(void);
{ /* low_pc=0xe01 */
}/* size: 83 */
$

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-30 22:54:29 -03:00
Arnaldo Carvalho de Melo 60e76245b8 core: Allow caching an open Elf file handle for reuse
pahole --ctf_encode is the first to put this to good use.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-30 22:00:39 -03:00
Arnaldo Carvalho de Melo 2fd3936a9d ctf: combine the structs ctf_state and ctf
Move more CTF-only stuff out of dwarves land and into something that can be
more easily stolen by other projects not interested in funny-named stuff such
as pahole.

This will also help with encoding, as we will normally be re-encoding data from
DWARF, so the ELF file will be available and we will just add a new section to
it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-24 18:12:11 -03:00
Arnaldo Carvalho de Melo ce97ac9a26 ctf_encoder: Allow encoding a bit_size in enumeration types
We have to support enum bitfields, so remember the bit_size.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-20 12:51:30 -03:00
Arnaldo Carvalho de Melo fedbfb60ff libctf: Encode VARARGS as an extra 0 short at the end of the parm list
We'll see if this is how things should be, but it's good enough for me 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-19 20:10:33 -03:00
Arnaldo Carvalho de Melo feab8aa5e3 ctf: Include the initial implementation of a ctf encoder
"pahole -Z foo" will create foo.SUNW_ctf, which, if objcopy
--add-section'ed to an object of the right word size, will work, except for
VARARGS, which will get fixed soon (as in, probably, tomorrow).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2009-03-19 12:16:07 -03:00