2001 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo df92cb6b8e core: Change last_seen_bit to uint32_t in class__find_holes()
And it is being compared against uint32_t variables, resulting in this
clang warning:

  /var/home/acme/git/pahole/dwarves.c: In function ‘class__find_holes’:
  /var/home/acme/git/pahole/dwarves.c:1439:43: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
   1439 |                         if (last_seen_bit < aligned_start && aligned_start <= bit_start) {
        |                                           ^

Since it can't be less than zero, just make it uint32_t.
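
As a rough illustration of why the warning matters (a sketch, not code from
dwarves.c; both function names are made up): when an int meets a uint32_t in
a comparison, C converts the int to unsigned first, so a negative value stops
comparing as "less than".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: spells out the conversion C applies implicitly to
 * 'last < start' when 'last' is int and 'start' is uint32_t. */
static bool less_as_compiled(int last, uint32_t start)
{
	return (uint32_t)last < start;
}

/* Hypothetical: both operands unsigned, so nothing is converted and
 * -Wsign-compare has nothing to report. */
static bool less_fixed(uint32_t last, uint32_t start)
{
	return last < start;
}
```

With a 32-bit int, less_as_compiled(-1, 1) is false because -1 becomes
UINT32_MAX before the comparison.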

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 789d8b3e1a core: Change aligned_start to uint32_t in class__find_holes()
And it is being compared against uint32_t variables, resulting in this
clang warning:

  /var/home/acme/git/pahole/dwarves.c: In function ‘class__find_holes’:
  /var/home/acme/git/pahole/dwarves.c:1439:76: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
   1439 |                         if (last_seen_bit < aligned_start && aligned_start <= bit_start) {
        |                                                                            ^~

Since it can't be less than zero, just make it uint32_t.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 182cdcaed9 core: Change cur_bitfield_end to uint32_t in class__find_holes()
And it is being compared against uint32_t variables, resulting in this
clang warning:

  /var/home/acme/git/pahole/dwarves.c:1430:44: note: in expansion of macro ‘min’
   1430 |                         int bitfield_end = min(bit_start, cur_bitfield_end);
        |                                            ^~~

Since it can't be less than zero, just make it uint32_t.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 5900f43f10 core: Change bit_start and bit_end to uint32_t in class__find_holes()
And they were being compared against uint32_t variables, resulting in
this clang warning:

  /var/home/acme/git/pahole/dwarves.c: In function ‘class__find_holes’:
  /var/home/acme/git/pahole/dwarves.c:1453:73: warning: comparison of integer expressions of different signedness: ‘uint32_t’ {aka ‘unsigned int’} and ‘int’ [-Wsign-compare]
   1453 |                         if (bit_end > cur_bitfield_end || pos->bit_size > cur_bitfield_size) {
        |

Since they can't be less than zero, just make them uint32_t.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 8634d8535f btf_encoder: Fix signed/unsigned comparison
Since the 'int' variable 'err' was just checked for < 0, cast it to
uint32_t and compare with the 'uint32_t' one.
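
A minimal sketch of the check-then-cast pattern, assuming 'err' holds the
int result of a write-like call (negative on error, byte count otherwise).
The function name is hypothetical and this is not the code in
btf_encoder.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: did the write-like call report exactly
 * 'expected_size' bytes? */
static bool wrote_everything(int err, uint32_t expected_size)
{
	if (err < 0)
		return false;
	/* err >= 0 here, so the cast cannot change its value. */
	return (uint32_t)err == expected_size;
}
```

Without the 'err < 0' test first, an error return of -1 would compare equal
to an expected size of UINT32_MAX after the conversion.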

Fixes this clang warning:

  /var/home/acme/git/pahole/btf_encoder.c: In function ‘btf_encoder__write_raw_file’:
  /var/home/acme/git/pahole/btf_encoder.c:890:17: warning: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Wsign-compare]
    890 |         if (err != raw_btf_size) {
        |                 ^~

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 8d2efa2b6c btf_encoder: has_arg_names() doesn't need the 'cu' pointer
Since we don't need the cu to get the strings table anymore, all tags
have a char pointer for strings.

Also rename it to ftype__has_arg_names() and simplify it a bit by
removing a needless single-use 'name' variable.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 05f737076f btf_encoder: btf_encoder__encode_tag() doesn't need the 'core_id' pointer
Not being used at all, ditch it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo dc30e82b26 btf_encoder: btf_encoder__encode_tag() doesn't need the 'cu' pointer
Since we don't need the cu to get the strings table anymore, all tags
have a char pointer for strings.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 4360359e43 btf_encoder: btf_encoder__add_struct_type() doesn't need the 'cu' pointer
Since we don't need the cu to get the strings table anymore, all tags
have a char pointer for strings.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 6e1e4881a5 btf_encoder: btf_encoder__add_func_proto() doesn't need the 'cu' pointer
Since we don't need the cu to get the strings table anymore, all tags
have a char pointer for strings.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 9fbfcee7d9 btf_encoder: No need to read the ehdr in btf_encoder__write_elf(), ditch it
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 898cc49027 ctracer: No need to read the ehdr, ditch it
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo dee83e27dd btf_encoder: No need to store the ehdr in the instance
We need it only in btf_encoder__new(), so just use a local variable for
that.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 24404190b8 elf_symtab: Remove needless GElf_Ehdr pointer argument from the constructor
We don't need it, as we used it only for calling elf_section_by_name(),
which doesn't need it anymore.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 74c2078e04 dutil: elf_symtab__new() doesn't need the GElf_Ehdr *ep argument
In 3f8aad340b ("elf_symtab: Handle SHN_XINDEX index in
elf_section_by_name()") we stopped using that argument as we switched to
using elf_getshdrstrndx() to get SHN_XINDEX.

So just remove that argument and fixup its callers, this will allow
removing a good chunk of calls and variables.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 23ea62817c pahole: Move case fallthru comment to after the statement
In this case we have:

	case foo: {
	}
        case bar:

The fallthru comment has to be _after_ the closing curly brace, fix it
and avoid this warning (from clang, but probably from gcc too):

  /var/home/acme/git/pahole/pahole.c:573:40: warning: this statement may fall through [-Wimplicit-fallthrough=]
    573 |                 case DW_TAG_base_type: {
        |                                        ^
  /var/home/acme/git/pahole/pahole.c:582:17: note: here
    582 |                 case DW_TAG_pointer_type:
        |                 ^~~~
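
A small sketch of the comment placement, using made-up tag values instead of
the DW_TAG_* constants. It assumes gcc's comment matching for
-Wimplicit-fallthrough, which wants the comment right before the next case
label:

```c
/* Hypothetical tag values, standing in for DW_TAG_* constants. */
enum tag { TAG_BASE = 1, TAG_POINTER = 2, TAG_OTHER = 3 };

static int tag_weight(enum tag t)
{
	int weight = 0;

	switch (t) {
	case TAG_BASE: {
		int bonus = 10;

		weight += bonus;
	}
		/* fallthru */
	case TAG_POINTER:
		weight += 1;
		break;
	default:
		break;
	}
	return weight;
}
```

Here TAG_BASE runs its braced block and then the TAG_POINTER statements,
which is the intended fall through.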

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 7a8e75cd9a elfcreator: elfcreator_copy_scn() doesn't need the 'elf' arg
Not used at all, remove it.

Cc: Peter Jones <pjones@redhat.com>
Fixes: 29ef465cd8 ("Add scncopy - like object copy but tries not to change section content")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 3925a5bd53 syscse: zero_extend() doesn't need a 'cu' arg
Since we don't need the cu to get the strings table, all tags have a
char pointer for strings.

Found while building with clang to prep 1.22.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 21b2933f01 pahole: Fix signedness of ternary expression operator
To address this clang warning:

  /var/home/acme/git/pahole/pahole.c: In function ‘type__instance_read_once’:
  /var/home/acme/git/pahole/pahole.c:1933:78: warning: operand of ‘?:’ changes signedness from ‘int’ to ‘uint32_t’ {aka ‘unsigned int’} due to unsignedness of other operand [-Wsign-compare]
   1933 |         return fread(instance->instance, instance->type->size, 1, fp) != 1 ? -1 : instance->type->size;
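
For illustration only: in 'cond ? -1 : size' with a uint32_t 'size', the
usual arithmetic conversions make the whole expression uint32_t, so the -1
arm turns into UINT32_MAX before being converted to the return type. One
possible way to keep both arms signed is sketched below; the function name
is hypothetical and the actual change in pahole.c is not shown in this log.

```c
#include <stdint.h>

/* Hypothetical helper: -1 on failure, otherwise the size as an int.
 * Casting the unsigned operand keeps the conditional expression signed. */
static int read_result(int failed, uint32_t size)
{
	return failed ? -1 : (int)size;
}
```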

Fixes: e3e5a4626c ("pahole: Make sure the header is read only once")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 4e11c13895 ctracer: Remove a bunch of unused 'cu' pointers
Since we don't need the cu to get the strings table, all tags have a
char pointer for strings.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 54c1e93b8e pahole: Use the 'prototypes' parameter in prototypes__load()
It was using &class_names directly while it was also being passed as the
'prototypes' argument; use the argument instead.

Fixes: 823739b56f ("pahole: Convert class_names into a list of struct prototypes")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 8b495918e6 codiff: class__find_pair_member() doesn't need 'cu' args
Since we don't need the cu to get the strings table, all tags have a
char pointer for strings.

Found while building with clang to prep 1.22.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 057be3d993 core: class__find_member_by_name() doesn't need a cu pointer
Since we don't need the cu to get the strings table, all tags have a
char pointer for strings.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo ce9de90364 core: Document type->node member usage
Right now it's just for when we emit types, so we can reuse it, for
instance, to handle different types with the same name in different CUs
in pahole.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo cead526d6b core: Fix nnr_members typo on 'struct type' comment docs
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 7cfc9be1f2 man-pages: Improve the --nr_methods/-m pahole man page entry
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 3895127ce6 pahole: Clarify that currently --nr_methods doesn't work together with -C
It should, as it's natural to do:

  $ pahole --nr_methods -C sock

And have it traverse all functions in all compilation units and show how
many of them have 'struct sock *' as one of its arguments, but more
changes are needed to have this in place and it is easy enough to do:

  $ pahole --nr_methods | grep -w sock
  sock	1005
  $

And with BTF, it's super fast too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 2ea46285ac pahole: No need to store the class name in 'struct structure'
As we by now already store the 'struct class' it comes from and
class->name is now a string, no point in storing a duplicate name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 4d8551396d pahole: Multithreaded DWARF loading requires elfutils >= 0.178
According to Mark Wielaard and as per testing, elfutils' libdw version
must be at least 0.178 for multithreaded DWARF loading.

Check that and emit a warning and then continue using just a single
thread, this allows for asking for multithreading in things like the
Linux Kernel makefiles while still working on older systems, such as
centos:7, where the elfutils version is 0.176.

Mark also provided this info for people using centos:7 (and
equivalents):

''Note that on centos7 if you install centos-release-scl you can get the
various devtoolset packages that do contain newer gcc and elfutils. The
latest are devtoolset-10-gcc (gcc-10.2.1) and devtoolset-10-elfutils-devel
(elfutils-0.182).

After installing you can use them with "scl enable devtoolset-10 bash"
which sets up the environment with the new devtools as default.''

A quick attempt at using a lock around all libdw functions turned out to
be too big a hammer, making the multithreaded DWARF loader slower than
using just a single thread.
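
The version gate described above could look roughly like this. It is a
sketch under assumptions: both helpers are hypothetical, the version is
taken as a "major.minor" string, and how that string is obtained from
elfutils is left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if a "major.minor" version string is at
 * least 0.178. */
static bool version_allows_threads(const char *version)
{
	int major, minor;

	if (sscanf(version, "%d.%d", &major, &minor) != 2)
		return false;	/* unparsable: stay on the safe side */
	return major > 0 || minor >= 178;
}

/* Hypothetical helper: fall back to one thread, with a warning, when
 * more were requested but the library is too old. */
static int effective_nr_jobs(const char *version, int requested)
{
	if (requested > 1 && !version_allows_threads(version)) {
		fprintf(stderr,
			"warning: elfutils %s < 0.178, using a single thread\n",
			version);
		return 1;
	}
	return requested;
}
```

A caller asking for four threads on centos:7 (0.176) would get 1 back and
keep working, which is the behaviour the commit describes.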

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo e57e23c72a btf_encoder: Add methods to maintain a list of btf encoders
We'll have one per thread and then at the end combine and dedup them one
last time.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo e9b83dba79 list: Adopt list_next_entry() from the Linux kernel
We'll use it to traverse the list of opaque btf_encoder entries in
pahole.
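
For readers unfamiliar with the helper, here is a self-contained sketch of
list_next_entry() in the style of the Linux kernel's list.h. 'struct
encoder' and the two functions are hypothetical stand-ins, and the macro
relies on the GNU C typeof extension:

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Element that follows 'pos' in the list it is linked into. */
#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)

/* Hypothetical element type, standing in for an opaque encoder. */
struct encoder {
	int id;
	struct list_head node;
};

/* Sum the ids of 'first' and of the n - 1 entries after it (n >= 1). */
static int sum_ids(struct encoder *first, int n)
{
	struct encoder *pos = first;
	int sum = first->id;

	for (int i = 1; i < n; i++) {
		pos = list_next_entry(pos, node);
		sum += pos->id;
	}
	return sum;
}

/* Link three entries by hand through their 'next' pointers and walk them.
 * A real kernel-style list also has a separate head node and 'prev'
 * links; both are left out of this sketch. */
static int demo_sum(void)
{
	struct encoder a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	a.node.next = &b.node;
	b.node.next = &c.node;
	c.node.next = &a.node;
	return sum_ids(&a, 3);
}
```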

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 6edae3e768 dwarf_loader: Make hash table size default to 12, faster than 15
The default was 15 in the tests below; 12 is the sweet spot for recent
kernels and reduces the elapsed time, so make it the new default.

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

  $ sudo perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,101.71 msec task-clock                #    2.752 CPUs utilized            ( +-  0.06% )
               1,682      context-switches          #  207.610 /sec                     ( +-  0.98% )
                   5      cpu-migrations            #    0.592 /sec                     ( +- 15.31% )
              68,870      page-faults               #    8.501 K/sec                    ( +-  0.02% )
      29,205,269,606      cycles                    #    3.605 GHz                      ( +-  0.05% )
      63,448,636,788      instructions              #    2.17  insn per cycle           ( +-  0.00% )
      15,127,493,299      branches                  #    1.867 G/sec                    ( +-  0.00% )
         120,362,476      branch-misses             #    0.80% of all branches          ( +-  0.11% )
      13,967,000,698      L1-dcache-loads           #    1.724 G/sec                    ( +-  0.00% )
         375,052,289      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses  ( +-  0.03% )
          91,506,061      LLC-loads                 #   11.295 M/sec                    ( +-  0.10% )
          27,905,809      LLC-load-misses           #   30.50% of all LL-cache accesses  ( +-  0.16% )

             2.94445 +- 0.00188 seconds time elapsed  ( +-  0.06% )

  $ sudo perf stat -d -r5 pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,681.15 msec task-clock                #    2.702 CPUs utilized            ( +-  0.05% )
               1,660      context-switches          #  216.114 /sec                     ( +-  1.02% )
                   3      cpu-migrations            #    0.365 /sec                     ( +- 13.36% )
              67,794      page-faults               #    8.826 K/sec                    ( +-  0.05% )
      27,692,748,327      cycles                    #    3.605 GHz                      ( +-  0.04% )
      63,041,363,409      instructions              #    2.28  insn per cycle           ( +-  0.00% )
      15,063,798,404      branches                  #    1.961 G/sec                    ( +-  0.00% )
         127,461,737      branch-misses             #    0.85% of all branches          ( +-  0.11% )
      13,974,527,710      L1-dcache-loads           #    1.819 G/sec                    ( +-  0.00% )
         364,775,664      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses  ( +-  0.01% )
          83,685,127      LLC-loads                 #   10.895 M/sec                    ( +-  0.14% )
          19,073,967      LLC-load-misses           #   22.79% of all LL-cache accesses  ( +-  0.30% )

            2.842468 +- 0.000561 seconds time elapsed  ( +-  0.02% )

  $ sudo perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64' (5 runs):

            9,512.30 msec task-clock                #    2.741 CPUs utilized            ( +-  0.54% )
               1,964      context-switches          #  206.469 /sec                     ( +-  2.60% )
                   7      cpu-migrations            #    0.736 /sec                     ( +- 37.25% )
              81,611      page-faults               #    8.579 K/sec                    ( +-  0.08% )
      34,294,568,812      cycles                    #    3.605 GHz                      ( +-  0.53% )
      72,897,384,015      instructions              #    2.13  insn per cycle           ( +-  0.15% )
      17,386,180,039      branches                  #    1.828 G/sec                    ( +-  0.15% )
         136,142,139      branch-misses             #    0.78% of all branches          ( +-  1.06% )
      16,020,787,096      L1-dcache-loads           #    1.684 G/sec                    ( +-  0.19% )
         430,392,585      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses  ( +-  0.37% )
         107,401,567      LLC-loads                 #   11.291 M/sec                    ( +-  0.30% )
          35,172,977      LLC-load-misses           #   32.75% of all LL-cache accesses  ( +-  0.48% )

              3.4710 +- 0.0243 seconds time elapsed  ( +-  0.70% )

  $ sudo perf stat -d -r5 pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole --hashbits 12 -j --btf_encode_detached vmlinux-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64' (5 runs):

            8,929.50 msec task-clock                #    2.700 CPUs utilized            ( +-  0.04% )
               1,907      context-switches          #  213.539 /sec                     ( +-  0.68% )
                   4      cpu-migrations            #    0.426 /sec                     ( +- 30.46% )
              80,661      page-faults               #    9.033 K/sec                    ( +-  0.03% )
      32,213,009,827      cycles                    #    3.607 GHz                      ( +-  0.03% )
      72,345,614,657      instructions              #    2.25  insn per cycle           ( +-  0.00% )
      17,290,227,666      branches                  #    1.936 G/sec                    ( +-  0.00% )
         142,108,954      branch-misses             #    0.82% of all branches          ( +-  0.09% )
      15,998,190,852      L1-dcache-loads           #    1.792 G/sec                    ( +-  0.00% )
         417,872,772      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses  ( +-  0.02% )
          98,061,829      LLC-loads                 #   10.982 M/sec                    ( +-  0.24% )
          24,750,223      LLC-load-misses           #   25.24% of all LL-cache accesses  ( +-  0.17% )

             3.30670 +- 0.00185 seconds time elapsed  ( +-  0.06% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo d2d83be1e2 pahole: Allow tweaking the size of the loader hash tables
To experiment with different sizes as time goes by and the number of symbols in
the kernel grows.

The current default, 15, is suboptimal for the fedora rawhide kernel; we can do
better using 12.

Default: 15:

  $ sudo ~acme/bin/perf stat -d -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,107.73 msec task-clock                #    2.749 CPUs utilized            ( +-  0.05% )
               1,723      context-switches          #  212.562 /sec                     ( +-  1.86% )
                   5      cpu-migrations            #    0.641 /sec                     ( +- 46.07% )
              68,802      page-faults               #    8.486 K/sec                    ( +-  0.05% )
      29,221,590,880      cycles                    #    3.604 GHz                      ( +-  0.04% )
      63,438,138,612      instructions              #    2.17  insn per cycle           ( +-  0.00% )
      15,125,172,105      branches                  #    1.866 G/sec                    ( +-  0.00% )
         119,983,284      branch-misses             #    0.79% of all branches          ( +-  0.06% )
      13,964,248,638      L1-dcache-loads           #    1.722 G/sec                    ( +-  0.00% )
         375,110,346      L1-dcache-load-misses     #    2.69% of all L1-dcache accesses( +-  0.01% )
          91,712,402      LLC-loads                 #   11.312 M/sec                    ( +-  0.14% )
          28,025,289      LLC-load-misses           #   30.56% of all LL-cache accesses ( +-  0.23% )

             2.94980 +- 0.00193 seconds time elapsed  ( +-  0.07% )

  $

New default, to be set in an upcoming patch, 12:

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=12 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,687.31 msec task-clock                #    2.704 CPUs utilized            ( +-  0.02% )
               1,677      context-switches          #  218.126 /sec                     ( +-  0.70% )
                   4      cpu-migrations            #    0.468 /sec                     ( +- 18.84% )
              67,827      page-faults               #    8.823 K/sec                    ( +-  0.03% )
      27,711,744,058      cycles                    #    3.605 GHz                      ( +-  0.02% )
      63,032,539,630      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,062,001,666      branches                  #    1.959 G/sec                    ( +-  0.00% )
         127,728,818      branch-misses             #    0.85% of all branches          ( +-  0.07% )
      13,972,184,314      L1-dcache-loads           #    1.818 G/sec                    ( +-  0.00% )
         364,962,883      L1-dcache-load-misses     #    2.61% of all L1-dcache accesses( +-  0.02% )
          83,969,109      LLC-loads                 #   10.923 M/sec                    ( +-  0.13% )
          19,141,055      LLC-load-misses           #   22.80% of all LL-cache accesses ( +-  0.25% )

            2.842440 +- 0.000952 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=11 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,704.29 msec task-clock                #    2.702 CPUs utilized            ( +-  0.05% )
               1,676      context-switches          #  217.515 /sec                     ( +-  1.04% )
                   2      cpu-migrations            #    0.286 /sec                     ( +- 17.01% )
              67,813      page-faults               #    8.802 K/sec                    ( +-  0.05% )
      27,786,710,102      cycles                    #    3.607 GHz                      ( +-  0.05% )
      63,027,795,038      instructions              #    2.27  insn per cycle           ( +-  0.00% )
      15,066,316,987      branches                  #    1.956 G/sec                    ( +-  0.00% )
         130,431,772      branch-misses             #    0.87% of all branches          ( +-  0.20% )
      13,981,516,517      L1-dcache-loads           #    1.815 G/sec                    ( +-  0.00% )
         369,525,466      L1-dcache-load-misses     #    2.64% of all L1-dcache accesses( +-  0.03% )
          83,328,524      LLC-loads                 #   10.816 M/sec                    ( +-  0.27% )
          18,704,020      LLC-load-misses           #   22.45% of all LL-cache accesses ( +-  0.18% )

             2.85109 +- 0.00281 seconds time elapsed  ( +-  0.10% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=8 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,190.55 msec task-clock                #    2.774 CPUs utilized            ( +-  0.03% )
               1,607      context-switches          #  196.226 /sec                     ( +-  0.67% )
                   3      cpu-migrations            #    0.317 /sec                     ( +- 15.38% )
              67,869      page-faults               #    8.286 K/sec                    ( +-  0.05% )
      29,511,213,192      cycles                    #    3.603 GHz                      ( +-  0.02% )
      63,347,196,598      instructions              #    2.15  insn per cycle           ( +-  0.00% )
      15,198,023,498      branches                  #    1.856 G/sec                    ( +-  0.00% )
         131,113,100      branch-misses             #    0.86% of all branches          ( +-  0.14% )
      14,118,162,884      L1-dcache-loads           #    1.724 G/sec                    ( +-  0.00% )
         422,048,384      L1-dcache-load-misses     #    2.99% of all L1-dcache accesses( +-  0.01% )
         105,878,910      LLC-loads                 #   12.927 M/sec                    ( +-  0.05% )
          21,022,664      LLC-load-misses           #   19.86% of all LL-cache accesses ( +-  0.20% )

            2.952678 +- 0.000858 seconds time elapsed  ( +-  0.03% )

  $ sudo ~acme/bin/perf stat -d -r5 pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole --hashbits=13 -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            7,728.71 msec task-clock                #    2.707 CPUs utilized            ( +-  0.07% )
               1,661      context-switches          #  214.887 /sec                     ( +-  0.70% )
                   2      cpu-migrations            #    0.259 /sec                     ( +- 22.36% )
              67,893      page-faults               #    8.785 K/sec                    ( +-  0.04% )
      27,874,322,843      cycles                    #    3.607 GHz                      ( +-  0.07% )
      63,079,425,815      instructions              #    2.26  insn per cycle           ( +-  0.00% )
      15,067,279,408      branches                  #    1.950 G/sec                    ( +-  0.00% )
         125,706,874      branch-misses             #    0.83% of all branches          ( +-  1.00% )
      13,967,177,801      L1-dcache-loads           #    1.807 G/sec                    ( +-  0.00% )
         363,566,754      L1-dcache-load-misses     #    2.60% of all L1-dcache accesses( +-  0.02% )
          86,583,482      LLC-loads                 #   11.203 M/sec                    ( +-  0.13% )
          20,629,871      LLC-load-misses           #   23.83% of all LL-cache accesses ( +-  0.21% )

             2.85551 +- 0.00124 seconds time elapsed  ( +-  0.04% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo ff7bd7083f core: Allow sizing the loader hash table
For now this will only apply to the dwarf loader, for experimenting as
time passes and kernels grow bigger or with more symbols.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 3068ff36b7 hash: Remove unused hash_32(), hash_ptr()
We're only using hash_64(), so ditch unused parts.
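
To make the role of the hash bits concrete, a sketch of a hash_64()-style
function: 'bits' selects how many of the top bits of a multiplicative hash
are kept, so the table has 1 << bits buckets. The constant is the one
current Linux kernels use; that the copy in this tree uses the same value
is an assumption, and nr_buckets() is a made-up helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Constant from current Linux kernels; assumed, not checked against
 * this tree's hash.h. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Map a 64-bit key to a bucket index in [0, 1 << bits), bits in 1..63. */
static uint64_t hash_64(uint64_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}

/* Hypothetical helper: number of buckets for a given bit count. */
static size_t nr_buckets(unsigned int bits)
{
	return (size_t)1 << bits;
}
```

Going from 15 bits to 12 shrinks each table from 32768 to 4096 buckets,
which fits the lower LLC-load-misses seen in the --hashbits measurements.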

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 8eebf70d05 dwarf_loader: Use a per-CU frontend cache for the latest lookup result
Using a debug patch I found that for the Linux kernel (vmlinux from
fedora rawhide) we get this number of hits:

  nr_saved_lookups=2661460
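
The idea of a frontend cache can be sketched as a one-entry memo placed in
front of the slow lookup. All names below are hypothetical, except that the
counter mirrors the nr_saved_lookups figure above; the real data structures
in dwarf_loader.c are not shown in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CU state: remembers the last (id, result) pair. */
struct lookup_cache {
	uint64_t last_id;
	void *last_result;	/* NULL means nothing is cached */
	unsigned long nr_saved_lookups;
};

/* Type of the slow path, e.g. a hash table search. */
typedef void *(*slow_lookup_fn)(uint64_t id, void *ctx);

static void *cached_lookup(struct lookup_cache *cache, uint64_t id,
			   slow_lookup_fn slow, void *ctx)
{
	void *result;

	if (cache->last_result != NULL && cache->last_id == id) {
		cache->nr_saved_lookups++;
		return cache->last_result;
	}

	result = slow(id, ctx);
	if (result != NULL) {
		cache->last_id = id;
		cache->last_result = result;
	}
	return result;
}

/* Toy slow path: ids 0..3 map to slots of an int array passed as ctx. */
static void *demo_slow(uint64_t id, void *ctx)
{
	int *table = ctx;

	return id < 4 ? &table[id] : NULL;
}

/* Run a lookup sequence with repeats and report how many were saved. */
static unsigned long demo_saved(void)
{
	int table[4] = { 10, 11, 12, 13 };
	struct lookup_cache cache = { 0 };
	const uint64_t ids[] = { 2, 2, 2, 3, 3, 1 };

	for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++)
		cached_lookup(&cache, ids[i], demo_slow, table);
	return cache.nr_saved_lookups;
}
```

The cache only pays off when consecutive lookups repeat the same id, which
is what the hit count above suggests happens often in a kernel's DWARF.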

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -d -r1 pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64':

            9,515.95 msec task-clock:u              #    2.731 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              81,634      page-faults:u             #    8.579 K/sec
      33,468,454,452      cycles:u                  #    3.517 GHz
      72,279,667,117      instructions:u            #    2.16  insn per cycle
      17,256,208,904      branches:u                #    1.813 G/sec
         132,775,067      branch-misses:u           #    0.77% of all branches
      15,840,427,579      L1-dcache-loads:u         #    1.665 G/sec
         417,209,398      L1-dcache-load-misses:u   #    2.63% of all L1-dcache accesses
         105,099,756      LLC-loads:u               #   11.045 M/sec
          35,027,985      LLC-load-misses:u         #   33.33% of all LL-cache accesses

         3.484851710 seconds time elapsed

         9.353155000 seconds user
         0.190730000 seconds sys

  $

After:

  $ perf stat -d -r1 pahole -j --btf_encode_detached \
	vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf \
	vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64-j.btf vmlinux-5.14.0-0.rc1.20210714git40226a3d96ef.18.fc35.x86_64':

            9,416.17 msec task-clock:u              #    2.744 CPUs utilized
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              81,461      page-faults:u             #    8.651 K/sec
      33,330,006,641      cycles:u                  #    3.540 GHz
      72,301,897,397      instructions:u            #    2.17  insn per cycle
      17,263,694,358      branches:u                #    1.833 G/sec
         133,414,373      branch-misses:u           #    0.77% of all branches
      15,860,141,450      L1-dcache-loads:u         #    1.684 G/sec
         418,816,079      L1-dcache-load-misses:u   #    2.64% of all L1-dcache accesses
         104,960,787      LLC-loads:u               #   11.147 M/sec
          34,629,758      LLC-load-misses:u         #   32.99% of all LL-cache accesses

         3.431376846 seconds time elapsed

         9.294489000 seconds user
         0.146507000 seconds sys

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo a2f1e69848 core: Use obstacks: take 2
Allow asking for obstacks to be used. In use cases like the BTF encoder,
everything is allocated sequentially and freed all at once at
cu__delete(), so obstacks are applicable and provide a good speedup, as
measured below:
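
For context, a minimal sketch of that allocation pattern with the GNU
obstack API from glibc (allocate back to back, free everything in one
call). obstack_demo() is a made-up function, not code from this tree:

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* obstack.h expects the user to name the chunk allocator. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Allocate 'n' small objects back to back, then release them all at
 * once. Returns the sum of the first byte of each object as a check. */
static int obstack_demo(int n)
{
	struct obstack pool;
	int sum = 0;

	obstack_init(&pool);
	for (int i = 0; i < n; i++) {
		char *obj = obstack_alloc(&pool, 32);

		memset(obj, i, 32);
		sum += obj[0];
	}
	/* Passing NULL frees every object; the obstack must be
	 * initialised again before any further use. */
	obstack_free(&pool, NULL);
	return sum;
}
```

There is no per-object free, which is why this only suits cases where the
whole set of objects dies together.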

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

           10,445.75 msec task-clock:u              #    2.864 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,926      page-faults:u             #   72.941 K/sec                    ( +-  0.00% )
      31,946,591,661      cycles:u                  #    3.058 GHz                      ( +-  0.05% )
      69,103,520,880      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,353,763,143      branches:u                #    1.566 G/sec                    ( +-  0.00% )
         122,309,098      branch-misses:u           #    0.75% of all branches          ( +-  0.12% )

             3.64689 +- 0.00437 seconds time elapsed  ( +-  0.12% )

  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 52 times to write data ]
  [ perf record: Captured and wrote 13.151 MB perf.data (43058 samples) ]
  $
  $ perf report --no-children
  Samples: 43K of event 'cycles:u', Event count (approx.): 31938442091
    Overhead  Command  Shared Object         Symbol
  +   22.98%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.69%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.82%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.16%  pahole   libc.so.6             [.] __libc_calloc
  +    5.01%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.39%  pahole   libc.so.6             [.] _int_malloc
  +    2.82%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.22%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.13%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.08%  pahole   [unknown]             [k] 0xffffffffa0e010a7
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.98%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    1.92%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.92%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    1.61%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.49%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.44%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.24%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.18%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.12%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.11%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size

After:

  $ perf stat -r5 pahole -j --btf_encode_detached vmlinux-j.btf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached vmlinux-j.btf vmlinux' (5 runs):

            8,114.11 msec task-clock:u              #    2.747 CPUs utilized            ( +-  0.09% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
              68,792      page-faults:u             #    8.478 K/sec                    ( +-  0.05% )
      28,705,283,249      cycles:u                  #    3.538 GHz                      ( +-  0.09% )
      63,013,653,035      instructions:u            #    2.20  insn per cycle           ( +-  0.00% )
      15,039,319,384      branches:u                #    1.853 G/sec                    ( +-  0.00% )
         118,272,350      branch-misses:u           #    0.79% of all branches          ( +-  0.41% )

             2.95368 +- 0.00221 seconds time elapsed  ( +-  0.07% )

  $
  $ perf record --call-graph lbr pahole -j --btf_encode_detached vmlinux-j.btf vmlinux
  [ perf record: Woken up 40 times to write data ]
  [ perf record: Captured and wrote 10.426 MB perf.data (33733 samples) ]
  $
  $ perf report --no-children
  Samples: 33K of event 'cycles:u', Event count (approx.): 28860426071
    Overhead  Command  Shared Object         Symbol
  +   26.10%  pahole   libdw-0.185.so        [.] __libdw_find_attr
  +    6.13%  pahole   libdwarves.so.1.0.0   [.] cu__hash.isra.0
  +    5.83%  pahole   libdwarves.so.1.0.0   [.] hashmap__insert
  +    5.52%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_is_equiv
  +    3.04%  pahole   libc.so.6             [.] __strcmp_avx2
  +    2.45%  pahole   libdw-0.185.so        [.] __libdw_form_val_compute_len
  +    2.31%  pahole   libdwarves.so.1.0.0   [.] btf__dedup
  +    2.30%  pahole   libdw-0.185.so        [.] dwarf_attr
  +    2.19%  pahole   libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
  +    2.08%  pahole   libdwarves.so.1.0.0   [.] list__for_all_tags
  +    2.07%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__find_type_by_ref
  +    1.96%  pahole   libdwarves.so.1.0.0   [.] btf__add_field
  +    1.67%  pahole   libc.so.6             [.] pthread_rwlock_tryrdlock@@GLIBC_2.34
  +    1.63%  pahole   libdwarves.so.1.0.0   [.] btf_encoder__encode_cu
  +    1.52%  pahole   libdwarves.so.1.0.0   [.] die__process_class
  +    1.51%  pahole   libdwarves.so.1.0.0   [.] attr_type
  +    1.36%  pahole   libdwarves.so.1.0.0   [.] btf_dedup_ref_type
  +    1.32%  pahole   libdwarves.so.1.0.0   [.] strs_hash_fn
  +    1.25%  pahole   libdw-0.185.so        [.] dwarf_siblingof
  +    1.24%  pahole   libdwarves.so.1.0.0   [.] namespace__recode_dwarf_types
  +    1.17%  pahole   libdwarves.so.1.0.0   [.] attr_numeric
  +    1.16%  pahole   libdwarves.so.1.0.0   [.] dwarf_cu__init
  +    1.03%  pahole   libdwarves.so.1.0.0   [.] tag__init
  +    1.01%  pahole   libdwarves.so.1.0.0   [.] tag__size
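
The allocation pattern described above (allocate sequentially, free
everything at once) can be sketched with the GNU obstack API from glibc.
This is only an illustration: struct demo_cu, struct demo_tag and the
demo_cu__*() helpers are hypothetical names, not the pahole code.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>

/* obstack.h expects the user to name the chunk allocator and deallocator. */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical stand-ins for a CU and the tags it owns. */
struct demo_tag {
	unsigned id;
	struct demo_tag *next;
};

struct demo_cu {
	struct obstack obstack;
	struct demo_tag *tags;
	unsigned nr_tags;
};

static void demo_cu__init(struct demo_cu *cu)
{
	assert(cu != NULL);
	obstack_init(&cu->obstack);
	cu->tags = NULL;
	cu->nr_tags = 0;
}

/* Bump-allocate one tag from the CU's obstack, no per-object malloc(). */
static struct demo_tag *demo_cu__add_tag(struct demo_cu *cu, unsigned id)
{
	/* On failure obstack_alloc() calls obstack_alloc_failed_handler. */
	struct demo_tag *tag = obstack_alloc(&cu->obstack, sizeof(*tag));

	tag->id = id;
	tag->next = cu->tags;
	cu->tags = tag;
	cu->nr_tags++;
	return tag;
}

/* Release every object allocated from the obstack in a single call. */
static void demo_cu__delete(struct demo_cu *cu)
{
	obstack_free(&cu->obstack, NULL);
	cu->tags = NULL;
	cu->nr_tags = 0;
}
```

After demo_cu__delete() the obstack is back to an uninitialized state, so
demo_cu__init() has to be called again before the CU is reused.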

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo dca86fb8c2 dwarf_loader: Add comment on why we can't ignore lexblocks
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 9d0a7ee0c3 pahole: Ignore DW_TAG_label when encoding BTF
Labels are not used when encoding BTF, so don't waste cycles/memory parsing them:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,431.47 msec task-clock:u              #    2.865 CPUs utilized            ( +-  0.04% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             761,982      page-faults:u             #   73.046 K/sec                    ( +-  0.00% )
      31,885,756,148      cycles:u                  #    3.057 GHz                      ( +-  0.04% )
      69,103,456,079      instructions:u            #    2.17  insn per cycle           ( +-  0.00% )
      16,353,867,606      branches:u                #    1.568 G/sec                    ( +-  0.00% )
         122,023,818      branch-misses:u           #    0.75% of all branches          ( +-  0.09% )

             3.64095 +- 0.00194 seconds time elapsed  ( +-  0.05% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo d40c5f1e20 core: Allow ignoring DW_TAG_label
The BTF encoder doesn't use this information, so there is no need to parse it.
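
A minimal sketch of how a loader could honour such a request. The struct,
its flag names and demo_conf_load__skips_tag() are hypothetical; the tag
values are the standard DWARF ones, and mapping inline expansions to
DW_TAG_inlined_subroutine is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard DWARF tag values, prefixed to avoid clashing with <dwarf.h>. */
enum {
	DEMO_DW_TAG_label              = 0x0a,
	DEMO_DW_TAG_inlined_subroutine = 0x1d,
	DEMO_DW_TAG_variable           = 0x34,
};

/* Hypothetical per-load configuration set by the tool (e.g. a BTF encoder). */
struct demo_conf_load {
	bool ignore_labels;
	bool ignore_inline_expansions;
};

/* Return true when the loader should not even create an object for this tag. */
static bool demo_conf_load__skips_tag(const struct demo_conf_load *conf,
				      unsigned tag)
{
	assert(conf != NULL);

	switch (tag) {
	case DEMO_DW_TAG_label:
		return conf->ignore_labels;
	case DEMO_DW_TAG_inlined_subroutine:
		return conf->ignore_inline_expansions;
	default:
		return false;
	}
}
```

The check is meant to run before any attribute of the DIE is read, so the
skipped tags cost neither parsing time nor memory.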

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:27 -03:00
Arnaldo Carvalho de Melo 51ba831929 pahole: Ignore DW_TAG_inline_expansion when encoding BTF
XXX: for now leave this commented out, see comments in the source code.

Inline expansions are not used when encoding BTF, so don't waste cycles/memory parsing them:

  $ grep "model name" /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Before:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,973.13 msec task-clock:u              #    2.906 CPUs utilized            ( +-  0.13% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,927      page-faults:u             #   72.352 K/sec                    ( +-  0.00% )
      33,585,562,298      cycles:u                  #    3.061 GHz                      ( +-  0.17% )
      72,687,766,428      instructions:u            #    2.16  insn per cycle           ( +-  0.15% )
      17,198,056,478      branches:u                #    1.567 G/sec                    ( +-  0.16% )
         129,011,360      branch-misses:u           #    0.75% of all branches          ( +-  0.53% )

              3.7760 +- 0.0158 seconds time elapsed  ( +-  0.42% )

  $

After:

  $ perf stat -r5 pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux

   Performance counter stats for 'pahole -j --btf_encode_detached=vmlinux-j.btf -F dwarf vmlinux' (5 runs):

           10,487.54 msec task-clock:u              #    2.855 CPUs utilized            ( +-  0.31% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             762,431      page-faults:u             #   72.699 K/sec                    ( +-  0.00% )
      31,994,949,358      cycles:u                  #    3.051 GHz                      ( +-  0.09% )
      69,129,157,311      instructions:u            #    2.16  insn per cycle           ( +-  0.00% )
      16,359,974,001      branches:u                #    1.560 G/sec                    ( +-  0.00% )
         122,800,385      branch-misses:u           #    0.75% of all branches          ( +-  0.23% )

             3.67286 +- 0.00917 seconds time elapsed  ( +-  0.25% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:40:25 -03:00
Arnaldo Carvalho de Melo 9038638891 core: Allow ignoring DW_TAG_inline_expansion
The BTF encoder doesn't use this information, so there is no need to parse it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:31 -03:00
Arnaldo Carvalho de Melo 20757745f0 pahole: Allow encoding BTF with parallel DWARF loading
By adding a lock to serialize access to btf_encoder__encode_cu().

This works and allows a speedup in BTF encoding (from 8.58s to 3.76s
elapsed in the runs below), but it's too brute force: the right thing to
do is to have per-thread BTF encoders and then merge everything in a
last pass.

But pick the low-hanging fruit now.

On a machine with 4 cores, no HT:

  $ grep "model name" -m1 /proc/cpuinfo
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  $

Non-parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf vmlinux' (5 runs):

            8,580.19 msec task-clock:u              #    1.000 CPUs utilized            ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             795,451      page-faults:u             #   92.708 K/sec                    ( +-  0.00% )
      29,151,924,821      cycles:u                  #    3.398 GHz                      ( +-  0.11% )
      70,947,245,709      instructions:u            #    2.43  insn per cycle           ( +-  0.00% )
      16,791,160,182      branches:u                #    1.957 G/sec                    ( +-  0.00% )
         120,793,994      branch-misses:u           #    0.72% of all branches          ( +-  1.04% )

             8.58192 +- 0.00686 seconds time elapsed  ( +-  0.08% )
  $

Parallel:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux-j.btf -j vmlinux' (5 runs):

           10,962.45 msec task-clock:u              #    2.914 CPUs utilized            ( +-  0.15% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,915      page-faults:u             #   72.421 K/sec                    ( +-  0.00% )
      33,552,130,646      cycles:u                  #    3.061 GHz                      ( +-  0.16% )
      72,778,320,572      instructions:u            #    2.17  insn per cycle           ( +-  0.12% )
      17,220,541,136      branches:u                #    1.571 G/sec                    ( +-  0.13% )
         129,353,767      branch-misses:u           #    0.75% of all branches          ( +-  0.48% )

              3.7614 +- 0.0141 seconds time elapsed  ( +-  0.38% )

  $

The "CPUs utilized" figure should go all the way to 4 once we parallelize
the BTF encoding itself.
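
The brute-force serialization described in this message can be sketched
with a plain pthread mutex: the workers run in parallel, but only one at a
time gets into the encoder. All demo_* names are hypothetical and the
"encoding" is reduced to bumping a counter.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_MAX_THREADS 16

/* Hypothetical shared encoder: one instance, guarded by one lock. */
struct demo_encoder {
	pthread_mutex_t lock;
	unsigned long nr_cus_encoded;
};

struct demo_worker {
	pthread_t thread;
	struct demo_encoder *encoder;
	unsigned nr_cus;
};

/* Every caller is serialized here, which is why CPU use stays below 4. */
static void demo_encoder__encode_cu(struct demo_encoder *encoder)
{
	pthread_mutex_lock(&encoder->lock);
	encoder->nr_cus_encoded++;
	pthread_mutex_unlock(&encoder->lock);
}

static void *demo_worker__run(void *arg)
{
	struct demo_worker *worker = arg;

	for (unsigned i = 0; i < worker->nr_cus; i++) {
		/* Loading one CU would happen here, outside the lock. */
		demo_encoder__encode_cu(worker->encoder);
	}
	return NULL;
}

/* Start the workers, wait for them, return how many CUs were encoded. */
static unsigned long demo_encode_in_parallel(unsigned nr_threads,
					     unsigned cus_per_thread)
{
	struct demo_encoder encoder = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct demo_worker workers[DEMO_MAX_THREADS];
	unsigned nr_started = 0;

	assert(nr_threads <= DEMO_MAX_THREADS);

	for (unsigned i = 0; i < nr_threads; i++) {
		workers[i].encoder = &encoder;
		workers[i].nr_cus = cus_per_thread;
		if (pthread_create(&workers[i].thread, NULL,
				   demo_worker__run, &workers[i]) != 0)
			break;
		nr_started++;
	}
	for (unsigned i = 0; i < nr_started; i++)
		pthread_join(workers[i].thread, NULL);

	return encoder.nr_cus_encoded;
}
```

With per-thread encoders the lock would go away and only the final merge
pass would be sequential.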

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:02 -03:00
Arnaldo Carvalho de Melo 5a85d9a450 core: Zero out unused entries when extending ptr_table array in ptr_table__add()
Otherwise we may end up accessing invalid pointers and crashing.
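
A minimal sketch of the idea, assuming a table that grows via realloc():
the newly added tail is not initialized by realloc(), so it is zeroed
before use. The struct layout, the growth step and the demo_* names are
hypothetical, not the actual ptr_table code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_GROW_STEP 256

struct demo_ptr_table {
	void **entries;
	uint32_t nr_entries;
	uint32_t allocated_entries;
};

/* Append ptr, returning its index via idxp; 0 on success, -ENOMEM on failure. */
static int demo_ptr_table__add(struct demo_ptr_table *pt, void *ptr,
			       uint32_t *idxp)
{
	assert(pt != NULL && idxp != NULL);

	if (pt->nr_entries == pt->allocated_entries) {
		uint32_t allocated = pt->allocated_entries + DEMO_GROW_STEP;
		void **entries = realloc(pt->entries,
					 allocated * sizeof(*entries));

		if (entries == NULL)
			return -ENOMEM;

		/* realloc() leaves the new tail uninitialized: zero it. */
		memset(entries + pt->allocated_entries, 0,
		       (allocated - pt->allocated_entries) * sizeof(*entries));

		pt->entries = entries;
		pt->allocated_entries = allocated;
	}

	*idxp = pt->nr_entries;
	pt->entries[pt->nr_entries++] = ptr;
	return 0;
}
```

With the tail zeroed, a lookup of a slot that was never filled yields NULL
instead of whatever happened to be in the heap.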

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:39:02 -03:00
Arnaldo Carvalho de Melo d133569bd0 pahole: No need to read DW_AT_alignment when encoding BTF
DW_AT_alignment is not used in BTF encoding, so there is no need to read it.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:38:58 -03:00
Arnaldo Carvalho de Melo 21a41e5386 dwarf_loader: Allow asking not to read the DW_AT_alignment attribute
DW_AT_alignment isn't present in most types or struct members, so each
dwarf_attr() call asking for it ends up in __libdw_find_attr(), which
does a linear search over all the attributes.

We don't use this attribute in the BTF encoder, so there is no point in
reading it.

This will be used in pahole in the following cset.
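
Why a missing attribute is the expensive case can be shown with a toy
model: a linear search only stops early on a hit, so asking for an
attribute that is absent walks the whole list. The demo_* types, the
compare counter and the ignore_alignment_attr flag are hypothetical; the
attribute codes are the standard DWARF values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard DWARF attribute codes, prefixed to avoid clashing with <dwarf.h>. */
enum {
	DEMO_DW_AT_name      = 0x03,
	DEMO_DW_AT_byte_size = 0x0b,
	DEMO_DW_AT_type      = 0x49,
	DEMO_DW_AT_alignment = 0x88,
};

struct demo_attr {
	unsigned name;
	unsigned long value;
};

struct demo_die {
	const struct demo_attr *attrs;
	size_t nr_attrs;
};

/* Hypothetical loader option. */
struct demo_conf {
	bool ignore_alignment_attr;
};

/* Counts attribute comparisons so the cost of a miss is visible. */
static unsigned long demo_nr_compares;

static const struct demo_attr *demo_die__find_attr(const struct demo_die *die,
						   unsigned name)
{
	for (size_t i = 0; i < die->nr_attrs; i++) {
		demo_nr_compares++;
		if (die->attrs[i].name == name)
			return &die->attrs[i];
	}
	return NULL; /* a miss has visited every attribute */
}

/* Return the alignment, or 0 when absent or when asked not to look. */
static unsigned long demo_die__alignment(const struct demo_die *die,
					 const struct demo_conf *conf)
{
	assert(die != NULL && conf != NULL);

	if (conf->ignore_alignment_attr)
		return 0;

	const struct demo_attr *attr =
		demo_die__find_attr(die, DEMO_DW_AT_alignment);

	return attr != NULL ? attr->value : 0;
}
```

For a DIE without the attribute both settings return 0; only the number of
comparisons differs.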

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-20 16:38:09 -03:00
Arnaldo Carvalho de Melo 1ef1639039 dwarf_loader: Do not look for non-C DWARF attributes in C CUs
Avoid looking for attributes that don't apply to the C language, such
as DW_AT_virtuality (virtual, pure_virtual), DW_AT_accessibility
(public, protected, private) and DW_AT_const_value.

Looking for those attributes in class_member__new() makes
__libdw_find_attr() linearly search all the attributes of a DIE, which
shows up in profiles.

Before:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf -j vmlinux' (5 runs):

           11,239.99 msec task-clock:u              #    2.921 CPUs utilized    ( +-  0.08% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,897      page-faults:u             #   70.631 K/sec            ( +-  0.00% )
      34,593,518,484      cycles:u                  #    3.078 GHz              ( +-  0.05% )
      75,592,805,563      instructions:u            #    2.19  insn per cycle   ( +-  0.00% )
      17,923,046,622      branches:u                #    1.595 G/sec            ( +-  0.00% )
         131,080,371      branch-misses:u           #    0.73% of all branches  ( +-  0.18% )

              3.84794 +- 0.00327 seconds time elapsed  ( +-  0.09% )
  $

After:

  $ perf stat -r5 pahole --btf_encode_detached=vmlinux.btf -j vmlinux

   Performance counter stats for 'pahole --btf_encode_detached=vmlinux.btf -j vmlinux' (5 runs):

           11,178.28 msec task-clock:u              #    2.929 CPUs utilized            ( +-  0.12% )
                   0      context-switches:u        #    0.000 /sec
                   0      cpu-migrations:u          #    0.000 /sec
             793,890      page-faults:u             #   71.021 K/sec                    ( +-  0.00% )
      34,378,886,265      cycles:u                  #    3.076 GHz                      ( +-  0.13% )
      75,523,849,140      instructions:u            #    2.20  insn per cycle           ( +-  0.12% )
      17,907,573,910      branches:u                #    1.602 G/sec                    ( +-  0.12% )
         130,137,529      branch-misses:u           #    0.73% of all branches          ( +-  0.50% )

              3.8165 +- 0.0137 seconds time elapsed  ( +-  0.36% )

  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 88265eab35 core: Add cu__is_c() to check if the CU language is C
We'll use this to avoid looking for attributes that don't apply to the
C language, such as DW_AT_virtuality (virtual, pure_virtual) and
DW_AT_accessibility (public, protected, private).
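
A sketch of what such a check could look like, assuming the CU caches its
DW_AT_language value. The struct and function names are hypothetical and
the set of codes treated as C is an assumption of the example; the
numeric values are the standard DWARF language codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard DWARF language codes, prefixed to avoid clashing with <dwarf.h>. */
enum {
	DEMO_DW_LANG_C89         = 0x0001,
	DEMO_DW_LANG_C           = 0x0002,
	DEMO_DW_LANG_C_plus_plus = 0x0004,
	DEMO_DW_LANG_C99         = 0x000c,
	DEMO_DW_LANG_C11         = 0x001d,
};

/* Hypothetical CU carrying the language read from DW_AT_language. */
struct demo_cu {
	unsigned language;
};

static bool demo_cu__is_c(const struct demo_cu *cu)
{
	assert(cu != NULL);

	switch (cu->language) {
	case DEMO_DW_LANG_C89:
	case DEMO_DW_LANG_C:
	case DEMO_DW_LANG_C99:
	case DEMO_DW_LANG_C11:
		return true;
	default:
		return false;
	}
}

/* Only non-C CUs need lookups such as virtuality or accessibility. */
static bool demo_cu__wants_cxx_member_attrs(const struct demo_cu *cu)
{
	return !demo_cu__is_c(cu);
}
```

The language is a per-CU property, so the test is one integer compare per
member instead of several attribute searches.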

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo 1caed1c443 dwarf_loader: Add a lock around dwarf_decl_file() and dwarf_decl_line() calls
These calls end up racing on a tsearch() call, probably for some libdw
cache that gets updated/looked up by concurrent pahole threads (-j N).

This cures the following crash; a patch for libdw will be cooked up and sent.

  (gdb) run -j -I -F dwarf vmlinux > /dev/null
  Starting program: /var/home/acme/git/pahole/build/pahole -j -I -F dwarf vmlinux > /dev/null
  warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.
  warning: File "/usr/lib64/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
  warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
  [New LWP 844789]
  [New LWP 844790]
  [New LWP 844791]
  [New LWP 844792]
  [New LWP 844793]
  [New LWP 844794]
  [New LWP 844795]
  [New LWP 844796]
  [New LWP 844797]
  [New LWP 844798]
  [New LWP 844799]
  [New LWP 844800]
  [New LWP 844801]
  [New LWP 844802]
  [New LWP 844803]
  [New LWP 844804]
  [New LWP 844805]
  [New LWP 844806]
  [New LWP 844807]
  [New LWP 844808]
  [New LWP 844809]
  [New LWP 844810]
  [New LWP 844811]
  [New LWP 844812]
  [New LWP 844813]
  [New LWP 844814]

  Thread 2 "pahole" received signal SIGSEGV, Segmentation fault.
  [Switching to LWP 844789]
  0x00007ffff7dfa321 in ?? () from /lib64/libc.so.6
  (gdb) bt
  #0  0x00007ffff7dfa321 in ?? () from /lib64/libc.so.6
  #1  0x00007ffff7dfa4bb in ?? () from /lib64/libc.so.6
  #2  0x00007ffff7f5eaa6 in __libdw_getsrclines (dbg=0x4a7f90, debug_line_offset=10383710, comp_dir=0x7ffff3c29f01 "/var/home/acme/git/build/v5.13.0-rc6+", address_size=address_size@entry=8, linesp=linesp@entry=0x7fffcfe04ba0, filesp=filesp@entry=0x7fffcfe04ba8)
      at dwarf_getsrclines.c:1129
  #3  0x00007ffff7f5ed14 in dwarf_getsrclines (cudie=cudie@entry=0x7fffd210caf0, lines=lines@entry=0x7fffd210cac0, nlines=nlines@entry=0x7fffd210cac8) at dwarf_getsrclines.c:1213
  #4  0x00007ffff7f64883 in dwarf_decl_file (die=<optimized out>) at dwarf_decl_file.c:66
  #5  0x0000000000425f24 in tag__init (tag=0x7fff0421b710, cu=0x7fffcc001e40, die=0x7fffd210cd30) at /var/home/acme/git/pahole/dwarf_loader.c:476
  #6  0x00000000004262ec in namespace__init (namespace=0x7fff0421b710, die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:576
  #7  0x00000000004263ac in type__init (type=0x7fff0421b710, die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:595
  #8  0x00000000004264d1 in type__new (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:614
  #9  0x0000000000427ba6 in die__create_new_typedef (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1212
  #10 0x0000000000428df5 in __die__process_tag (die=0x7fffd210cd30, cu=0x7fffcc001e40, top_level=1, fn=0x45cee0 <__FUNCTION__.10> "die__process_unit", conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1823
  #11 0x0000000000428ea1 in die__process_unit (die=0x7fffd210cd30, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:1848
  #12 0x0000000000429e45 in die__process (die=0x7fffd210ce20, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:2311
  #13 0x0000000000429ecb in die__process_and_recode (die=0x7fffd210ce20, cu=0x7fffcc001e40, conf=0x475600 <conf_load>) at /var/home/acme/git/pahole/dwarf_loader.c:2326
  #14 0x000000000042a9d6 in dwarf_cus__create_and_process_cu (dcus=0x7fffffffddc0, cu_die=0x7fffd210ce20, pointer_size=8 '\b') at /var/home/acme/git/pahole/dwarf_loader.c:2644
  #15 0x000000000042ab28 in dwarf_cus__process_cu_thread (arg=0x7fffffffddc0) at /var/home/acme/git/pahole/dwarf_loader.c:2687
  #16 0x00007ffff7ed6299 in start_thread () from /lib64/libpthread.so.0
  #17 0x00007ffff7dfe353 in ?? () from /lib64/libc.so.6
  (gdb)
  (gdb) fr 2
  1085
  (gdb) list files_lines_compare
  1086    static int
  1087    files_lines_compare (const void *p1, const void *p2)
  1088    {
  1089	  const struct files_lines_s *t1 = p1;
  1090	  const struct files_lines_s *t2 = p2;
  1091
  1092	  if (t1->debug_line_offset < t2->debug_line_offset)
  (gdb)
  1093        return -1;
  1094	  if (t1->debug_line_offset > t2->debug_line_offset)
  1095        return 1;
  1096
  1097	  return 0;
  1098    }
  1099
  1100    int
  1101    internal_function
  1102    __libdw_getsrclines (Dwarf *dbg, Dwarf_Off debug_line_offset,
  (gdb) list __libdw_getsrclines
  1100    int
  1101    internal_function
  1102    __libdw_getsrclines (Dwarf *dbg, Dwarf_Off debug_line_offset,
  1103                         const char *comp_dir, unsigned address_size,
  1104                         Dwarf_Lines **linesp, Dwarf_Files **filesp)
  1105    {
  1106	  struct files_lines_s fake = { .debug_line_offset = debug_line_offset };
  1107	  struct files_lines_s **found = tfind (&fake, &dbg->files_lines,
  1108                                            files_lines_compare);
  1109	  if (found == NULL)
  (gdb)
  1110        {
  1111          Elf_Data *data = __libdw_checked_get_data (dbg, IDX_debug_line);
  1112          if (data == NULL
  1113              || __libdw_offset_in_section (dbg, IDX_debug_line,
  1114                                            debug_line_offset, 1) != 0)
  1115            return -1;
  1116
  1117          const unsigned char *linep = data->d_buf + debug_line_offset;
  1118          const unsigned char *lineendp = data->d_buf + data->d_size;
  1119
  (gdb)
  1120          struct files_lines_s *node = libdw_alloc (dbg, struct files_lines_s,
  1121                                                    sizeof *node, 1);
  1122
  1123          if (read_srclines (dbg, linep, lineendp, comp_dir, address_size,
  1124                             &node->lines, &node->files) != 0)
  1125            return -1;
  1126
  1127          node->debug_line_offset = debug_line_offset;
  1128
  1129          found = tsearch (node, &dbg->files_lines, files_lines_compare);
  (gdb)
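
The tfind()/tsearch() pair listed above is a lookup-then-insert on a
shared tree, which is not safe when several threads do it at once. A
minimal sketch of the workaround pattern, taking a lock around the whole
lookup-or-insert, is below. It models the caller-side fix only: the
demo_* names and the global cache are hypothetical, not libdw code.

```c
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <stdlib.h>

/* Hypothetical cached entry, keyed by an offset as in the listing above. */
struct demo_lines {
	unsigned long offset;
};

static void *demo_cache_root;
static unsigned demo_cache_nr_nodes;
static pthread_mutex_t demo_cache_lock = PTHREAD_MUTEX_INITIALIZER;

static int demo_lines__cmp(const void *p1, const void *p2)
{
	const struct demo_lines *l1 = p1, *l2 = p2;

	if (l1->offset < l2->offset)
		return -1;
	if (l1->offset > l2->offset)
		return 1;
	return 0;
}

/* Look up the entry for offset, creating it on a miss; NULL on ENOMEM. */
static struct demo_lines *demo_cache__get(unsigned long offset)
{
	struct demo_lines key = { .offset = offset }, *ret = NULL;

	/* The lock covers both tfind() and tsearch(), so no insert can race. */
	pthread_mutex_lock(&demo_cache_lock);

	struct demo_lines **found = tfind(&key, &demo_cache_root,
					  demo_lines__cmp);
	if (found == NULL) {
		struct demo_lines *node = malloc(sizeof(*node));

		if (node != NULL) {
			node->offset = offset;
			found = tsearch(node, &demo_cache_root,
					demo_lines__cmp);
			if (found == NULL)
				free(node);
			else
				demo_cache_nr_nodes++;
		}
	}
	if (found != NULL)
		ret = *found;

	pthread_mutex_unlock(&demo_cache_lock);
	assert(ret == NULL || ret->offset == offset);
	return ret;
}
```

Holding the lock only around tsearch() would not be enough, since two
threads could both miss in tfind() and then insert the same key.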

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00
Arnaldo Carvalho de Melo dd13708f2f btfdiff: Use multithreaded DWARF loading
There are quite a few cases of types with the same name. Will add a
--exclude-types option to filter those, and study BTF dedup to see what
it does in this case.

  $ btfdiff vmlinux
  --- /tmp/btfdiff.dwarf.BgsYYn	2021-07-06 17:03:07.471814114 -0300
  +++ /tmp/btfdiff.btf.Ene2Ug	2021-07-06 17:03:07.714819609 -0300
  @@ -23627,12 +23627,15 @@ struct deadline_data {
   };
   struct debug_buffer {
   	ssize_t                    (*fill_func)(struct debug_buffer *); /*     0     8 */
  -	struct ohci_hcd *          ohci;                 /*     8     8 */
  +	struct usb_bus *           bus;                  /*     8     8 */
   	struct mutex               mutex;                /*    16    32 */
   	size_t                     count;                /*    48     8 */
  -	char *                     page;                 /*    56     8 */
  +	char *                     output_buf;           /*    56     8 */
  +	/* --- cacheline 1 boundary (64 bytes) --- */
  +	size_t                     alloc_size;           /*    64     8 */

  -	/* size: 64, cachelines: 1, members: 5 */
  +	/* size: 72, cachelines: 2, members: 6 */
  +	/* last cacheline: 8 bytes */
   };
   struct debug_reply_data {
   	struct ethnl_reply_data    base;                 /*     0     8 */
  @@ -47930,11 +47933,12 @@ struct intel_community {
   	/* last cacheline: 32 bytes */
   };
   struct intel_community_context {
  -	u32 *                      intmask;              /*     0     8 */
  -	u32 *                      hostown;              /*     8     8 */
  +	unsigned int               intr_lines[16];       /*     0    64 */
  +	/* --- cacheline 1 boundary (64 bytes) --- */
  +	u32                        saved_intmask;        /*    64     4 */

  -	/* size: 16, cachelines: 1, members: 2 */
  -	/* last cacheline: 16 bytes */
  +	/* size: 68, cachelines: 2, members: 2 */
  +	/* last cacheline: 4 bytes */
   };
   struct intel_early_ops {
   	resource_size_t            (*stolen_size)(int, int, int); /*     0     8 */
  @@ -52600,64 +52604,19 @@ struct irqtime {
   	/* size: 24, cachelines: 1, members: 4 */
   	/* last cacheline: 24 bytes */
   };
  -struct irte {
  -	union {
  -		struct {
  -			__u64      present:1;            /*     0: 0  8 */
  -			__u64      fpd:1;                /*     0: 1  8 */
  -			__u64      __res0:6;             /*     0: 2  8 */
  -			__u64      avail:4;              /*     0: 8  8 */
  -			__u64      __res1:3;             /*     0:12  8 */
  -			__u64      pst:1;                /*     0:15  8 */
  -			__u64      vector:8;             /*     0:16  8 */
  -			__u64      __res2:40;            /*     0:24  8 */
  -		};                                       /*     0     8 */
  -		struct {
  -			__u64      r_present:1;          /*     0: 0  8 */
  -			__u64      r_fpd:1;              /*     0: 1  8 */
  -			__u64      dst_mode:1;           /*     0: 2  8 */
  -			__u64      redir_hint:1;         /*     0: 3  8 */
  -			__u64      trigger_mode:1;       /*     0: 4  8 */
  -			__u64      dlvry_mode:3;         /*     0: 5  8 */
  -			__u64      r_avail:4;            /*     0: 8  8 */
  -			__u64      r_res0:4;             /*     0:12  8 */
  -			__u64      r_vector:8;           /*     0:16  8 */
  -			__u64      r_res1:8;             /*     0:24  8 */
  -			__u64      dest_id:32;           /*     0:32  8 */
  -		};                                       /*     0     8 */
  -		struct {
  -			__u64      p_present:1;          /*     0: 0  8 */
  -			__u64      p_fpd:1;              /*     0: 1  8 */
  -			__u64      p_res0:6;             /*     0: 2  8 */
  -			__u64      p_avail:4;            /*     0: 8  8 */
  -			__u64      p_res1:2;             /*     0:12  8 */
  -			__u64      p_urgent:1;           /*     0:14  8 */
  -			__u64      p_pst:1;              /*     0:15  8 */
  -			__u64      p_vector:8;           /*     0:16  8 */
  -			__u64      p_res2:14;            /*     0:24  8 */
  -			__u64      pda_l:26;             /*     0:38  8 */
  -		};                                       /*     0     8 */
  -		__u64              low;                  /*     0     8 */
  -	};                                               /*     0     8 */
  -	union {
  -		struct {
  -			__u64      sid:16;               /*     8: 0  8 */
  -			__u64      sq:2;                 /*     8:16  8 */
  -			__u64      svt:2;                /*     8:18  8 */
  -			__u64      __res3:44;            /*     8:20  8 */
  -		};                                       /*     8     8 */
  -		struct {
  -			__u64      p_sid:16;             /*     8: 0  8 */
  -			__u64      p_sq:2;               /*     8:16  8 */
  -			__u64      p_svt:2;              /*     8:18  8 */
  -			__u64      p_res3:12;            /*     8:20  8 */
  -			__u64      pda_h:32;             /*     8:32  8 */
  -		};                                       /*     8     8 */
  -		__u64              high;                 /*     8     8 */
  -	};                                               /*     8     8 */
  -
  -	/* size: 16, cachelines: 1, members: 2 */
  -	/* last cacheline: 16 bytes */
  +union irte {
  +	u32                        val;                /*     0     4 */
  +	struct {
  +		u32                valid:1;            /*     0: 0  4 */
  +		u32                no_fault:1;         /*     0: 1  4 */
  +		u32                int_type:3;         /*     0: 2  4 */
  +		u32                rq_eoi:1;           /*     0: 5  4 */
  +		u32                dm:1;               /*     0: 6  4 */
  +		u32                rsvd_1:1;           /*     0: 7  4 */
  +		u32                destination:8;      /*     0: 8  4 */
  +		u32                vector:8;           /*     0:16  4 */
  +		u32                rsvd_2:8;           /*     0:24  4 */
  +	} fields;                                      /*     0     4 */
   };
   struct irte_ga {
   	union irte_ga_lo           lo;                   /*     0     8 */
  @@ -66862,12 +66821,13 @@ struct netlbl_domhsh_tbl {
   	/* last cacheline: 16 bytes */
   };
   struct netlbl_domhsh_walk_arg {
  -	struct netlbl_audit *      audit_info;           /*     0     8 */
  -	u32                        doi;                  /*     8     4 */
  +	struct netlink_callback *  nl_cb;                /*     0     8 */
  +	struct sk_buff *           skb;                  /*     8     8 */
  +	u32                        seq;                  /*    16     4 */

  -	/* size: 16, cachelines: 1, members: 2 */
  +	/* size: 24, cachelines: 1, members: 3 */
   	/* padding: 4 */
  -	/* last cacheline: 16 bytes */
  +	/* last cacheline: 24 bytes */
   };
   struct netlbl_dommap_def {
   	u32                        type;                 /*     0     4 */
  @@ -72907,20 +72867,16 @@ struct pci_raw_ops {
   	/* last cacheline: 16 bytes */
   };
   struct pci_root_info {
  -	struct list_head           list;                 /*     0    16 */
  -	char                       name[12];             /*    16    12 */
  -
  -	/* XXX 4 bytes hole, try to pack */
  -
  -	struct list_head           resources;            /*    32    16 */
  -	struct resource            busn;                 /*    48    64 */
  -	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
  -	int                        node;                 /*   112     4 */
  -	int                        link;                 /*   116     4 */
  +	struct acpi_pci_root_info  common;               /*     0    56 */
  +	struct pci_sysdata         sd;                   /*    56    40 */
  +	/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
  +	bool                       mcfg_added;           /*    96     1 */
  +	u8                         start_bus;            /*    97     1 */
  +	u8                         end_bus;              /*    98     1 */

  -	/* size: 120, cachelines: 2, members: 6 */
  -	/* sum members: 116, holes: 1, sum holes: 4 */
  -	/* last cacheline: 56 bytes */
  +	/* size: 104, cachelines: 2, members: 5 */
  +	/* padding: 5 */
  +	/* last cacheline: 40 bytes */
   };
   struct pci_root_res {
   	struct list_head           list;                 /*     0    16 */
  @@ -76415,25 +76371,66 @@ struct pmc_dev {

   	/* XXX 4 bytes hole, try to pack */

  -	void *                     regmap;               /*     8     8 */
  +	void *                     regbase;              /*     8     8 */
   	const struct pmc_reg_map  * map;                 /*    16     8 */
   	struct dentry *            dbgfs_dir;            /*    24     8 */
  -	bool                       init;                 /*    32     1 */
  +	int                        pmc_xram_read_bit;    /*    32     4 */

  -	/* size: 40, cachelines: 1, members: 5 */
  -	/* sum members: 29, holes: 1, sum holes: 4 */
  -	/* padding: 7 */
  -	/* last cacheline: 40 bytes */
  +	/* XXX 4 bytes hole, try to pack */
  +
  +	struct mutex               lock;                 /*    40    32 */
  +	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
  +	bool                       check_counters;       /*    72     1 */
  +
  +	/* XXX 7 bytes hole, try to pack */
  +
  +	u64                        pc10_counter;         /*    80     8 */
  +	u64                        s0ix_counter;         /*    88     8 */
  +	int                        num_lpm_modes;        /*    96     4 */
  +	int                        lpm_en_modes[8];      /*   100    32 */
  +
  +	/* XXX 4 bytes hole, try to pack */
  +
  +	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
  +	u32 *                      lpm_req_regs;         /*   136     8 */
  +
  +	/* size: 144, cachelines: 3, members: 12 */
  +	/* sum members: 125, holes: 4, sum holes: 19 */
  +	/* last cacheline: 16 bytes */
   };
   struct pmc_reg_map {
  -	const struct pmc_bit_map  * d3_sts_0;            /*     0     8 */
  -	const struct pmc_bit_map  * d3_sts_1;            /*     8     8 */
  -	const struct pmc_bit_map  * func_dis;            /*    16     8 */
  -	const struct pmc_bit_map  * func_dis_2;          /*    24     8 */
  -	const struct pmc_bit_map  * pss;                 /*    32     8 */
  +	const struct pmc_bit_map  * * pfear_sts;         /*     0     8 */
  +	const struct pmc_bit_map  * mphy_sts;            /*     8     8 */
  +	const struct pmc_bit_map  * pll_sts;             /*    16     8 */
  +	const struct pmc_bit_map  * * slps0_dbg_maps;    /*    24     8 */
  +	const struct pmc_bit_map  * ltr_show_sts;        /*    32     8 */
  +	const struct pmc_bit_map  * msr_sts;             /*    40     8 */
  +	const struct pmc_bit_map  * * lpm_sts;           /*    48     8 */
  +	const u32                  slp_s0_offset;        /*    56     4 */
  +	const int                  slp_s0_res_counter_step; /*    60     4 */
  +	/* --- cacheline 1 boundary (64 bytes) --- */
  +	const u32                  ltr_ignore_offset;    /*    64     4 */
  +	const int                  regmap_length;        /*    68     4 */
  +	const u32                  ppfear0_offset;       /*    72     4 */
  +	const int                  ppfear_buckets;       /*    76     4 */
  +	const u32                  pm_cfg_offset;        /*    80     4 */
  +	const int                  pm_read_disable_bit;  /*    84     4 */
  +	const u32                  slps0_dbg_offset;     /*    88     4 */
  +	const u32                  ltr_ignore_max;       /*    92     4 */
  +	const u32                  pm_vric1_offset;      /*    96     4 */
  +	const int                  lpm_num_maps;         /*   100     4 */
  +	const int                  lpm_res_counter_step_x2; /*   104     4 */
  +	const u32                  lpm_sts_latch_en_offset; /*   108     4 */
  +	const u32                  lpm_en_offset;        /*   112     4 */
  +	const u32                  lpm_priority_offset;  /*   116     4 */
  +	const u32                  lpm_residency_offset; /*   120     4 */
  +	const u32                  lpm_status_offset;    /*   124     4 */
  +	/* --- cacheline 2 boundary (128 bytes) --- */
  +	const u32                  lpm_live_status_offset; /*   128     4 */
  +	const u32                  etr3_offset;          /*   132     4 */

  -	/* size: 40, cachelines: 1, members: 5 */
  -	/* last cacheline: 40 bytes */
  +	/* size: 136, cachelines: 3, members: 27 */
  +	/* last cacheline: 8 bytes */
   };
   struct pmic_table {
   	int                        address;              /*     0     4 */
  @@ -114574,12 +114571,18 @@ struct urb {
   	/* last cacheline: 56 bytes */
   };
   struct urb_priv {
  -	int                        num_tds;              /*     0     4 */
  -	int                        num_tds_done;         /*     4     4 */
  -	struct xhci_td             td[];                 /*     8     0 */
  +	struct ed *                ed;                   /*     0     8 */
  +	u16                        length;               /*     8     2 */
  +	u16                        td_cnt;               /*    10     2 */

  -	/* size: 8, cachelines: 1, members: 3 */
  -	/* last cacheline: 8 bytes */
  +	/* XXX 4 bytes hole, try to pack */
  +
  +	struct list_head           pending;              /*    16    16 */
  +	struct td *                td[];                 /*    32     0 */
  +
  +	/* size: 32, cachelines: 1, members: 5 */
  +	/* sum members: 28, holes: 1, sum holes: 4 */
  +	/* last cacheline: 32 bytes */
   };
   struct usb2_lpm_parameters {
   	unsigned int               besl;                 /*     0     4 */
  $

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-12 09:41:13 -03:00