linux/tools/perf/examples/bpf/augmented_syscalls.c

119 lines
3.8 KiB
C

perf trace: Handle "bpf-output" events associated with "__augmented_syscalls__" BPF map

Add an example BPF script that writes syscalls:sys_enter_openat raw
tracepoint payloads augmented with the first 64 bytes of the "filename"
syscall pointer arg.

Then catch it and print it just like with things written to the
"__bpf_stdout__" map associated with a PERF_COUNT_SW_BPF_OUTPUT software
event, by just letting the default tracepoint handler in 'perf trace',
trace__event_handler(), use bpf_output__fprintf(trace, sample), just
like it does with all other PERF_COUNT_SW_BPF_OUTPUT events, i.e. just
do a dump on the payload, so that we can check if what is being printed
has at least the first 64 bytes of the "filename" arg:

The augmented_syscalls.c eBPF script:

  # cat tools/perf/examples/bpf/augmented_syscalls.c
  // SPDX-License-Identifier: GPL-2.0

  #include <stdio.h>

  struct bpf_map SEC("maps") __augmented_syscalls__ = {
          .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
          .key_size = sizeof(int),
          .value_size = sizeof(u32),
          .max_entries = __NR_CPUS__,
  };

  struct syscall_enter_openat_args {
          unsigned long long common_tp_fields;
          long               syscall_nr;
          long               dfd;
          char               *filename_ptr;
          long               flags;
          long               mode;
  };

  struct augmented_enter_openat_args {
          struct syscall_enter_openat_args args;
          char                             filename[64];
  };

  int syscall_enter(openat)(struct syscall_enter_openat_args *args)
  {
          struct augmented_enter_openat_args augmented_args;

          probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
          probe_read_str(&augmented_args.filename, sizeof(augmented_args.filename), args->filename_ptr);
          perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU,
                            &augmented_args, sizeof(augmented_args));
          return 1;
  }

  license(GPL);
  #

So it will just prepare a raw_syscalls:sys_enter payload for the
"openat" syscall.

This will eventually be done for all syscalls with pointer args,
globally or just when the user asks, using some spec, for the args of
the syscalls it wants "expanded" this way. We'll probably start with
just all the syscalls that have char * pointers with familiar names, the
ones we already handle with the probe:vfs_getname kprobe if it is in
place hooking the kernel getname_flags() function used to copy the paths
from userspace.

Running it we get:

  # perf trace -e perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
     0.000 ( ): __augmented_syscalls__:X?.C......................`\..................../etc/ld.so.cache..#......,....ao.k...............k......1.".........
     0.006 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC
     0.008 ( 0.005 ms): cat/31292 openat(dfd: CWD, filename: 0x5c600da8, flags: CLOEXEC ) = 3
     0.036 ( ): __augmented_syscalls__:X?.C.......................\..................../lib64/libc.so.6......... .\....#........?.......=.C..../.".........
     0.037 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC
     0.039 ( 0.007 ms): cat/31292 openat(dfd: CWD, filename: 0x5c808ce0, flags: CLOEXEC ) = 3
     0.323 ( ): __augmented_syscalls__:X?.C.....................P....................../etc/passwd......>.C....@................>.C.....,....ao.>.C........
     0.325 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: 0xe8be50d6
     0.327 ( 0.004 ms): cat/31292 openat(dfd: CWD, filename: 0xe8be50d6 ) = 3
  #

We need to go on optimizing this to avoid sending trash or zeroes in the
pointer content payload, using the return from bpf_probe_read_str(), but
to keep things simple at this stage and make incremental progress, let's
leave it at that for now.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-g360n1zbj6bkbk6q0qo11c28@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-07 20:40:13 +02:00
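The closing remark in this commit message, about using the return from bpf_probe_read_str() to avoid shipping unused bytes, can be illustrated in plain userspace C. This is only a minimal sketch, assuming the helper returns the number of bytes copied (including the trailing NUL) on success and a negative value on error. `trimmed_payload_len` and its parameter names are hypothetical and are not part of perf or of this file.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from perf): how many bytes of an augmented
 * payload are worth emitting.
 *
 *   fixed_len - size of the regular tracepoint payload
 *   buf_len   - capacity of the string buffer that follows it
 *   copied    - return value of a probe_read_str()-style copy:
 *               > 0 is the string length including the NUL,
 *               < 0 is an error
 *
 * On error, or on an out-of-range length, only the fixed part is kept.
 */
static size_t trimmed_payload_len(size_t fixed_len, size_t buf_len, long copied)
{
	if (copied <= 0 || (size_t)copied > buf_len)
		return fixed_len;

	return fixed_len + (size_t)copied;
}
```

With a 48-byte fixed payload and a 64-byte buffer, a 12-byte copy such as "/etc/passwd" plus its NUL would give a 60-byte record instead of 112, so the bytes after the string never reach the ring buffer.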
// SPDX-License-Identifier: GPL-2.0
/*
 * Augment the openat syscall with the contents of the filename pointer argument.
 *
 * Test it with:
 *
 * perf trace -e tools/perf/examples/bpf/augmented_syscalls.c cat /etc/passwd > /dev/null
 *
 * It'll catch some openat syscalls related to the dynamic linker and
 * the last one should be the one for '/etc/passwd'.
 *
 * This matches what is marshalled into the raw_syscalls:sys_enter payload
 * expected by the 'perf trace' beautifiers, and can be used by them unmodified,
 * which will be done as that feature is implemented in the next csets. For now
 * it will appear in a dump done by the default tracepoint handler in 'perf trace',
 * which uses bpf_output__fprintf() to just dump those contents, as done with
 * the bpf-output event associated with the __bpf_stdout__ map declared in
 * tools/perf/include/bpf/stdio.h.
 */
#include <stdio.h>
struct bpf_map SEC("maps") __augmented_syscalls__ = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};
perf trace: Use the augmented filename, expanding syscall enter pointers

This is the final touch in showing how a syscall argument beautifier can
access the augmented args put in place by the
tools/perf/examples/bpf/augmented_syscalls.c eBPF script, right after
the regular raw syscall args, i.e. the up to 6 long integer values in
the syscall interface.

With this we are able to show the 'openat' syscall arg, now with up to
64 bytes, but in time this will be configurable, just like with the
'strace -s strsize' argument, from 'strace''s man page:

  -s strsize  Specify the maximum string size to print (the default is 32).

This actually is the maximum string to _collect_ and store in the ring
buffer, not just print.

Before:

  # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
     0.000 ( ): cat/9658 openat(dfd: CWD, filename: 0x6626eda8, flags: CLOEXEC)
     0.017 ( 0.007 ms): cat/9658 openat(dfd: CWD, filename: 0x6626eda8, flags: CLOEXEC) = 3
     0.049 ( ): cat/9658 openat(dfd: CWD, filename: 0x66476ce0, flags: CLOEXEC)
     0.051 ( 0.007 ms): cat/9658 openat(dfd: CWD, filename: 0x66476ce0, flags: CLOEXEC) = 3
     0.377 ( ): cat/9658 openat(dfd: CWD, filename: 0x1e8f806b)
     0.379 ( 0.005 ms): cat/9658 openat(dfd: CWD, filename: 0x1e8f806b) = 3
  #

After:

  # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
     0.000 ( ): cat/11966 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC)
     0.006 ( 0.006 ms): cat/11966 openat(dfd: CWD, filename: 0x4bfdcda8, flags: CLOEXEC) = 3
     0.034 ( ): cat/11966 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC)
     0.036 ( 0.008 ms): cat/11966 openat(dfd: CWD, filename: 0x4c1e4ce0, flags: CLOEXEC) = 3
     0.375 ( ): cat/11966 openat(dfd: CWD, filename: /etc/passwd)
     0.377 ( 0.005 ms): cat/11966 openat(dfd: CWD, filename: 0xe87906b) = 3
  #

This cset should show all the aspects of establishing a protocol between
an eBPF syscall arg augmenter program,
tools/perf/examples/bpf/augmented_syscalls.c, and a 'perf trace'
beautifier, the one associated with all 'char *' pointer syscall args
with names that can heuristically be associated with filenames.

Now to wire up 'open' to show a second syscall using this scheme, all we
have to do is change tools/perf/examples/bpf/augmented_syscalls.c, as
'perf trace' will notice that the perf_sample.raw_size is more than what
is expected for a particular syscall payload as defined by its tracefs
format file and will then use the augmented payload in the 'filename'
syscall arg beautifier.

The same protocol will be used for structs such as 'struct sockaddr *',
'struct pollfd', etc, with additions for handling arrays.

This will all be done under the hood when 'perf trace' realizes the
system has the necessary components, and also can be done by providing
a precompiled augmented_syscalls.c eBPF ELF object.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-gj9kqb61wo7m3shtpzercbcr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-21 17:00:39 +02:00
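The size check described in this commit message (a raw payload larger than the tracepoint format expects means an augmented argument follows) can be sketched in userspace C. This is an illustration only, assuming the extra bytes use the `struct augmented_filename` layout from this listing. `augmented_filename_get` and its parameters are hypothetical names and do not reproduce the actual 'perf trace' beautifier code.

```c
#include <stddef.h>
#include <string.h>

/* Layout assumed to match the struct declared in this file. */
struct augmented_filename {
	int  size;       /* bytes stored in value[], including the NUL */
	int  reserved;
	char value[256];
};

/*
 * Hypothetical consumer-side check: 'raw' holds 'raw_size' bytes, of which
 * the first 'expected_size' are the regular syscall payload. If more bytes
 * follow, treat them as an augmented_filename and copy the string to 'out'.
 * Returns 'out', or NULL when there is no usable augmented string.
 */
static const char *augmented_filename_get(const void *raw, size_t raw_size,
					  size_t expected_size,
					  char *out, size_t out_size)
{
	const size_t header = offsetof(struct augmented_filename, value);
	const unsigned char *extra = (const unsigned char *)raw + expected_size;
	int size;

	/* Not even room for the size/reserved header: nothing augmented. */
	if (raw_size < expected_size + header)
		return NULL;

	/* memcpy avoids assuming the raw buffer is suitably aligned. */
	memcpy(&size, extra, sizeof(size));

	if (size <= 0 ||
	    (size_t)size > raw_size - expected_size - header ||
	    (size_t)size > out_size)
		return NULL;

	memcpy(out, extra + header, (size_t)size);
	out[size - 1] = '\0';	/* do not trust the producer's terminator */
	return out;
}
```

A caller that gets NULL back would fall back to printing the pointer value, which is what the "Before" output above shows.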
struct augmented_filename {
	int	size;
	int	reserved;
	char	value[256];
};
struct syscall_enter_openat_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	long		   dfd;
	char		   *filename_ptr;
	long		   flags;
	long		   mode;
};
struct augmented_enter_openat_args {
	struct syscall_enter_openat_args args;
	struct augmented_filename	 filename;
};
int syscall_enter(openat)(struct syscall_enter_openat_args *args)
{
	struct augmented_enter_openat_args augmented_args = { .filename.reserved = 0, };
	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
	augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
						      sizeof(augmented_args.filename.value),
						      args->filename_ptr);
	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU,
			  &augmented_args,
			  sizeof(augmented_args) - sizeof(augmented_args.filename.value) + augmented_args.filename.size);
perf trace: Make the augmented_syscalls filter out the tracepoint event

When we attach an eBPF object to a tracepoint, if we return 1, then that tracepoint will be stored in perf's ring buffer. In the augmented_syscalls.c case we want to just attach and _override_ the tracepoint payload with an augmented, extended one.

In this example, tools/perf/examples/bpf/augmented_syscalls.c, we are attaching to the 'openat' syscall, and adding, after the syscalls:sys_enter_openat usual payload as defined by /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format, a snapshot of its sole pointer arg:

  # grep 'field:.*\*' /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
  field:const char * filename; offset:24; size:8; signed:0;
  #

For now this is not being considered, the next csets will make use of it, but as this is overriding the syscall tracepoint enter, we don't want that event appearing on the ring buffer, just our synthesized one.

Before:

  # perf trace -e ~acme/git/perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
  0.000 ( ): __augmented_syscalls__:dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC
  0.006 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: , flags: CLOEXEC
  0.007 ( 0.004 ms): cat/24044 openat(dfd: CWD, filename: 0x216dda8, flags: CLOEXEC ) = 3
  0.028 ( ): __augmented_syscalls__:dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC
  0.030 ( ): syscalls:sys_enter_openat:dfd: CWD, filename: , flags: CLOEXEC
  0.031 ( 0.006 ms): cat/24044 openat(dfd: CWD, filename: 0x2375ce0, flags: CLOEXEC ) = 3
  0.291 ( ): __augmented_syscalls__:dfd: CWD, filename: /etc/passwd
  0.293 ( ): syscalls:sys_enter_openat:dfd: CWD, filename:
  0.294 ( 0.004 ms): cat/24044 openat(dfd: CWD, filename: 0x637db06b ) = 3
  #

After:

  # perf trace -e ~acme/git/perf/tools/perf/examples/bpf/augmented_syscalls.c,openat cat /etc/passwd > /dev/null
  0.000 ( ): __augmented_syscalls__:dfd: CWD, filename: 0x9c6a1da8, flags: CLOEXEC
  0.005 ( 0.015 ms): cat/27341 openat(dfd: CWD, filename: 0x9c6a1da8, flags: CLOEXEC ) = 3
  0.040 ( ): __augmented_syscalls__:dfd: CWD, filename: 0x9c8a9ce0, flags: CLOEXEC
  0.041 ( 0.006 ms): cat/27341 openat(dfd: CWD, filename: 0x9c8a9ce0, flags: CLOEXEC ) = 3
  0.294 ( ): __augmented_syscalls__:dfd: CWD, filename: 0x482a706b
  0.296 ( 0.067 ms): cat/27341 openat(dfd: CWD, filename: 0x482a706b ) = 3
  #

Now let's replace that __augmented_syscalls__ name with the syscall name, using:

  # grep 'field:.*syscall_nr' /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
  field:int __syscall_nr; offset:8; size:4; signed:1;
  #

The synthesized payload has that field exactly where the syscall enter tracepoint puts it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-og4r9k87mzp9hv7el046idmd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-21 16:14:15 +02:00
	return 0;
}
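The length passed to perf_event_output() above trims the unused tail of the string buffer, so only the bytes actually copied are sent. The following userspace sketch walks through that arithmetic. It assumes a layout for struct augmented_filename (two ints followed by a 256-byte buffer) that is not shown in this part of the file, and it uses a hypothetical copy_str() as a stand-in for probe_read_str(), returning the number of bytes written including the trailing NUL.

```c
#include <stddef.h>

/* Assumed layout: the buffer size of 256 is illustrative. */
struct augmented_filename {
	int  size;      /* bytes stored in value, including the NUL */
	int  reserved;
	char value[256];
};

struct syscall_enter_openat_args {
	unsigned long long common_tp_fields;
	long  syscall_nr;
	long  dfd;
	char *filename_ptr;
	long  flags;
	long  mode;
};

struct augmented_enter_openat_args {
	struct syscall_enter_openat_args args;
	struct augmented_filename	 filename;
};

/*
 * Hypothetical userspace stand-in for probe_read_str(): copies at most
 * size - 1 characters plus a NUL and returns the bytes written,
 * counting the NUL.
 */
static int copy_str(char *dst, int size, const char *src)
{
	int n = 0;

	if (size <= 0)
		return 0;
	while (n < size - 1 && src[n] != '\0') {
		dst[n] = src[n];
		n++;
	}
	dst[n] = '\0';
	return n + 1;
}

/* Same expression as the third argument group of perf_event_output(). */
static size_t augmented_payload_len(const struct augmented_enter_openat_args *a)
{
	return sizeof(*a) - sizeof(a->filename.value) + a->filename.size;
}
```

With this layout the payload ends right after the string's NUL byte. Note that the real helper can return a negative value on error, a case this sketch does not model.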
perf trace: Augment the 'open' syscall 'filename' arg

As described in the previous cset, all we had to do was to touch the augmented_syscalls.c eBPF program, fire up 'perf trace' with that new eBPF script in system wide mode and wait for 'open' syscalls, in addition to 'openat' ones, to see that it works:

  # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c
  0.000 StreamT~s #200/16150 openat(dfd: CWD, filename: /home/acme/.mozilla/firefox/fqxhj76d.default/prefs.js, flags: CREAT|EXCL|TRUNC|WRONLY, mode: IRUSR|IWUSR)
  0.065 StreamT~s #200/16150 openat(dfd: CWD, filename: /home/acme/.mozilla/firefox/fqxhj76d.default/prefs-1.js, flags: CREAT|EXCL|TRUNC|WRONLY, mode: IRUSR|IWUSR)
  0.435 StreamT~s #200/16150 openat(dfd: CWD, filename: /home/acme/.mozilla/firefox/fqxhj76d.default/prefs-1.js, flags: CREAT|TRUNC|WRONLY, mode: IRUSR|IWUSR)
  1.875 perf/16772 openat(dfd: CWD, filename: /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/form)
  1227.260 gnome-shell/1463 openat(dfd: CWD, filename: /proc/self/stat)
  1227.397 gnome-shell/2125 openat(dfd: CWD, filename: /proc/self/stat)
  7227.619 gnome-shell/1463 openat(dfd: CWD, filename: /proc/self/stat)
  7227.661 gnome-shell/2125 openat(dfd: CWD, filename: /proc/self/stat)
  10018.079 gnome-shell/1463 openat(dfd: CWD, filename: /proc/self/stat)
  10018.514 perf/16772 openat(dfd: CWD, filename: /proc/1237/status)
  10018.568 perf/16772 openat(dfd: CWD, filename: /proc/1237/status)
  10022.409 gnome-shell/2125 openat(dfd: CWD, filename: /proc/self/stat)
  10090.044 NetworkManager/1237 openat(dfd: CWD, filename: /proc/2125/stat)
  10090.351 NetworkManager/1237 open(filename: /etc/passwd, flags: CLOEXEC)
  10090.407 perf/16772 openat(dfd: CWD, filename: /sys/kernel/debug/tracing/events/syscalls/sys_enter_open/format)
  10091.763 NetworkManager/1237 openat(dfd: CWD, filename: /proc/2125/stat)
  10091.812 NetworkManager/1237 open(filename: /etc/passwd, flags: CLOEXEC)
  10092.807 NetworkManager/1237 openat(dfd: CWD, filename: /proc/2125/stat)
  10092.851 NetworkManager/1237 open(filename: /etc/passwd, flags: CLOEXEC)
  10094.650 NetworkManager/1237 openat(dfd: CWD, filename: /proc/1463/stat)
  10094.926 NetworkManager/1237 open(filename: /etc/passwd, flags: CLOEXEC)
  10096.010 NetworkManager/1237 openat(dfd: CWD, filename: /proc/1463/stat)
  10096.057 NetworkManager/1237 open(filename: /etc/passwd, flags: CLOEXEC)
  10097.056 NetworkManager/1237 openat(dfd: CWD, filename: /proc/1463/stat)
  10097.099 NetworkManager/1237 open(filename: /etc/passwd, flags: CLOEXEC)
  13228.345 gnome-shell/1463 openat(dfd: CWD, filename: /proc/self/stat)
  13232.734 gnome-shell/2125 openat(dfd: CWD, filename: /proc/self/stat)
  15198.956 lighttpd/16748 open(filename: /proc/loadavg, mode: ISGID|IXOTH)
  ^C#

It even catches 'perf' itself looking at the sys_enter_open and sys_enter_openat tracefs format dictionaries when it first finds them in the trace... :-)

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-upmogc57uatljr6el6u8537l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-21 17:20:06 +02:00
struct syscall_enter_open_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	char		   *filename_ptr;
	long		   flags;
	long		   mode;
};

struct augmented_enter_open_args {
	struct syscall_enter_open_args args;
	struct augmented_filename filename;
};

int syscall_enter(open)(struct syscall_enter_open_args *args)
{
	struct augmented_enter_open_args augmented_args = { .filename.reserved = 0, };

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
	augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
						      sizeof(augmented_args.filename.value),
						      args->filename_ptr);
	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU,
			  &augmented_args,
			  sizeof(augmented_args) - sizeof(augmented_args.filename.value) + augmented_args.filename.size);
	return 0;
}
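The commit messages describe the consumer side of this protocol: 'perf trace' compares perf_sample.raw_size with the payload size given by the tracefs format file and, when the sample is larger, treats the extra bytes as the augmented argument. The sketch below illustrates that check in plain userspace C. It is not the 'perf trace' implementation: the function name augmented_string(), the struct augmented_arg_hdr and the assumption that the extra bytes start with two ints (size, reserved) followed by the string are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed header preceding the string bytes in the augmented payload. */
struct augmented_arg_hdr {
	int size;      /* string bytes that follow, including the NUL */
	int reserved;
};

/*
 * Hypothetical helper: returns a pointer to the augmented string inside
 * 'raw' when the sample is larger than the regular tracepoint payload,
 * or NULL when there is no usable augmented data.
 */
static const char *augmented_string(const void *raw, size_t raw_size,
				    size_t expected_size, int *lenp)
{
	const char *p = raw;
	struct augmented_arg_hdr hdr;
	size_t extra;

	if (raw_size <= expected_size)
		return NULL;			/* plain, non-augmented payload */

	extra = raw_size - expected_size;
	if (extra < sizeof(hdr))
		return NULL;			/* too short to hold the header */

	memcpy(&hdr, p + expected_size, sizeof(hdr));
	if (hdr.size <= 0 || (size_t)hdr.size > extra - sizeof(hdr))
		return NULL;			/* size field does not fit the sample */

	*lenp = hdr.size;
	return p + expected_size + sizeof(hdr);
}
```

A caller would pass the sample's raw data and raw_size together with the size computed from the format file, falling back to printing the pointer value when NULL comes back.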
perf trace: Augment inotify_add_watch pathname syscall arg

Again, this just changes tools/perf/examples/bpf/augmented_syscalls.c, which is starting to have too much boilerplate; some macro will come to the rescue.

  # perf trace -e tools/perf/examples/bpf/augmented_syscalls.c
  0.000 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /var/cache/app-info/yaml, mask: 16789454)
  0.023 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /var/lib/app-info/xmls, mask: 16789454)
  0.028 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /var/lib/app-info/yaml, mask: 16789454)
  0.032 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /usr/share/app-info/yaml, mask: 16789454)
  0.039 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /usr/local/share/app-info/xmls, mask: 16789454)
  0.045 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /usr/local/share/app-info/yaml, mask: 16789454)
  0.049 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /home/acme/.local/share/app-info/yaml, mask: 16789454)
  0.056 gmain/2590 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: , mask: 16789454)
  0.010 gmain/2245 inotify_add_watch(fd: 7<anon_inode:inotify>, pathname: /home/acme/~, mask: 16789454)
  0.087 perf/20116 openat(dfd: CWD, filename: /sys/kernel/debug/tracing/events/syscalls/sys_enter_inotify_add)
  0.436 perf/20116 openat(dfd: CWD, filename: /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/form)
  56.042 gmain/2791 inotify_add_watch(fd: 4<anon_inode:inotify>, pathname: /var/lib/fwupd/remotes.d/lvfs-testing, mask: 16789454)
  113.986 gmain/1721 inotify_add_watch(fd: 3<anon_inode:inotify>, pathname: /var/lib/gdm/~, mask: 16789454)
  3777.265 gsd-color/2408 openat(dfd: CWD, filename: /etc/localtime)
  3777.550 gsd-color/2408 openat(dfd: CWD, filename: /etc/localtime)
  ^C[root@jouet perf]#

This still does not combine raw_syscalls:sys_enter + raw_syscalls:sys_exit to get it strace-like, but that probably will come very naturally with some more wiring up...

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ol83juin2cht9vzquynec5hz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-21 18:21:36 +02:00
struct syscall_enter_inotify_add_watch_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	long		   fd;
	char		   *pathname_ptr;
	long		   mask;
};

struct augmented_enter_inotify_add_watch_args {
	struct syscall_enter_inotify_add_watch_args args;
	struct augmented_filename pathname;
};

int syscall_enter(inotify_add_watch)(struct syscall_enter_inotify_add_watch_args *args)
{
	struct augmented_enter_inotify_add_watch_args augmented_args = { .pathname.reserved = 0, };

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
	augmented_args.pathname.size = probe_read_str(&augmented_args.pathname.value,
						      sizeof(augmented_args.pathname.value),
						      args->pathname_ptr);
	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU,
			  &augmented_args,
			  sizeof(augmented_args) - sizeof(augmented_args.pathname.value) + augmented_args.pathname.size);
	return 0;
}
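The three handlers differ only in their struct types and in the name of the string member ('filename' or 'pathname'), which is the boilerplate the inotify_add_watch commit message expects a macro to remove. One possible shape for the repeated length expression is sketched below. The macro augmented_len() is hypothetical and not part of the file, the demo_* structs are simplified stand-ins for the real argument structs, and the 256-byte buffer in struct augmented_filename is an assumed size.

```c
#include <stddef.h>

/* Assumed layout: the buffer size of 256 is illustrative. */
struct augmented_filename {
	int  size;
	int  reserved;
	char value[256];
};

/*
 * Hypothetical macro: bytes to emit for an augmented struct 'aug' whose
 * variable-length string lives in 'member' (e.g. filename, pathname).
 */
#define augmented_len(aug, member) \
	(sizeof(aug) - sizeof((aug).member.value) + (size_t)(aug).member.size)

/* Simplified stand-ins for the per-syscall argument structs. */
struct demo_open_args	 { long syscall_nr; char *filename_ptr; long flags; long mode; };
struct demo_inotify_args { long syscall_nr; long fd; char *pathname_ptr; long mask; };

struct augmented_demo_open {
	struct demo_open_args	  args;
	struct augmented_filename filename;
};

struct augmented_demo_inotify {
	struct demo_inotify_args  args;
	struct augmented_filename pathname;
};
```

Passing the member name as a macro argument lets the same expression serve both the 'filename' and 'pathname' cases without touching the struct definitions.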
license(GPL);