History log of /linux-master/tools/testing/selftests/bpf/progs/trigger_bench.c
Revision Date Author Comments
# 379b97bb 11-Mar-2024 Jiri Olsa <jolsa@kernel.org>

selftests/bpf: Add kprobe multi triggering benchmarks

Adding kprobe multi triggering benchmarks. It's useful now to bench
new fprobe implementation and might be useful later as well.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240311211023.590321-1-jolsa@kernel.org


# 365c2b32 08-Mar-2024 Andrii Nakryiko <andrii@kernel.org>

selftests/bpf: Add fexit and kretprobe triggering benchmarks

We already have kprobe and fentry benchmarks. Let's add kretprobe and
fexit ones for completeness.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240309005124.3004446-1-andrii@kernel.org


# 39f8dc43 30-Mar-2022 Alan Maguire <alan.maguire@oracle.com>

libbpf: Add auto-attach for uprobes based on section name

Now that u[ret]probes can use name-based specification, it makes
sense to add support for auto-attach based on SEC() definition.
The format proposed is

SEC("u[ret]probe/binary:[raw_offset|[function_name[+offset]]")

For example, to trace malloc() in libc:

SEC("uprobe/libc.so.6:malloc")

...or to trace function foo2 in /usr/bin/foo:

SEC("uprobe//usr/bin/foo:foo2")

Auto-attach is done for all tasks (pid -1). prog can be an absolute
path or simply a program/library name; in the latter case, we use
PATH/LD_LIBRARY_PATH to resolve the full path, falling back to
standard locations (/usr/bin:/usr/sbin or /usr/lib64:/usr/lib) if
the file is not found via environment-variable specified locations.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-4-git-send-email-alan.maguire@oracle.com


# e91d280c 04-Feb-2022 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

selftests/bpf: Fix tests to use arch-dependent syscall entry points

Some of the tests are using x86_64 ABI-specific syscall entry points
(such as __x64_sys_nanosleep and __x64_sys_getpgid). Update them to use
architecture-dependent syscall entry names.

Also update fexit_sleep test to not use BPF_PROG() so that it is clear
that the syscall parameters aren't being accessed in the bpf prog.

Note that none of the bpf progs in these tests are actually accessing
any of the syscall parameters. The only exception is perfbuf_bench, which
passes on the bpf prog context into bpf_perf_event_output() as a pointer
to pt_regs, but that looks to be mostly ignored.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/e35f7051f03e269b623a68b139d8ed131325f7b7.1643973917.git.naveen.n.rao@linux.vnet.ibm.com


# d41bc48b 15-Nov-2021 Andrii Nakryiko <andrii@kernel.org>

selftests/bpf: Add uprobe triggering overhead benchmarks

Add benchmark to measure overhead of uprobes and uretprobes. Also have
a baseline (no uprobe attached) benchmark.

On my dev machine, baseline benchmark can trigger 130M user_target()
invocations. When uprobe is attached, this falls to just 700K. With
uretprobe, we get down to 520K:

$ sudo ./bench trig-uprobe-base -a
Summary: hits 131.289 ± 2.872M/s

# UPROBE
$ sudo ./bench -a trig-uprobe-without-nop
Summary: hits 0.729 ± 0.007M/s

$ sudo ./bench -a trig-uprobe-with-nop
Summary: hits 1.798 ± 0.017M/s

# URETPROBE
$ sudo ./bench -a trig-uretprobe-without-nop
Summary: hits 0.508 ± 0.012M/s

$ sudo ./bench -a trig-uretprobe-with-nop
Summary: hits 0.883 ± 0.008M/s

So there is almost 2.5x performance difference between probing nop vs
non-nop instruction for entry uprobe. And 1.7x difference for uretprobe.

This means that non-nop uprobe overhead is around 1.4 microseconds for uprobe
and 2 microseconds for non-nop uretprobe.

For nop variants, uprobe and uretprobe overhead is down to 0.556 and
1.13 microseconds, respectively.

For comparison, just doing a very low-overhead syscall (with no BPF
programs attached anywhere) gives:

$ sudo ./bench trig-base -a
Summary: hits 4.830 ± 0.036M/s

So uprobes are about 2.67x slower than pure context switch.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211116013041.4072571-1-andrii@kernel.org


# e68a1445 27-Aug-2020 Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add sleepable tests

Modify few tests to sanity test sleepable bpf functionality.

Running 'bench trig-fentry-sleep' vs 'bench trig-fentry' and 'perf report':
sleepable with SRCU:
3.86% bench [k] __srcu_read_unlock
3.22% bench [k] __srcu_read_lock
0.92% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
0.50% bench [k] bpf_trampoline_10297
0.26% bench [k] __bpf_prog_exit_sleepable
0.21% bench [k] __bpf_prog_enter_sleepable

sleepable with RCU_TRACE:
0.79% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
0.72% bench [k] bpf_trampoline_10381
0.31% bench [k] __bpf_prog_exit_sleepable
0.29% bench [k] __bpf_prog_enter_sleepable

non-sleepable with RCU:
0.88% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry
0.84% bench [k] bpf_trampoline_10297
0.13% bench [k] __bpf_prog_enter
0.12% bench [k] __bpf_prog_exit

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-6-alexei.starovoitov@gmail.com


# c5d420c3 12-May-2020 Andrii Nakryiko <andriin@fb.com>

selftest/bpf: Add BPF triggering benchmark

It is sometimes desirable to be able to trigger BPF program from user-space
with minimal overhead. sys_enter would seem to be a good candidate, yet in
a lot of cases there will be a lot of noise from syscalls triggered by other
processes on the system. So while searching for low-overhead alternative, I've
stumbled upon getpgid() syscall, which seems to be specific enough to not
suffer from accidental syscall by other apps.

This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe,
fentry and fmod_ret with returning error (so that syscall would not be
executed), to determine the lowest-overhead way. Here are results on my
machine (using benchs/run_bench_trigger.sh script):

base : 9.200 ± 0.319M/s
tp : 6.690 ± 0.125M/s
rawtp : 8.571 ± 0.214M/s
kprobe : 6.431 ± 0.048M/s
fentry : 8.955 ± 0.241M/s
fmodret : 8.903 ± 0.135M/s

So it seems like fmodret doesn't give much benefit for such lightweight
syscall. Raw tracepoint is pretty decent despite additional filtering logic,
but it will be called for any other syscall in the system, which rules it out.
Fentry, though, seems to be adding the least amoung of overhead and achieves
97.3% of performance of baseline no-BPF-attached syscall.

Using getpgid() seems to be preferable to set_task_comm() approach from
test_overhead, as it's about 2.35x faster in a baseline performance.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com