Cross Reference: /linux-master/tools/testing/selftests/bpf/progs/htab_mem

History log of /linux-master/tools/testing/selftests/bpf/progs/htab_mem_bench.c
Revision		Date	Author	Comments
# fd283ab1		03-Jul-2023	Hou Tao <houtao1@huawei.com>	selftests/bpf: Add benchmark for bpf memory allocator The benchmark could be used to compare the performance of hash map operations and the memory usage between different flavors of bpf memory allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It also could be used to check the performance improvement or the memory saving provided by optimization. The benchmark creates a non-preallocated hash map which uses bpf memory allocator and shows the operation performance and the memory usage of the hash map under different use cases: (1) overwrite Each CPU overwrites nonoverlapping part of hash map. When each CPU completes overwriting of 64 elements in hash map, it increases the op_count. (2) batch_add_batch_del Each CPU adds then deletes nonoverlapping part of hash map in batch. When each CPU adds and deletes 64 elements in hash map, it increases the op_count twice. (3) add_del_on_diff_cpu Each two-CPUs pair adds and deletes nonoverlapping part of map cooperatively. When each CPU adds or deletes 64 elements in hash map, it will increase the op_count. The following is the benchmark results when comparing between different flavors of bpf memory allocator. These tests are conducted on a KVM guest with 8 CPUs and 16 GB memory. The command line below is used to do all the following benchmarks: ./bench htab-mem --use-case $name ${OPTS} -w3 -d10 -a -p8 These results show that preallocated hash map has both better performance and smaller memory footprint. (1) non-preallocated + no bpf memory allocator (v6.0.19) use kmalloc() + call_rcu overwrite per-prod-op: 11.24 ± 0.07k/s, avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB batch_add_batch_del per-prod-op: 18.45 ± 0.10k/s, avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB add_del_on_diff_cpu per-prod-op: 14.50 ± 0.03k/s, avg mem: 4.64 ± 0.73MiB, peak mem: 7.20MiB (2) preallocated OPTS=--preallocated overwrite per-prod-op: 191.42 ± 0.09k/s, avg mem: 1.24 ± 0.00MiB, peak mem: 1.49MiB batch_add_batch_del per-prod-op: 221.83 ± 0.17k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB add_del_on_diff_cpu per-prod-op: 39.66 ± 0.31k/s, avg mem: 1.47 ± 0.13MiB, peak mem: 1.75MiB (3) normal bpf memory allocator overwrite per-prod-op: 126.59 ± 0.02k/s, avg mem: 2.26 ± 0.00MiB, peak mem: 2.74MiB batch_add_batch_del per-prod-op: 83.37 ± 0.20k/s, avg mem: 2.14 ± 0.17MiB, peak mem: 2.74MiB add_del_on_diff_cpu per-prod-op: 21.25 ± 0.24k/s, avg mem: 17.50 ± 3.32MiB, peak mem: 28.87MiB Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230704025039.938914-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>