#
aa70d2d1 |
|
06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
net: move skbuff_cache(s) to net_hotdata skbuff_cache, skbuff_fclone_cache and skb_small_head_cache are used in rx/tx fast paths. Move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6f3189f3 |
|
28-Jan-2024 |
Daniel Xu <dxu@dxuuu.xyz> |
bpf: treewide: Annotate BPF kfuncs in BTF This commit marks kfuncs as such inside the .BTF_ids section. The upshot of these annotations is that we'll be able to automatically generate kfunc prototypes for downstream users. The process is as follows: 1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs 2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for each function inside BTF_KFUNCS sets 3. At runtime, vmlinux or module BTF is made available in sysfs 4. At runtime, bpftool (or similar) can look at provided BTF and generate appropriate prototypes for functions with "bpf_kfunc" tag To ensure future kfunc are similarly tagged, we now also return error inside kfunc registration for untagged kfuncs. For vmlinux kfuncs, we also WARN(), as initcall machinery does not handle errors. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e4c00339 |
|
15-Dec-2023 |
Peter Zijlstra <peterz@infradead.org> |
bpf: Fix dtor CFI Ensure the various dtor functions match their prototype and retain their CFI signatures, since they don't have their address taken, they are prone to not getting CFI, making them impossible to call indirectly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.799451071@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b16904fd |
|
26-Nov-2023 |
Yonghong Song <yonghong.song@linux.dev> |
bpf: Fix a few selftest failures due to llvm18 change With latest upstream llvm18, the following test cases failed: $ ./test_progs -j #13/2 bpf_cookie/multi_kprobe_link_api:FAIL #13/3 bpf_cookie/multi_kprobe_attach_api:FAIL #13 bpf_cookie:FAIL #77 fentry_fexit:FAIL #78/1 fentry_test/fentry:FAIL #78 fentry_test:FAIL #82/1 fexit_test/fexit:FAIL #82 fexit_test:FAIL #112/1 kprobe_multi_test/skel_api:FAIL #112/2 kprobe_multi_test/link_api_addrs:FAIL [...] #112 kprobe_multi_test:FAIL #356/17 test_global_funcs/global_func17:FAIL #356 test_global_funcs:FAIL Further analysis shows llvm upstream patch [1] is responsible for the above failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c, without [1], the asm code is: 0000000000000400 <bpf_fentry_test7>: 400: f3 0f 1e fa endbr64 404: e8 00 00 00 00 callq 0x409 <bpf_fentry_test7+0x9> 409: 48 89 f8 movq %rdi, %rax 40c: c3 retq 40d: 0f 1f 00 nopl (%rax) ... and with [1], the asm code is: 0000000000005d20 <bpf_fentry_test7.specialized.1>: 5d20: e8 00 00 00 00 callq 0x5d25 <bpf_fentry_test7.specialized.1+0x5> 5d25: c3 retq ... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7> and this caused test failures for #13/#77 etc. except #356. For test case #356/17, with [1] (progs/test_global_func17.c)), the main prog looks like: 0000000000000000 <global_func17>: 0: b4 00 00 00 2a 00 00 00 w0 = 0x2a 1: 95 00 00 00 00 00 00 00 exit ... which passed verification while the test itself expects a verification failure. Let us add 'barrier_var' style asm code in both places to prevent function specialization which caused selftests failure. [1] https://github.com/llvm/llvm-project/pull/72903 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev
|
#
391145ba |
|
31-Oct-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Add __bpf_kfunc_{start,end}_defs macros BPF kfuncs are meant to be called from BPF programs. Accordingly, most kfuncs are not called from anywhere in the kernel, which the -Wmissing-prototypes warning is unhappy about. We've peppered __diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are defined in the codebase to suppress this warning. This patch adds two macros meant to bound one or many kfunc definitions. All existing kfunc definitions which use these __diag calls to suppress -Wmissing-prototypes are migrated to use the newly-introduced macros. A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the __bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier version of this patch [0] and another recent mailing list thread [1]. In the future we might need to ignore different warnings or do other kfunc-specific things. This change will make it easier to make such modifications for all kfunc defs. [0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1Xig@mail.gmail.com/ [1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <olsajiri@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: David Vernet <void@manifault.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
32337c0a |
|
26-Aug-2023 |
Yonghong Song <yonghong.song@linux.dev> |
bpf: Prevent inlining of bpf_fentry_test7() With latest clang18, I hit test_progs failures for the following test: #13/2 bpf_cookie/multi_kprobe_link_api:FAIL #13/3 bpf_cookie/multi_kprobe_attach_api:FAIL #13 bpf_cookie:FAIL #75 fentry_fexit:FAIL #76/1 fentry_test/fentry:FAIL #76 fentry_test:FAIL #80/1 fexit_test/fexit:FAIL #80 fexit_test:FAIL #110/1 kprobe_multi_test/skel_api:FAIL #110/2 kprobe_multi_test/link_api_addrs:FAIL #110/3 kprobe_multi_test/link_api_syms:FAIL #110/4 kprobe_multi_test/attach_api_pattern:FAIL #110/5 kprobe_multi_test/attach_api_addrs:FAIL #110/6 kprobe_multi_test/attach_api_syms:FAIL #110 kprobe_multi_test:FAIL For example, for #13/2, the error messages are: [...] kprobe_multi_test_run:FAIL:kprobe_test7_result unexpected kprobe_test7_result: actual 0 != expected 1 [...] kprobe_multi_test_run:FAIL:kretprobe_test7_result unexpected kretprobe_test7_result: actual 0 != expected 1 clang17 does not have this issue. Further investigation shows that kernel func bpf_fentry_test7(), used in the above tests, is inlined by the compiler although it is marked as noinline. int noinline bpf_fentry_test7(struct bpf_fentry_test_t *arg) { return (long)arg; } It is known that for simple functions like the above (e.g. just returning a constant or an input argument), the clang compiler may still do inlining for a noinline function. Adding 'asm volatile ("")' in the beginning of the bpf_fentry_test7() can prevent inlining. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20230826200843.2210074-1-yonghong.song@linux.dev
|
#
a9ca9f9c |
|
04-Aug-2023 |
Yunsheng Lin <linyunsheng@huawei.com> |
page_pool: split types and declarations from page_pool.h Split types and pure function declarations from page_pool.h and add them in page_page/types.h, so that C sources can include page_pool.h and headers should generally only include page_pool/types.h as suggested by jakub. Rename page_pool.h to page_pool/helpers.h to have both in one place. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com [Jakub: change microsoft/mana, fix kdoc paths in Documentation] Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
49e47a5b |
|
02-Aug-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: move struct netdev_rx_queue out of netdevice.h struct netdev_rx_queue is touched in only a few places and having it defined in netdevice.h brings in the dependency on xdp.h, because struct xdp_rxq_info gets embedded in struct netdev_rx_queue. In prep for removal of xdp.h from netdevice.h move all the netdev_rx_queue stuff to a new header. We could technically break the new header up to avoid the sysfs.h include but it's so rarely included it doesn't seem to be worth it at this point. Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
5e9cf77d |
|
12-Jul-2023 |
Menglong Dong <imagedong@tencent.com> |
selftests/bpf: add testcase for TRACING with 6+ arguments Add fentry_many_args.c and fexit_many_args.c to test the fentry/fexit with 7/11 arguments. As this feature is not supported by arm64 yet, we disable these testcases for arm64 in DENYLIST.aarch64. We can combine them with fentry_test.c/fexit_test.c when arm64 is supported too. Correspondingly, add bpf_testmod_fentry_test7() and bpf_testmod_fentry_test11() to bpf_testmod.c Meanwhile, add bpf_modify_return_test2() to test_run.c to test the MODIFY_RETURN with 7 arguments. Add bpf_testmod_test_struct_arg_7/bpf_testmod_test_struct_arg_7 in bpf_testmod.c to test the struct in the arguments. And the testcases passed on x86_64: ./test_progs -t fexit Summary: 5/14 PASSED, 0 SKIPPED, 0 FAILED ./test_progs -t fentry Summary: 3/2 PASSED, 0 SKIPPED, 0 FAILED ./test_progs -t modify_return Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED ./test_progs -t tracing_struct Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Menglong Dong <imagedong@tencent.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230713040738.1789742-4-imagedong@tencent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2597a25c |
|
26-Jun-2023 |
Stanislav Fomichev <sdf@google.com> |
selftests/bpf: Add test to exercise typedef walking Add new bpf_fentry_test_sinfo with skb_shared_info argument and try to access frags. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230626212522.2414485-2-sdf@google.com
|
#
65eb006d |
|
15-May-2023 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Move kernel test kfuncs to bpf_testmod Moving kernel test kfuncs into bpf_testmod kernel module, and adding necessary init calls and BTF IDs records. We need to keep following structs in kernel: struct prog_test_ref_kfunc struct prog_test_member (embedded in prog_test_ref_kfunc) The reason is because they need to be marked as rcu safe (check test prog mark_ref_as_untrusted_or_null) and such objects are being required to be defined only in kernel at the moment (see rcu_safe_kptr check in kernel). We need to keep also dtor functions for both objects in kernel: bpf_kfunc_call_test_release bpf_kfunc_call_memb_release We also keep the copy of these struct in bpf_testmod_kfunc.h, because other test functions use them. This is unfortunate, but this is just temporary solution until we are able to these structs them to bpf_testmod completely. As suggested by David adding bpf_testmod.ko make dependency for bpf programs, so they are rebuilt if we change the bpf_testmod.ko module. Also adding missing __bpf_kfunc to bpf_kfunc_call_test4 functions. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230515133756.1658301-11-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b51f4113 |
|
10-May-2023 |
Yunsheng Lin <linyunsheng@huawei.com> |
net: introduce and use skb_frag_fill_page_desc() Most users use __skb_frag_set_page()/skb_frag_off_set()/ skb_frag_size_set() to fill the page desc for a skb frag. Introduce skb_frag_fill_page_desc() to do that. net/bpf/test_run.c does not call skb_frag_off_set() to set the offset, "copy_from_user(page_address(page), ...)" and 'shinfo' being part of the 'data' kzalloced in bpf_test_init() suggest that it is assuming offset to be initialized as zero, so call skb_frag_fill_page_desc() with offset being zero for this case. Also, skb_frag_set_page() is not used anymore, so remove it. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b99ef22 |
|
21-Apr-2023 |
Florian Westphal <fw@strlen.de> |
bpf: add test_run support for netfilter program type add glue code so a bpf program can be run using userspace-provided netfilter state and packet/skb. Default is to use ipv4:output hook point, but this can be overridden by userspace. Userspace provided netfilter state is restricted, only hook and protocol families can be overridden and only to ipv4/ipv6. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-7-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
09b501d9 |
|
16-Apr-2023 |
David Vernet <void@manifault.com> |
bpf: Remove bpf_kfunc_call_test_kptr_get() test kfunc We've managed to improve the UX for kptrs significantly over the last 9 months. All of the prior main use cases, struct bpf_cpumask *, struct task_struct *, and struct cgroup *, have all been updated to be synchronized mainly using RCU. In other words, their KF_ACQUIRE kfunc calls are all KF_RCU, and the pointers themselves are MEM_RCU and can be accessed in an RCU read region in BPF. In a follow-on change, we'll be removing the KF_KPTR_GET kfunc flag. This patch prepares for that by removing the bpf_kfunc_call_test_kptr_get() kfunc, and all associated selftests. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230416084928.326135-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
75dcef8d |
|
10-Apr-2023 |
Feng Zhou <zhoufeng.zf@bytedance.com> |
selftests/bpf: Add test to access u32 ptr argument in tracing program Adding verifier test for accessing u32 pointer argument in tracing programs. The test program loads 1nd argument of bpf_fentry_test9 function which is u32 pointer and checks that verifier allows that. Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230410085908.98493-3-zhoufeng.zf@bytedance.com
|
#
6c831c46 |
|
25-Mar-2023 |
David Vernet <void@manifault.com> |
bpf: Treat KF_RELEASE kfuncs as KF_TRUSTED_ARGS KF_RELEASE kfuncs are not currently treated as having KF_TRUSTED_ARGS, even though they have a superset of the requirements of KF_TRUSTED_ARGS. Like KF_TRUSTED_ARGS, KF_RELEASE kfuncs require a 0-offset argument, and don't allow NULL-able arguments. Unlike KF_TRUSTED_ARGS which require _either_ an argument with ref_obj_id > 0, _or_ (ref->type & BPF_REG_TRUSTED_MODIFIERS) (and no unsafe modifiers allowed), KF_RELEASE only allows for ref_obj_id > 0. Because KF_RELEASE today doesn't automatically imply KF_TRUSTED_ARGS, some of these requirements are enforced in different ways that can make the behavior of the verifier feel unpredictable. For example, a KF_RELEASE kfunc with a NULL-able argument will currently fail in the verifier with a message like, "arg#0 is ptr_or_null_ expected ptr_ or socket" rather than "Possibly NULL pointer passed to trusted arg0". Our intention is the same, but the semantics are different due to implemenetation details that kfunc authors and BPF program writers should not need to care about. Let's make the behavior of the verifier more consistent and intuitive by having KF_RELEASE kfuncs imply the presence of KF_TRUSTED_ARGS. Our eventual goal is to have all kfuncs assume KF_TRUSTED_ARGS by default anyways, so this takes us a step in that direction. Note that it does not make sense to assume KF_TRUSTED_ARGS for all KF_ACQUIRE kfuncs. KF_ACQUIRE kfuncs can have looser semantics than KF_RELEASE, with e.g. KF_RCU | KF_RET_NULL. We may want to have KF_ACQUIRE imply KF_TRUSTED_ARGS _unless_ KF_RCU is specified, but that can be left to another patch set, and there are no such subtleties to address for KF_RELEASE. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230325213144.486885-4-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fb2211a5 |
|
25-Mar-2023 |
David Vernet <void@manifault.com> |
bpf: Remove now-unnecessary NULL checks for KF_RELEASE kfuncs Now that we're not invoking kfunc destructors when the kptr in a map was NULL, we no longer require NULL checks in many of our KF_RELEASE kfuncs. This patch removes those NULL checks. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230325213144.486885-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e5995bc7 |
|
16-Mar-2023 |
Alexander Lobakin <aleksander.lobakin@intel.com> |
bpf, test_run: fix crashes due to XDP frame overwriting/corruption syzbot and Ilya faced the splats when %XDP_PASS happens for bpf_test_run after skb PP recycling was enabled for {__,}xdp_build_skb_from_frame(): BUG: kernel NULL pointer dereference, address: 0000000000000d28 RIP: 0010:memset_erms+0xd/0x20 arch/x86/lib/memset_64.S:66 [...] Call Trace: <TASK> __finalize_skb_around net/core/skbuff.c:321 [inline] __build_skb_around+0x232/0x3a0 net/core/skbuff.c:379 build_skb_around+0x32/0x290 net/core/skbuff.c:444 __xdp_build_skb_from_frame+0x121/0x760 net/core/xdp.c:622 xdp_recv_frames net/bpf/test_run.c:248 [inline] xdp_test_run_batch net/bpf/test_run.c:334 [inline] bpf_test_run_xdp_live+0x1289/0x1930 net/bpf/test_run.c:362 bpf_prog_test_run_xdp+0xa05/0x14e0 net/bpf/test_run.c:1418 [...] This happens due to that it calls xdp_scrub_frame(), which nullifies xdpf->data. bpf_test_run code doesn't reinit the frame when the XDP program doesn't adjust head or tail. Previously, %XDP_PASS meant the page will be released from the pool and returned to the MM layer, but now it does return to the Pool with the nullified xdpf->data, which doesn't get reinitialized then. So, in addition to checking whether the head and/or tail have been adjusted, check also for a potential XDP frame corruption. xdpf->data is 100% affected and also xdpf->flags is the field closest to the metadata / frame start. Checking for these two should be enough for non-extreme cases. Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from XDP frames") Reported-by: syzbot+e1d1b65f7c32f2a86a9f@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/000000000000f1985705f6ef2243@google.com Reported-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/e07dd94022ad5731705891b9487cc9ed66328b94.camel@linux.ibm.com Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230316175051.922550-2-aleksander.lobakin@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
aa3d65de |
|
10-Mar-2023 |
Viktor Malik <vmalik@redhat.com> |
bpf/selftests: Test fentry attachment to shadowed functions Adds a new test that tries to attach a program to fentry of two functions of the same name, one located in vmlinux and the other in bpf_testmod. To avoid conflicts with existing tests, a new function "bpf_fentry_shadow_test" was created both in vmlinux and in bpf_testmod. The previous commit fixed a bug which caused this test to fail. The verifier would always use the vmlinux function's address as the target trampoline address, hence trying to create two trampolines for a single address, which is forbidden. The test (similarly to other fentry/fexit tests) is not working on arm64 at the moment. Signed-off-by: Viktor Malik <vmalik@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/5fe2f364190b6f79b085066ed7c5989c5bc475fa.1678432753.git.vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
20c09d92 |
|
02-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Introduce kptr_rcu. The life time of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU. Derefrence of other kptr-s returns PTR_UNTRUSTED. For example: struct map_value { struct cgroup __kptr *cgrp; }; SEC("tp_btf/cgroup_mkdir") int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path) { struct cgroup *cg, *cg2; cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0 bpf_kptr_xchg(&v->cgrp, cg); cg2 = v->cgrp; // This is new feature introduced by this patch. // cg2 is PTR_MAYBE_NULL | MEM_RCU. // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero if (cg2) bpf_cgroup_ancestor(cg2, level); // safe to do. } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
|
#
294635a8 |
|
24-Feb-2023 |
Alexander Lobakin <aleksander.lobakin@intel.com> |
bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES &xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed *before* the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
181127fb |
|
17-Feb-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES" This reverts commit 6c20822fada1b8adb77fa450d03a0d449686a4a9. build bot failed on arch with different cache line size: https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/ Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
af2d0d09 |
|
16-Feb-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
bpf: Disable bh in bpf_test_run for xdp and tc prog Some of the bpf helpers require bh disabled. eg. The bpf_fib_lookup helper that will be used in a latter selftest. In particular, it calls ___neigh_lookup_noref that expects the bh disabled. This patch disables bh before calling bpf_prog_run[_xdp], so the testing prog can also use those helpers. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217004150.2980689-2-martin.lau@linux.dev
|
#
6c20822f |
|
15-Feb-2023 |
Alexander Lobakin <aleksander.lobakin@intel.com> |
bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES &xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed *before* the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
025a785f |
|
08-Feb-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: skbuff: drop the word head from skb cache skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6aed15e3 |
|
01-Feb-2023 |
David Vernet <void@manifault.com> |
selftests/bpf: Add testcase for static kfunc with unused arg kfuncs are allowed to be static, or not use one or more of their arguments. For example, bpf_xdp_metadata_rx_hash() in net/core/xdp.c is meant to be implemented by drivers, with the default implementation just returning -EOPNOTSUPP. As described in [0], such kfuncs can have their arguments elided, which can cause BTF encoding to be skipped. The new __bpf_kfunc macro should address this, and this patch adds a selftest which verifies that a static kfunc with at least one unused argument can still be encoded and invoked by a BPF program. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230201173016.342758-5-void@manifault.com
|
#
400031e0 |
|
01-Feb-2023 |
David Vernet <void@manifault.com> |
bpf: Add __bpf_kfunc tag to all kfuncs Now that we have the __bpf_kfunc tag, we should use add it to all existing kfuncs to ensure that they'll never be elided in LTO builds. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com
|
#
be6b5c10 |
|
27-Jan-2023 |
Ilya Leoshkevich <iii@linux.ibm.com> |
selftests/bpf: Add a sign-extension test for kfuncs s390x ABI requires the caller to zero- or sign-extend the arguments. eBPF already deals with zero-extension (by definition of its ABI), but not with sign-extension. Add a test to cover that potentially problematic area. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-15-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
3d76a4d3 |
|
19-Jan-2023 |
Stanislav Fomichev <sdf@google.com> |
bpf: XDP metadata RX kfuncs Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not supported by the target device, the default implementation is called instead. The verifier, at load time, replaces a call to the generic kfunc with a call to the per-device one. Per-device kfunc pointers are stored in separate struct xdp_metadata_ops. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-8-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
ce098da1 |
|
07-Dec-2022 |
Kees Cook <keescook@chromium.org> |
skbuff: Introduce slab_build_skb() syzkaller reported: BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294 Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295 For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to build_skb(). When build_skb() is passed a frag_size of 0, it means the buffer came from kmalloc. In these cases, ksize() is used to find its actual size, but since the allocation may not have been made to that size, actually perform the krealloc() call so that all the associated buffer size checking will be correctly notified (and use the "new" pointer so that compiler hinting works correctly). Split this logic out into a new interface, slab_build_skb(), but leave the original 0 checking for now to catch any stragglers. Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function") Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: pepsipu <soopthegoop@gmail.com> Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com Cc: Vlastimil Babka <vbabka@suse.cz> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: ast@kernel.org Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Hao Luo <haoluo@google.com> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: jolsa@kernel.org Cc: KP Singh <kpsingh@kernel.org> Cc: martin.lau@linux.dev Cc: Stanislav Fomichev <sdf@google.com> Cc: song@kernel.org Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221208060256.give.994-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
5b481aca |
|
06-Dec-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret The current way of expressing that a non-bpf kernel component is willing to accept that bpf programs can be attached to it and that they can change the return value is to abuse ALLOW_ERROR_INJECTION. This is debated in the link below, and the result is that it is not a reasonable thing to do. Reuse the kfunc declaration structure to also tag the kernel functions we want to be fmodret. This way we can control from any subsystem which functions are being modified by bpf without touching the verifier. Link: https://lore.kernel.org/all/20221121104403.1545f9b5@gandalf.local.home/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20221206145936.922196-2-benjamin.tissoires@redhat.com
|
#
114039b3 |
|
21-Nov-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: Move skb->len == 0 checks into __bpf_redirect To avoid potentially breaking existing users. Both mac/no-mac cases have to be amended; mac_header >= network_header is not enough (verified with a new test, see next patch). Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221121180340.1983627-1-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
d3fd203f |
|
02-Nov-2022 |
Baisong Zhong <zhongbaisong@huawei.com> |
bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb() We got a syzkaller problem because of aarch64 alignment fault if KFENCE enabled. When the size from user bpf program is an odd number, like 399, 407, etc, it will cause the struct skb_shared_info's unaligned access. As seen below: BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032 Use-after-free read at 0xffff6254fffac077 (in kfence-#213): __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline] arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline] __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032 skb_clone+0xf4/0x214 net/core/skbuff.c:1481 ____bpf_clone_redirect net/core/filter.c:2433 [inline] bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420 bpf_prog_d3839dd9068ceb51+0x80/0x330 bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline] bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53 bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381 kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512 allocated by task 15074 on cpu 0 at 1342.585390s: kmalloc include/linux/slab.h:568 [inline] kzalloc include/linux/slab.h:675 [inline] bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191 bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381 __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381 To fix the problem, we adjust @size so that (@size + @hearoom) is a multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info is aligned to a cache line. Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command") Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.com
|
#
22ed8d5a |
|
06-Sep-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
selftests/bpf: Add tests for kfunc returning a memory pointer We add 2 new kfuncs that are following the RET_PTR_TO_MEM capability from the previous commit. Then we test them in selftests: the first tests are testing valid case, and are not failing, and the later ones are actually preventing the program to be loaded because they are wrong. To work around that, we mark the failing ones as not autoloaded (with SEC("?tc")), and we manually enable them one by one, ensuring the verifier rejects them. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220906151303.2780789-8-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fb66223a |
|
06-Sep-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
selftests/bpf: add test for accessing ctx from syscall program type We need to also export the kfunc set to the syscall program type, and then add a couple of eBPF programs that are testing those calls. The first one checks for valid access, and the second one is OK from a static analysis point of view but fails at run time because we are trying to access outside of the allocated memory. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220906151303.2780789-5-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
5deedfbe |
|
21-Aug-2022 |
Shmulik Ladkani <shmulik.ladkani@gmail.com> |
bpf, test_run: Propagate bpf_flow_dissect's retval to user's bpf_attr.test.retval Formerly, a boolean denoting whether bpf_flow_dissect returned BPF_OK was set into 'bpf_attr.test.retval'. Augment this, so users can check the actual return code of the dissector program under test. Existing prog_tests/flow_dissector*.c tests were correspondingly changed to check against each test's expected retval. Also, tests' resulting 'flow_keys' are verified only in case the expected retval is BPF_OK. This allows adding new tests that expect non BPF_OK. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-4-shmulik.ladkani@gmail.com
|
#
0ba98502 |
|
21-Aug-2022 |
Shmulik Ladkani <shmulik.ladkani@gmail.com> |
flow_dissector: Make 'bpf_flow_dissect' return the bpf program retcode Let 'bpf_flow_dissect' callers know the BPF program's retcode and act accordingly. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-2-shmulik.ladkani@gmail.com
|
#
e3389458 |
|
10-Aug-2022 |
Artem Savkov <asavkov@redhat.com> |
selftests/bpf: add destructive kfunc test Add a test checking that programs calling destructive kfuncs can only do so if they have CAP_SYS_BOOT capabilities. Signed-off-by: Artem Savkov <asavkov@redhat.com> Link: https://lore.kernel.org/r/20220810065905.475418-4-asavkov@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
1f075262 |
|
09-Aug-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Allow calling bpf_prog_test kfuncs in tracing programs In addition to TC hook, enable these in tracing programs so that they can be used in selftests. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220809213033.24147-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
56e948ff |
|
21-Jul-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Add support for forcing kfunc args to be trusted Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which means each pointer argument must be trusted, which we define as a pointer that is referenced (has non-zero ref_obj_id) and also needs to have its offset unchanged, similar to how release functions expect their argument. This allows a kfunc to receive pointer arguments unchanged from the result of the acquire kfunc. This is required to ensure that kfunc that operate on some object only work on acquired pointers and not normal PTR_TO_BTF_ID with same type which can be obtained by pointer walking. The restrictions applied to release arguments also apply to trusted arguments. This implies that strict type matching (not deducing type by recursively following members at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are used for trusted pointer arguments. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a4703e31 |
|
21-Jul-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Switch to new kfunc flags infrastructure Instead of populating multiple sets to indicate some attribute and then researching the same BTF ID in them, prepare a single unified BTF set which indicates whether a kfunc is allowed to be called, and also its attributes if any at the same time. Now, only one call is needed to perform the lookup for both kfunc availability and its attributes. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fd189422 |
|
15-Jul-2022 |
Zhengchao Shao <shaozhengchao@huawei.com> |
bpf: Don't redirect packets with invalid pkt_len Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any skbs, that is, the flow->head is null. The root cause, as the [2] says, is because that bpf_prog_test_run_skb() run a bpf prog which redirects empty skbs. So we should determine whether the length of the packet modified by bpf prog or others like bpf_prog_test is valid before forwarding it directly. LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5 LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
988d0d58 |
|
29-May-2022 |
Daniel Xu <dxu@dxuuu.xyz> |
bpf, test_run: Remove unnecessary prog type checks These checks were effectively noops b/c there's only one way these functions get called: through prog_ops dispatching. And since there's no other callers, we can be sure that `prog` is always the correct type. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/0a9aaac329f76ddb17df1786b001117823ffefa5.1653855302.git.dxu@dxuuu.xyz
|
#
7c4e983c |
|
13-May-2022 |
Alexander Duyck <alexanderduyck@fb.com> |
net: allow gso_max_size to exceed 65536 The code for gso_max_size was added originally to allow for debugging and workaround of buggy devices that couldn't support TSO with blocks 64K in size. The original reason for limiting it to 64K was because that was the existing limits of IPv4 and non-jumbogram IPv6 length fields. With the addition of Big TCP we can remove this limit and allow the value to potentially go up to UINT_MAX and instead be limited by the tso_max_size value. So in order to support this we need to go through and clean up the remaining users of the gso_max_size value so that the values will cap at 64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper limit for GSO_MAX_SIZE. v6: (edumazet) fixed a compile error if CONFIG_IPV6=n, in a new sk_trim_gso_size() helper. netif_set_tso_max_size() caps the requested TSO size with GSO_MAX_SIZE. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5cdccadc |
|
11-May-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Prepare prog_test_struct kfuncs for runtime tests In an effort to actually test the refcounting logic at runtime, add a refcount_t member to prog_test_ref_kfunc and use it in selftests to verify and test the whole logic more exhaustively. The kfunc calls for prog_test_member do not require runtime refcounting, as they are only used for verifier selftests, not during runtime execution. Hence, their implementation now has a WARN_ON_ONCE as it is not meant to be reachable code at runtime. It is strictly used in tests triggering failure cases in the verifier. bpf_kfunc_call_memb_release is called from map free path, since prog_test_member is embedded in map value for some verifier tests, so we skip WARN_ON_ONCE for it. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220511194654.765705-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
792c0a34 |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
selftests/bpf: Add test for strict BTF type check Ensure that the edge case where first member type was matched successfully even if it didn't match BTF type of register is caught and rejected by the verifier. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-14-memxor@gmail.com
|
#
05a945de |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
selftests/bpf: Add verifier tests for kptr Reuse bpf_prog_test functions to test the support for PTR_TO_BTF_ID in BPF map case, including some tests that verify implementation sanity and corner cases. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-13-memxor@gmail.com
|
#
425d2393 |
|
09-Apr-2022 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf: Fix release of page_pool in BPF_PROG_RUN in test runner The live packet mode in BPF_PROG_RUN allocates a page_pool instance for each test run instance and uses it for the packet data. On setup it creates the page_pool, and calls xdp_reg_mem_model() to allow pages to be returned properly from the XDP data path. However, xdp_reg_mem_model() also raises the reference count of the page_pool itself, so the single page_pool_destroy() count on teardown was not enough to actually release the pool. To fix this, add an additional xdp_unreg_mem_model() call on teardown. Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Reported-by: Freysteinn Alfredsson <freysteinn.alfredsson@kau.se> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220409213053.3117305-1-toke@redhat.com
|
#
b6f1f780 |
|
10-Mar-2022 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf, test_run: Fix packet size check for live packet mode The live packet mode uses some extra space at the start of each page to cache data structures so they don't have to be rebuilt at every repetition. This space wasn't correctly accounted for in the size checking of the arguments supplied to userspace. In addition, the definition of the frame size should include the size of the skb_shared_info (as there is other logic that subtracts the size of this). Together, these mistakes resulted in userspace being able to trip the XDP_WARN() in xdp_update_frame_from_buff(), which syzbot discovered in short order. Fix this by changing the frame size define and adding the extra headroom to the bpf_prog_test_run_xdp() function. Also drop the max_len parameter to the page_pool init, since this is related to DMA which is not used for the page pool instance in PROG_TEST_RUN. Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Reported-by: syzbot+0e91362d99386dc5de99@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220310225621.53374-1-toke@redhat.com
|
#
743bec1b |
|
10-Mar-2022 |
Yihao Han <hanyihao@vivo.com> |
bpf, test_run: Use kvfree() for memory allocated with kvmalloc() It is allocated with kvmalloc(), the corresponding release function should not be kfree(), use kvfree() instead. Generated by: scripts/coccinelle/api/kfree_mismatch.cocci Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Signed-off-by: Yihao Han <hanyihao@vivo.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20220310092828.13405-1-hanyihao@vivo.com
|
#
eecbfd97 |
|
09-Mar-2022 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf: Initialise retval in bpf_prog_test_run_xdp() The kernel test robot pointed out that the newly added bpf_test_run_xdp_live() runner doesn't set the retval in the caller (by design), which means that the variable can be passed unitialised to bpf_test_finish(). Fix this by initialising the variable properly. Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220310110228.161869-1-toke@redhat.com
|
#
b530e9e1 |
|
09-Mar-2022 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf: Add "live packet" mode for XDP in BPF_PROG_RUN This adds support for running XDP programs through BPF_PROG_RUN in a mode that enables live packet processing of the resulting frames. Previous uses of BPF_PROG_RUN for XDP returned the XDP program return code and the modified packet data to userspace, which is useful for unit testing of XDP programs. The existing BPF_PROG_RUN for XDP allows userspace to set the ingress ifindex and RXQ number as part of the context object being passed to the kernel. This patch reuses that code, but adds a new mode with different semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES flag. When running BPF_PROG_RUN in this mode, the XDP program return codes will be honoured: returning XDP_PASS will result in the frame being injected into the networking stack as if it came from the selected networking interface, while returning XDP_TX and XDP_REDIRECT will result in the frame being transmitted out that interface. XDP_TX is translated into an XDP_REDIRECT operation to the same interface, since the real XDP_TX action is only possible from within the network drivers themselves, not from the process context where BPF_PROG_RUN is executed. Internally, this new mode of operation creates a page pool instance while setting up the test run, and feeds pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support the performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 9 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Enabling the new flag is only allowed when not setting ctx_out and data_out in the test specification, since using it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
|
#
8218ccb5 |
|
04-Mar-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
selftests/bpf: Add tests for kfunc register offset checks Include a few verifier selftests that test against the problems being fixed by previous commits, i.e. release kfunc always require PTR_TO_BTF_ID fixed and var_off to be 0, and negative offset is not permitted and returns a helpful error message. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-9-memxor@gmail.com
|
#
0b206c6d |
|
04-Mar-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Replace __diag_ignore with unified __diag_ignore_all Currently, -Wmissing-prototypes warning is ignored for GCC, but not clang. This leads to clang build warning in W=1 mode. Since the flag used by both compilers is same, we can use the unified __diag_ignore_all macro that works for all supported versions and compilers which have __diag macro support (currently GCC >= 8.0, and Clang >= 11.0). Also add nf_conntrack_bpf.h include to prevent missing prototype warning for register_nf_conntrack_bpf. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-8-memxor@gmail.com
|
#
530e214c |
|
28-Feb-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf, test_run: Fix overflow in XDP frags bpf_test_finish Syzkaller reports another issue: WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230 check_copy_size include/linux/thread_info.h:230 [inline] WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230 copy_to_user include/linux/uaccess.h:199 [inline] WARNING: CPU: 0 PID: 10775 at include/linux/thread_info.h:230 bpf_test_finish.isra.0+0x4b2/0x680 net/bpf/test_run.c:171 This can happen when the userspace buffer is smaller than head + frags. Return ENOSPC in this case. Fixes: 7855e0db150a ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature") Reported-by: syzbot+5f81df6205ecbbc56ab5@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/bpf/20220228232332.458871-1-sdf@google.com
|
#
9a69e2b3 |
|
09-Feb-2022 |
Jakub Sitnicki <jakub@cloudflare.com> |
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide remote_port is another case of a BPF context field documented as a 32-bit value in network byte order for which the BPF context access converter generates a load of a zero-padded 16-bit integer in network byte order. First such case was dst_port in bpf_sock which got addressed in commit 4421a582718a ("bpf: Make dst_port field in struct bpf_sock 16-bit wide"). Loading 4-bytes from the remote_port offset and converting the value with bpf_ntohl() leads to surprising results, as the expected value is shifted by 16 bits. Reduce the confusion by splitting the field in two - a 16-bit field holding a big-endian integer, and a 16-bit zero-padding anonymous field that follows it. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220209184333.654927-2-jakub@cloudflare.com
|
#
5d1e9f43 |
|
04-Feb-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: test_run: Fix overflow in bpf_test_finish frags parsing This place also uses signed min_t and passes this singed int to copy_to_user (which accepts unsigned argument). I don't think there is an issue, but let's be consistent. Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220204235849.14658-2-sdf@google.com
|
#
9d63b59d |
|
04-Feb-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: test_run: Fix overflow in xdp frags parsing When kattr->test.data_size_in > INT_MAX, signed min_t will assign negative value to data_len. This negative value then gets passed over to copy_from_user where it is converted to (big) unsigned. Use unsigned min_t to avoid this overflow. usercopy: Kernel memory overwrite attempt detected to wrapped address (offset 0, size 18446612140539162846)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102! invalid opcode: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 3781 Comm: syz-executor226 Not tainted 4.15.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102 RSP: 0018:ffff8801e9703a38 EFLAGS: 00010286 RAX: 000000000000006c RBX: ffffffff84fc7040 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff816560a2 RDI: ffffed003d2e0739 RBP: ffff8801e9703a90 R08: 000000000000006c R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff84fc73a0 R13: ffffffff84fc7180 R14: ffffffff84fc7040 R15: ffffffff84fc7040 FS: 00007f54e0bec300(0000) GS:ffff8801f6600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000280 CR3: 00000001e90ea000 CR4: 00000000003426f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: check_bogus_address mm/usercopy.c:155 [inline] __check_object_size mm/usercopy.c:263 [inline] __check_object_size.cold+0x8c/0xad mm/usercopy.c:253 check_object_size include/linux/thread_info.h:112 [inline] check_copy_size include/linux/thread_info.h:143 [inline] copy_from_user include/linux/uaccess.h:142 [inline] bpf_prog_test_run_xdp+0xe57/0x1240 net/bpf/test_run.c:989 bpf_prog_test_run kernel/bpf/syscall.c:3377 [inline] __sys_bpf+0xdf2/0x4a50 kernel/bpf/syscall.c:4679 SYSC_bpf kernel/bpf/syscall.c:4765 [inline] SyS_bpf+0x26/0x50 kernel/bpf/syscall.c:4763 do_syscall_64+0x21a/0x3e0 arch/x86/entry/common.c:305 entry_SYSCALL_64_after_hwframe+0x46/0xbb Fixes: 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220204235849.14658-1-sdf@google.com
|
#
a6763080 |
|
02-Feb-2022 |
Lorenzo Bianconi <lorenzo@kernel.org> |
bpf: test_run: Fix OOB access in bpf_prog_test_run_xdp Fix the following kasan issue reported by syzbot: BUG: KASAN: slab-out-of-bounds in __skb_frag_set_page include/linux/skbuff.h:3242 [inline] BUG: KASAN: slab-out-of-bounds in bpf_prog_test_run_xdp+0x10ac/0x1150 net/bpf/test_run.c:972 Write of size 8 at addr ffff888048c75000 by task syz-executor.5/23405 CPU: 1 PID: 23405 Comm: syz-executor.5 Not tainted 5.16.0-syzkaller #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 __skb_frag_set_page include/linux/skbuff.h:3242 [inline] bpf_prog_test_run_xdp+0x10ac/0x1150 net/bpf/test_run.c:972 bpf_prog_test_run kernel/bpf/syscall.c:3356 [inline] __sys_bpf+0x1858/0x59a0 kernel/bpf/syscall.c:4658 __do_sys_bpf kernel/bpf/syscall.c:4744 [inline] __se_sys_bpf kernel/bpf/syscall.c:4742 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4742 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f4ea30dd059 RSP: 002b:00007f4ea1a52168 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00007f4ea31eff60 RCX: 00007f4ea30dd059 RDX: 0000000000000048 RSI: 0000000020000000 RDI: 000000000000000a RBP: 00007f4ea313708d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc8367c5af R14: 00007f4ea1a52300 R15: 0000000000022000 </TASK> Allocated by task 23405: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:437 [inline] ____kasan_kmalloc mm/kasan/common.c:516 [inline] ____kasan_kmalloc mm/kasan/common.c:475 [inline] __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525 kmalloc include/linux/slab.h:586 [inline] kzalloc include/linux/slab.h:715 [inline] bpf_test_init.isra.0+0x9f/0x150 net/bpf/test_run.c:411 bpf_prog_test_run_xdp+0x2f8/0x1150 net/bpf/test_run.c:941 bpf_prog_test_run kernel/bpf/syscall.c:3356 [inline] __sys_bpf+0x1858/0x59a0 kernel/bpf/syscall.c:4658 __do_sys_bpf kernel/bpf/syscall.c:4744 [inline] __se_sys_bpf kernel/bpf/syscall.c:4742 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4742 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff888048c74000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 0 bytes to the right of 4096-byte region [ffff888048c74000, ffff888048c75000) The buggy address belongs to the page: page:ffffea0001231c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x48c70 head:ffffea0001231c00 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 dead000000000100 dead000000000122 ffff888010c42140 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated prep_new_page mm/page_alloc.c:2434 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389 alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271 alloc_slab_page mm/slub.c:1799 [inline] allocate_slab mm/slub.c:1944 [inline] new_slab+0x28a/0x3b0 mm/slub.c:2004 ___slab_alloc+0x87c/0xe90 mm/slub.c:3018 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105 slab_alloc_node mm/slub.c:3196 [inline] __kmalloc_node_track_caller+0x2cb/0x360 mm/slub.c:4957 kmalloc_reserve net/core/skbuff.c:354 [inline] __alloc_skb+0xde/0x340 net/core/skbuff.c:426 alloc_skb include/linux/skbuff.h:1159 [inline] nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:745 [inline] nsim_dev_trap_report drivers/net/netdevsim/dev.c:802 [inline] nsim_dev_trap_report_work+0x29a/0xbc0 drivers/net/netdevsim/dev.c:843 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1352 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404 free_unref_page_prepare mm/page_alloc.c:3325 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3404 qlink_free mm/kasan/quarantine.c:157 [inline] qlist_free_all+0x6d/0x160 mm/kasan/quarantine.c:176 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:283 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447 kasan_slab_alloc include/linux/kasan.h:260 [inline] slab_post_alloc_hook mm/slab.h:732 [inline] slab_alloc_node mm/slub.c:3230 [inline] slab_alloc mm/slub.c:3238 [inline] kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3243 getname_flags.part.0+0x50/0x4f0 fs/namei.c:138 getname_flags include/linux/audit.h:323 [inline] getname+0x8e/0xd0 fs/namei.c:217 do_sys_openat2+0xf5/0x4d0 fs/open.c:1208 do_sys_open fs/open.c:1230 [inline] __do_sys_openat fs/open.c:1246 [inline] __se_sys_openat fs/open.c:1241 [inline] __x64_sys_openat+0x13f/0x1f0 fs/open.c:1241 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Memory state around the buggy address: ffff888048c74f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888048c74f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ ffff888048c75080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888048c75100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fixes: 1c19499825246 ("bpf: introduce frags support to bpf_prog_test_run_xdp()") Reported-by: syzbot+6d70ca7438345077c549@syzkaller.appspotmail.com Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/688c26f9dd6e885e58e8e834ede3f0139bb7fa95.1643835097.git.lorenzo@kernel.org
|
#
ed8bb032 |
|
21-Jan-2022 |
kernel test robot <lkp@intel.com> |
bpf: Fix flexible_array.cocci warnings Zero-length and one-element arrays are deprecated, see: Documentation/process/deprecated.rst Flexible-array members should be used instead. Generated by: scripts/coccinelle/misc/flexible_array.cocci Fixes: c1ff181ffabc ("selftests/bpf: Extend kfunc selftests") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/alpine.DEB.2.22.394.2201221206320.12220@hadrien
|
#
7855e0db |
|
21-Jan-2022 |
Lorenzo Bianconi <lorenzo@kernel.org> |
bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature introduce xdp_shared_info pointer in bpf_test_finish signature in order to copy back paged data from a xdp frags frame to userspace buffer Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c803673798c786f915bcdd6c9338edaa9740d3d6.1642758637.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
1c194998 |
|
21-Jan-2022 |
Lorenzo Bianconi <lorenzo@kernel.org> |
bpf: introduce frags support to bpf_prog_test_run_xdp() Introduce the capability to allocate a xdp frags in bpf_prog_test_run_xdp routine. This is a preliminary patch to introduce the selftests for new xdp frags ebpf helpers Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/b7c0e425a9287f00f601c4fc0de54738ec6ceeea.1642758637.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
be3d72a2 |
|
21-Jan-2022 |
Lorenzo Bianconi <lorenzo@kernel.org> |
bpf: move user_size out of bpf_test_init Rely on data_size_in in bpf_test_init routine signature. This is a preliminary patch to introduce xdp frags selftest Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/6b48d38ed3d60240d7d6bb15e6fa7fabfac8dfb2.1642758637.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
46565696 |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
selftests/bpf: Add test for race in btf_try_get_module This adds a complete test case to ensure we never take references to modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also ensures we never access btf->kfunc_set_tab in an inconsistent state. The test uses userfaultfd to artificially widen the race. When run on an unpatched kernel, it leads to the following splat: [root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym [ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b [ 55.499206] #PF: supervisor read access in kernel mode [ 55.499855] #PF: error_code(0x0000) - not-present page [ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0 [ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151 [ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014 [ 55.504777] Workqueue: events bpf_prog_free_deferred [ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0 [ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282 [ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba [ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458 [ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b [ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598 [ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400 [ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000 [ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0 [ 55.518672] PKRU: 55555554 [ 55.519022] Call Trace: [ 55.519483] <TASK> [ 55.519884] module_put.part.0+0x2a/0x180 [ 55.520642] bpf_prog_free_deferred+0x129/0x2e0 [ 55.521478] process_one_work+0x4fa/0x9e0 [ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100 [ 55.522878] ? rwlock_bug.part.0+0x60/0x60 [ 55.523551] worker_thread+0x2eb/0x700 [ 55.524176] ? __kthread_parkme+0xd8/0xf0 [ 55.524853] ? process_one_work+0x9e0/0x9e0 [ 55.525544] kthread+0x23a/0x270 [ 55.526088] ? set_kthread_struct+0x80/0x80 [ 55.526798] ret_from_fork+0x1f/0x30 [ 55.527413] </TASK> [ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod] [ 55.530846] CR2: fffffbfff802548b [ 55.531341] ---[ end trace 1af41803c054ad6d ]--- [ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0 [ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282 [ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba [ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458 [ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b [ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598 [ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400 [ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000 [ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0 [ 55.545317] PKRU: 55555554 [ 55.545671] note: kworker/0:2[83] exited with preempt_count 1 Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c1ff181f |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
selftests/bpf: Extend kfunc selftests Use the prog_test kfuncs to test the referenced PTR_TO_BTF_ID kfunc support, and PTR_TO_CTX, PTR_TO_MEM argument passing support. Also testing the various failure cases for invalid kfunc prototypes. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b202d844 |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Remove check_kfunc_call callback and old kfunc BTF ID API Completely remove the old code for check_kfunc_call to help it work with modules, and also the callback itself. The previous commit adds infrastructure to register all sets and put them in vmlinux or module BTF, and concatenates all related sets organized by the hook and the type. Once populated, these sets remain immutable for the lifetime of the struct btf. Also, since we don't need the 'owner' module anywhere when doing check_kfunc_call, drop the 'btf_modp' module parameter from find_kfunc_desc_btf. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
db5b6a46 |
|
18-Oct-2021 |
Qing Wang <wangqing@vivo.com> |
net: bpf: Switch over to memdup_user() This patch fixes the following Coccinelle warning: net/bpf/test_run.c:361:8-15: WARNING opportunity for memdup_user net/bpf/test_run.c:1055:8-15: WARNING opportunity for memdup_user Use memdup_user rather than duplicating its implementation This is a little bit restricted to reduce false positives Signed-off-by: Qing Wang <wangqing@vivo.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/1634556651-38702-1-git-send-email-wangqing@vivo.com
|
#
c48e51c8 |
|
01-Oct-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: selftests: Add selftests for module kfunc support This adds selftests that tests the success and failure path for modules kfuncs (in presence of invalid kfunc calls) for both libbpf and gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can add module BTF ID set from bpf_testmod. This also introduces a couple of test cases to verifier selftests for validating whether we get an error or not depending on if invalid kfunc call remains after elimination of unreachable instructions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-10-memxor@gmail.com
|
#
2357672c |
|
01-Oct-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Introduce BPF support for kernel module function calls This change adds support on the kernel side to allow for BPF programs to call kernel module functions. Userspace will prepare an array of module BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter. In the kernel, the module BTFs are placed in the auxilliary struct for bpf_prog, and loaded as needed. The verifier then uses insn->off to index into the fd_array. insn->off 0 is reserved for vmlinux BTF (for backwards compat), so userspace must use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is sorted based on offset in an array, and each offset corresponds to one descriptor, with a max limit up to 256 such module BTFs. We also change existing kfunc_tab to distinguish each element based on imm, off pair as each such call will now be distinct. Another change is to check_kfunc_call callback, which now include a struct module * pointer, this is to be used in later patch such that the kfunc_id and module pointer are matched for dynamically registered BTF sets from loadable modules, so that same kfunc_id in two modules doesn't lead to check_kfunc_call succeeding. For the duration of the check_kfunc_call, the reference to struct module exists, as it returns the pointer stored in kfunc_btf_tab. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-2-memxor@gmail.com
|
#
de21d8bf |
|
28-Sep-2021 |
Lorenz Bauer <lmb@cloudflare.com> |
bpf: Do not invoke the XDP dispatcher for PROG_RUN with single repeat We have a unit test that invokes an XDP program with 1m different inputs, aka 1m BPF_PROG_RUN syscalls. We run this test concurrently with slight variations in how we generated the input. Since commit f23c4b3924d2 ("bpf: Start using the BPF dispatcher in BPF_TEST_RUN") the unit test has slowed down significantly. Digging deeper reveals that the concurrent tests are serialised in the kernel on the XDP dispatcher. This is a global resource that is protected by a mutex, on which we contend. Fix this by not calling into the XDP dispatcher if we only want to perform a single run of the BPF program. See: https://lore.kernel.org/bpf/CACAyw9_y4QumOW35qpgTbLsJ532uGq-kVW-VESJzGyiZkypnvw@mail.gmail.com/ Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210928093100.27124-1-lmb@cloudflare.com
|
#
3384c7c7 |
|
09-Sep-2021 |
Vadim Fedorenko <vfedorenko@novek.ru> |
selftests/bpf: Test new __sk_buff field hwtstamp Analogous to the gso_segs selftests introduced in commit d9ff286a0f59 ("bpf: allow BPF programs access skb_shared_info->gso_segs field"). Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210909220409.8804-3-vfedorenko@novek.ru
|
#
b238290b |
|
30-Aug-2021 |
Neil Spring <ntspring@fb.com> |
bpf: Permit ingress_ifindex in bpf_prog_test_run_xattr bpf_prog_test_run_xattr takes a struct __sk_buff, but did not permit that __skbuff to include an nonzero ingress_ifindex. This patch updates to allow ingress_ifindex, convert the __sk_buff field to sk_buff (skb_iif) and back, and tests that the value is present from on BPF program side. The test sets an unlikely distinct value for ingress_ifindex (11) from ifindex (1), which is in line with the rest of the synthetic field tests. Adding this support allows testing BPF that operates differently on incoming and outgoing skbs by discriminating on this field. Signed-off-by: Neil Spring <ntspring@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210831033356.1459316-1-ntspring@fb.com
|
#
435b08ec |
|
27-Sep-2021 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf, test, cgroup: Use sk_{alloc,free} for test cases BPF test infra has some hacks in place which kzalloc() a socket and perform minimum init via sock_net_set() and sock_init_data(). As a result, the sk's skcd->cgroup is NULL since it didn't go through proper initialization as it would have been the case from sk_alloc(). Rather than re-adding a NULL test in sock_cgroup_ptr() just for this, use sk_{alloc,free}() pair for the test socket. The latter also allows to get rid of the bpf_sk_storage_free() special case. Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode") Fixes: b7a1848e8398 ("bpf: add BPF_PROG_TEST_RUN support for flow dissector") Fixes: 2cb494a36c98 ("bpf: add tests for direct packet access from CGROUP_SKB") Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net
|
#
fb7dd8bc |
|
15-Aug-2021 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Refactor BPF_PROG_RUN into a function Turn BPF_PROG_RUN into a proper always inlined function. No functional and performance changes are intended, but it makes it much easier to understand what's going on with how BPF programs are actually get executed. It's more obvious what types and callbacks are expected. Also extra () around input parameters can be dropped, as well as `__` variable prefixes intended to avoid naming collisions, which makes the code simpler to read and write. This refactoring also highlighted one extra issue. BPF_PROG_RUN is both a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning BPF_PROG_RUN into a function causes naming conflict compilation error. So rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org
|
#
6d4eb36d |
|
04-Aug-2021 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Fix bpf_prog_test_run_xdp logic after incorrect merge resolution During recent net into net-next merge ([0]) a piece of old logic ([1]) got reintroduced accidentally while resolving merge conflict between bpf's [2] and bpf-next's [3]. This check was removed in bpf-next tree to allow extra ctx_in parameter passed for XDP test runs. Reinstating the check breaks bpf_prog_test_run_xdp logic and causes a corresponding xdp_context_test_run selftest failure. Fix by removing the check and allow ctx_in for XDP test runs. [0] 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") [1] 947e8b595b82 ("bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN") [2] 5e21bb4e8125 ("bpf, test: fix NULL pointer dereference on invalid expected_attach_type") [3] 47316f4a3053 ("bpf: Support input xdp_md context in BPF_PROG_TEST_RUN") Fixes: 5af84df962dd ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
c7603cfa |
|
12-Jul-2021 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Add ambient BPF runtime context stored in current b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") fixed the problem with cgroup-local storage use in BPF by pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate possible BPF program preemptions and nested executions. While this seems to work good in practice, it introduces new and unnecessary failure mode in which not all BPF programs might be executed if we fail to find an unused slot for cgroup storage, however unlikely it is. It might also not be so unlikely when/if we allow sleepable cgroup BPF programs in the future. Further, the way that cgroup storage is implemented as ambiently-available property during entire BPF program execution is a convenient way to pass extra information to BPF program and helpers without requiring user code to pass around extra arguments explicitly. So it would be good to have a generic solution that can allow implementing this without arbitrary restrictions. Ideally, such solution would work for both preemptable and sleepable BPF programs in exactly the same way. This patch introduces such solution, bpf_run_ctx. It adds one pointer field (bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of macros in such a way that it always stays valid throughout BPF program execution. BPF program preemption is handled by remembering previous current->bpf_ctx value locally while executing nested BPF program and restoring old value after nested BPF program finishes. This is handled by two helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are supposed to be used before and after BPF program runs, respectively. Restoring old value of the pointer handles preemption, while bpf_run_ctx pointer being a property of current task_struct naturally solves this problem for sleepable BPF programs by "following" BPF program execution as it is scheduled in and out of CPU. It would even allow CPU migration of BPF programs, even though it's not currently allowed by BPF infra. This patch cleans up cgroup local storage handling as a first application. The design itself is generic, though, with bpf_run_ctx being an empty struct that is supposed to be embedded into a specific struct for a given BPF program type (bpf_cg_run_ctx in this case). Follow up patches are planned that will expand this mechanism for other uses within tracing BPF programs. To verify that this change doesn't revert the fix to the original cgroup storage issue, I ran the same repro as in the original report ([0]) and didn't get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work). [0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/ Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org
|
#
ec94670f |
|
07-Jul-2021 |
Zvi Effron <zeffron@riotgames.com> |
bpf: Support specifying ingress via xdp_md context in BPF_PROG_TEST_RUN Support specifying the ingress_ifindex and rx_queue_index of xdp_md contexts for BPF_PROG_TEST_RUN. The intended use case is to allow testing XDP programs that make decisions based on the ingress interface or RX queue. If ingress_ifindex is specified, look up the device by the provided index in the current namespace and use its xdp_rxq for the xdp_buff. If the rx_queue_index is out of range, or is non-zero when the ingress_ifindex is 0, return -EINVAL. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-4-zeffron@riotgames.com
|
#
47316f4a |
|
07-Jul-2021 |
Zvi Effron <zeffron@riotgames.com> |
bpf: Support input xdp_md context in BPF_PROG_TEST_RUN Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for BPF_PROG_TEST_RUN. The intended use case is to pass some XDP meta data to the test runs of XDP programs that are used as tail calls. For programs that use bpf_prog_test_run_xdp, support xdp_md input and output. Unlike with an actual xdp_md during a non-test run, data_meta must be 0 because it must point to the start of the provided user data. From the initial xdp_md, use data and data_end to adjust the pointers in the generated xdp_buff. All other non-zero fields are prohibited (with EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially different) xdp_md back to the userspace. We require all fields of input xdp_md except the ones we explicitly support to be set to zero. The expectation is that in the future we might add support for more fields and we want to fail explicitly if the user runs the program on the kernel where we don't yet support them. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com
|
#
87b7b533 |
|
09-Aug-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Add missing bpf_read_[un]lock_trace() for syscall program Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") added support for syscall program, which is a sleepable program. But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(), which is needed to ensure proper rcu callback invocations. This patch adds bpf_read_[un]lock_trace() properly. Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210809235151.1663680-1-yhs@fb.com
|
#
5e21bb4e |
|
08-Jul-2021 |
Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
bpf, test: fix NULL pointer dereference on invalid expected_attach_type These two types of XDP progs (BPF_XDP_DEVMAP, BPF_XDP_CPUMAP) will not be executed directly in the driver, therefore we should also not directly run them from here. To run in these two situations, there must be further preparations done, otherwise these may cause a kernel panic. For more details, see also dev_xdp_attach(). [ 46.982479] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 46.984295] #PF: supervisor read access in kernel mode [ 46.985777] #PF: error_code(0x0000) - not-present page [ 46.987227] PGD 800000010dca4067 P4D 800000010dca4067 PUD 10dca6067 PMD 0 [ 46.989201] Oops: 0000 [#1] SMP PTI [ 46.990304] CPU: 7 PID: 562 Comm: a.out Not tainted 5.13.0+ #44 [ 46.992001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/24 [ 46.995113] RIP: 0010:___bpf_prog_run+0x17b/0x1710 [ 46.996586] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02 [ 47.001562] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246 [ 47.003115] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000 [ 47.005163] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98 [ 47.007135] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff [ 47.009171] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98 [ 47.011172] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8 [ 47.013244] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000 [ 47.015705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.017475] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0 [ 47.019558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.021595] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 47.023574] PKRU: 55555554 [ 47.024571] Call Trace: [ 47.025424] __bpf_prog_run32+0x32/0x50 [ 47.026296] ? printk+0x53/0x6a [ 47.027066] ? ktime_get+0x39/0x90 [ 47.027895] bpf_test_run.cold.28+0x23/0x123 [ 47.028866] ? printk+0x53/0x6a [ 47.029630] bpf_prog_test_run_xdp+0x149/0x1d0 [ 47.030649] __sys_bpf+0x1305/0x23d0 [ 47.031482] __x64_sys_bpf+0x17/0x20 [ 47.032316] do_syscall_64+0x3a/0x80 [ 47.033165] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 47.034254] RIP: 0033:0x7f04a51364dd [ 47.035133] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 48 [ 47.038768] RSP: 002b:00007fff8f9fc518 EFLAGS: 00000213 ORIG_RAX: 0000000000000141 [ 47.040344] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f04a51364dd [ 47.041749] RDX: 0000000000000048 RSI: 0000000020002a80 RDI: 000000000000000a [ 47.043171] RBP: 00007fff8f9fc530 R08: 0000000002049300 R09: 0000000020000100 [ 47.044626] R10: 0000000000000004 R11: 0000000000000213 R12: 0000000000401070 [ 47.046088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 47.047579] Modules linked in: [ 47.048318] CR2: 0000000000000000 [ 47.049120] ---[ end trace 7ad34443d5be719a ]--- [ 47.050273] RIP: 0010:___bpf_prog_run+0x17b/0x1710 [ 47.051343] Code: 49 03 14 cc e8 76 f6 fe ff e9 ad fe ff ff 0f b6 43 01 48 0f bf 4b 02 48 83 c3 08 89 c2 83 e0 0f c0 ea 04 02 [ 47.054943] RSP: 0018:ffffc900005afc58 EFLAGS: 00010246 [ 47.056068] RAX: 0000000000000000 RBX: ffffc9000023f068 RCX: 0000000000000000 [ 47.057522] RDX: 0000000000000000 RSI: 0000000000000079 RDI: ffffc900005afc98 [ 47.058961] RBP: 0000000000000000 R08: ffffc9000023f048 R09: c0000000ffffdfff [ 47.060390] R10: 0000000000000001 R11: ffffc900005afb40 R12: ffffc900005afc98 [ 47.061803] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff825258a8 [ 47.063249] FS: 00007f04a5207580(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000 [ 47.065070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.066307] CR2: 0000000000000000 CR3: 0000000100182005 CR4: 0000000000770ee0 [ 47.067747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.069217] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 47.070652] PKRU: 55555554 [ 47.071318] Kernel panic - not syncing: Fatal exception [ 47.072854] Kernel Offset: disabled [ 47.073683] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap") Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry") Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: David Ahern <dsahern@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210708080409.73525-1-xuanzhuo@linux.alibaba.com
|
#
af2ac3e1 |
|
13-May-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Prepare bpf syscall to be used from kernel and user space. With the help from bpfptr_t prepare relevant bpf syscall commands to be used from kernel and user space. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-4-alexei.starovoitov@gmail.com
|
#
79a7f8bd |
|
13-May-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Introduce bpf_sys_bpf() helper and program type. Add placeholders for bpf_sys_bpf() helper and new program type. Make sure to check that expected_attach_type is zero for future extensibility. Allow tracing helper functions to be used in this program type, since they will only execute from user context via bpf_prog_test_run. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com
|
#
7bd1590d |
|
24-Mar-2021 |
Martin KaFai Lau <kafai@fb.com> |
bpf: selftests: Add kfunc_call test This patch adds a few kernel function bpf_kfunc_call_test*() for the selftest's test_run purpose. They will be allowed for tc_cls prog. The selftest calling the kernel function bpf_kfunc_call_test*() is also added in this patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210325015252.1551395-1-kafai@fb.com
|
#
b910eaaa |
|
22-Mar-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper Jiri Olsa reported a bug ([1]) in kernel where cgroup local storage pointer may be NULL in bpf_get_local_storage() helper. There are two issues uncovered by this bug: (1). kprobe or tracepoint prog incorrectly sets cgroup local storage before prog run, (2). due to change from preempt_disable to migrate_disable, preemption is possible and percpu storage might be overwritten by other tasks. This issue (1) is fixed in [2]. This patch tried to address issue (2). The following shows how things can go wrong: task 1: bpf_cgroup_storage_set() for percpu local storage preemption happens task 2: bpf_cgroup_storage_set() for percpu local storage preemption happens task 1: run bpf program task 1 will effectively use the percpu local storage setting by task 2 which will be either NULL or incorrect ones. Instead of just one common local storage per cpu, this patch fixed the issue by permitting 8 local storages per cpu and each local storage is identified by a task_struct pointer. This way, we allow at most 8 nested preemption between bpf_cgroup_storage_set() and bpf_cgroup_storage_unset(). The percpu local storage slot is released (calling bpf_cgroup_storage_unset()) by the same task after bpf program finished running. bpf_test_run() is also fixed to use the new bpf_cgroup_storage_set() interface. The patch is tested on top of [2] with reproducer in [1]. Without this patch, kernel will emit error in 2-3 minutes. With this patch, after one hour, still no error. [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T [2] https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Link: https://lore.kernel.org/bpf/20210323055146.3334476-1-yhs@fb.com
|
#
7c32e8f8 |
|
03-Mar-2021 |
Lorenz Bauer <lmb@cloudflare.com> |
bpf: Add PROG_TEST_RUN support for sk_lookup programs Allow to pass sk_lookup programs to PROG_TEST_RUN. User space provides the full bpf_sk_lookup struct as context. Since the context includes a socket pointer that can't be exposed to user space we define that PROG_TEST_RUN returns the cookie of the selected socket or zero in place of the socket pointer. We don't support testing programs that select a reuseport socket, since this would mean running another (unrelated) BPF program from the sk_lookup test handler. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com
|
#
607b9cc9 |
|
03-Mar-2021 |
Lorenz Bauer <lmb@cloudflare.com> |
bpf: Consolidate shared test timing code Share the timing / signal interruption logic between different implementations of PROG_TEST_RUN. There is a change in behaviour as well. We check the loop exit condition before checking for pending signals. This resolves an edge case where a signal arrives during the last iteration. Instead of aborting with EINTR we return the successful result to user space. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210303101816.36774-2-lmb@cloudflare.com
|
#
be9df4af |
|
22-Dec-2020 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net, xdp: Introduce xdp_prepare_buff utility routine Introduce xdp_prepare_buff utility routine to initialize per-descriptor xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
43b5169d |
|
22-Dec-2020 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net, xdp: Introduce xdp_init_buff utility routine Introduce xdp_init_buff utility routine to initialize xdp_buff fields const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on xdp_init_buff in all XDP capable drivers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Camelia Groza <camelia.groza@nxp.com> Acked-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
7ac6ad05 |
|
12-Jan-2021 |
Song Liu <songliubraving@fb.com> |
bpf: Reject too big ctx_size_in for raw_tp test run syzbot reported a WARNING for allocating too big memory: WARNING: CPU: 1 PID: 8484 at mm/page_alloc.c:4976 __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5011 Modules linked in: CPU: 1 PID: 8484 Comm: syz-executor862 Not tainted 5.11.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:4976 Code: 00 00 0c 00 0f 85 a7 00 00 00 8b 3c 24 4c 89 f2 44 89 e6 c6 44 24 70 00 48 89 6c 24 58 e8 d0 d7 ff ff 49 89 c5 e9 ea fc ff ff <0f> 0b e9 b5 fd ff ff 89 74 24 14 4c 89 4c 24 08 4c 89 74 24 18 e8 RSP: 0018:ffffc900012efb10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff9200025df66 RCX: 0000000000000000 RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000140dc0 RBP: 0000000000140dc0 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff81b1f7e1 R11: 0000000000000000 R12: 0000000000000014 R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000 FS: 000000000190c880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f08b7f316c0 CR3: 0000000012073000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc include/linux/slab.h:557 [inline] kzalloc include/linux/slab.h:682 [inline] bpf_prog_test_run_raw_tp+0x4b5/0x670 net/bpf/test_run.c:282 bpf_prog_test_run kernel/bpf/syscall.c:3120 [inline] __do_sys_bpf+0x1ea9/0x4f10 kernel/bpf/syscall.c:4398 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x440499 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffe1f3bfb18 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440499 RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ca0 R13: 0000000000401d30 R14: 0000000000000000 R15: 0000000000000000 This is because we didn't filter out too big ctx_size_in. Fix it by rejecting ctx_size_in that are bigger than MAX_BPF_FUNC_ARGS (12) u64 numbers. Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") Reported-by: syzbot+4f98876664c7337a4ae6@syzkaller.appspotmail.com Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112234254.1906829-1-songliubraving@fb.com
|
#
963ec27a |
|
29-Sep-2020 |
Song Liu <songliubraving@fb.com> |
bpf: fix raw_tp test run in preempt kernel In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers: [ 35.874974] BUG: using smp_processor_id() in preemptible [00000000] code: new_name/87 [ 35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1 [ 35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 35.916941] Call Trace: [ 35.919660] dump_stack+0x77/0x9b [ 35.923273] check_preemption_disabled+0xb4/0xc0 [ 35.928376] bpf_prog_test_run_raw_tp+0xd4/0x1b0 [ 35.933872] ? selinux_bpf+0xd/0x70 [ 35.937532] __do_sys_bpf+0x6bb/0x21e0 [ 35.941570] ? find_held_lock+0x2d/0x90 [ 35.945687] ? vfs_write+0x150/0x220 [ 35.949586] do_syscall_64+0x2d/0x40 [ 35.953443] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix this by calling migrate_disable() before smp_processor_id(). Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
1b4d60ec |
|
25-Sep-2020 |
Song Liu <songliubraving@fb.com> |
bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint Add .test_run for raw_tracepoint. Also, introduce a new feature that runs the target program on a specific CPU. This is achieved by a new flag in bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for BPF programs that handle perf_event and other percpu resources, as the program can access these resource locally. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
|
#
df561f66 |
|
23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
21594c44 |
|
02-Aug-2020 |
Dmitry Yakunin <zeil@yandex-team.ru> |
bpf: Allow to specify ifindex for skb in bpf_prog_test_run_skb Now skb->dev is unconditionally set to the loopback device in current net namespace. But if we want to test bpf program which contains code branch based on ifindex condition (eg filters out localhost packets) it is useful to allow specifying of ifindex from userspace. This patch adds such option through ctx_in (__sk_buff) parameter. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200803090545.82046-3-zeil@yandex-team.ru
|
#
fa5cb548 |
|
02-Aug-2020 |
Dmitry Yakunin <zeil@yandex-team.ru> |
bpf: Setup socket family and addresses in bpf_prog_test_run_skb Now it's impossible to test all branches of cgroup_skb bpf program which accesses skb->family and skb->{local,remote}_ip{4,6} fields because they are zeroed during socket allocation. This commit fills socket family and addresses from related fields in constructed skb. Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200803090545.82046-2-zeil@yandex-team.ru
|
#
d923021c |
|
30-Jun-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Add tests for PTR_TO_BTF_ID vs. null comparison Add two tests for PTR_TO_BTF_ID vs. null ptr comparison, one for PTR_TO_BTF_ID in the ctx structure and the other for PTR_TO_BTF_ID after one level pointer chasing. In both cases, the test ensures condition is not removed. For example, for this test struct bpf_fentry_test_t { struct bpf_fentry_test_t *a; }; int BPF_PROG(test7, struct bpf_fentry_test_t *arg) { if (arg == 0) test7_result = 1; return 0; } Before the previous verifier change, we have xlated codes: int test7(long long unsigned int * ctx): ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg) 0: (79) r1 = *(u64 *)(r1 +0) ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg) 1: (b4) w0 = 0 2: (95) exit After the previous verifier change, we have: int test7(long long unsigned int * ctx): ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg) 0: (79) r1 = *(u64 *)(r1 +0) ; if (arg == 0) 1: (55) if r1 != 0x0 goto pc+4 ; test7_result = 1; 2: (18) r1 = map[id:6][0]+48 4: (b7) r2 = 1 5: (7b) *(u64 *)(r1 +0) = r2 ; int BPF_PROG(test7, struct bpf_fentry_test_t *arg) 6: (b4) w0 = 0 7: (95) exit Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com
|
#
d800bad6 |
|
18-May-2020 |
Jesper Dangaard Brouer <brouer@redhat.com> |
bpf: Fix too large copy from user in bpf_test_init Commit bc56c919fce7 ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().") recently changed bpf_prog_test_run_xdp() to use larger frames for XDP in order to test tail growing frames (via bpf_xdp_adjust_tail) and to have memory backing frame better resemble drivers. The commit contains a bug, as it tries to copy the max data size from userspace, instead of the size provided by userspace. This cause XDP unit tests to fail sporadically with EFAULT, an unfortunate behavior. The fix is to only copy the size specified by userspace. Fixes: bc56c919fce7 ("bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp().") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/158980712729.256597.6115007718472928659.stgit@firesoul
|
#
bc56c919 |
|
13-May-2020 |
Jesper Dangaard Brouer <brouer@redhat.com> |
bpf: Add xdp.frame_sz in bpf_prog_test_run_xdp(). Update the memory requirements, when adding xdp.frame_sz in BPF test_run function bpf_prog_test_run_xdp() which e.g. is used by XDP selftests. Specifically add the expected reserved tailroom, but also allocated a larger memory area to reflect that XDP frames usually comes in this format. Limit the provided packet data size to 4096 minus headroom + tailroom, as this also reflect a common 3520 bytes MTU limit with XDP. Note that bpf_test_init already use a memory allocation method that clears memory. Thus, this already guards against leaking uninit kernel memory. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158945349549.97035.15316291762482444006.stgit@firesoul
|
#
e9ff9d52 |
|
27-Mar-2020 |
Jean-Philippe Menil <jpmenil@gmail.com> |
bpf: Fix build warning regarding missing prototypes Fix build warnings when building net/bpf/test_run.o with W=1 due to missing prototype for bpf_fentry_test{1..6}. Instead of declaring prototypes, turn off warnings with __diag_{push,ignore,pop} as pointed out by Alexei. Signed-off-by: Jean-Philippe Menil <jpmenil@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200327204713.28050-1-jpmenil@gmail.com
|
#
3d08b6f2 |
|
04-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf: Add selftests for BPF_MODIFY_RETURN Test for two scenarios: * When the fmod_ret program returns 0, the original function should be called along with fentry and fexit programs. * When the fmod_ret program returns a non-zero value, the original function should not be called, no side effect should be observed and fentry and fexit programs should be called. The result from the kernel function call and whether a side-effect is observed is returned via the retval attr of the BPF_PROG_TEST_RUN (bpf) syscall. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-8-kpsingh@chromium.org
|
#
da00d2f1 |
|
04-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf: Add test ops for BPF_PROG_TYPE_TRACING The current fexit and fentry tests rely on a different program to exercise the functions they attach to. Instead of doing this, implement the test operations for tracing which will also be used for BPF_MODIFY_RETURN in a subsequent patch. Also, clean up the fexit test to use the generated skeleton. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-7-kpsingh@chromium.org
|
#
cf62089b |
|
03-Mar-2020 |
Willem de Bruijn <willemb@google.com> |
bpf: Add gso_size to __sk_buff BPF programs may want to know whether an skb is gso. The canonical answer is skb_is_gso(skb), which tests that gso_size != 0. Expose this field in the same manner as gso_segs. That field itself is not a sufficient signal, as the comment in skb_shared_info makes clear: gso_segs may be zero, e.g., from dodgy sources. Also prepare net/bpf/test_run for upcoming BPF_PROG_TEST_RUN tests of the feature. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200303200503.226217-2-willemdebruijn.kernel@gmail.com
|
#
6eac7795 |
|
24-Feb-2020 |
David Miller <davem@davemloft.net> |
bpf/tests: Use migrate disable instead of preempt disable Replace the preemption disable/enable with migrate_disable/enable() to reflect the actual requirement and to allow PREEMPT_RT to substitute it with an actual migration disable mechanism which does not disable preemption. [ tglx: Switched it over to migrate disable ] Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145643.785306549@linutronix.de
|
#
6de6c1f8 |
|
18-Dec-2019 |
Nikita V. Shirokov <tehnerd@tehnerd.com> |
bpf: Allow to change skb mark in test_run allow to pass skb's mark field into bpf_prog_test_run ctx for BPF_PROG_TYPE_SCHED_CLS prog type. that would allow to test bpf programs which are doing decision based on this field Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
850a88cc |
|
13-Dec-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN wire_len should not be less than real len and is capped by GSO_MAX_SIZE. gso_segs is capped by GSO_MAX_SEGS. v2: * set wire_len to skb->len when passed wire_len is 0 (Alexei Starovoitov) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191213223028.161282-1-sdf@google.com
|
#
f23c4b39 |
|
13-Dec-2019 |
Björn Töpel <bjorn@kernel.org> |
bpf: Start using the BPF dispatcher in BPF_TEST_RUN In order to properly exercise the BPF dispatcher, this commit adds BPF dispatcher usage to BPF_TEST_RUN when executing XDP programs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-5-bjorn.topel@gmail.com
|
#
b590cb5f |
|
10-Dec-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: Switch to offsetofend in BPF_PROG_TEST_RUN Switch existing pattern of "offsetof(..., member) + FIELD_SIZEOF(..., member)' to "offsetofend(..., member)" which does exactly what we need without all the copy-paste. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191210191933.105321-1-sdf@google.com
|
#
c593642c |
|
09-Dec-2019 |
Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> |
treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
|
#
a25ecd9d |
|
18-Nov-2019 |
Colin Ian King <colin.king@canonical.com> |
bpf: Fix memory leak on object 'data' The error return path on when bpf_fentry_test* tests fail does not kfree 'data'. Fix this by adding the missing kfree. Addresses-Coverity: ("Resource leak") Fixes: faeb2dce084a ("bpf: Add kernel test functions for fentry testing") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191118114059.37287-1-colin.king@canonical.com
|
#
faeb2dce |
|
14-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Add kernel test functions for fentry testing Add few kernel functions with various number of arguments, their types and sizes for BPF trampoline testing to cover different calling conventions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-9-ast@kernel.org
|
#
ba940948 |
|
15-Oct-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: Allow __sk_buff tstamp in BPF_PROG_TEST_RUN It's useful for implementing EDT related tests (set tstamp, run the test, see how the tstamp is changed or observe some other parameter). Note that bpf_ktime_get_ns() helper is using monotonic clock, so for the BPF programs that compare tstamp against it, tstamp should be derived from clock_gettime(CLOCK_MONOTONIC, ...). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191015183125.124413-1-sdf@google.com
|
#
b2ca4e1c |
|
25-Jul-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN This will allow us to write tests for those flags. v2: * Swap kfree(data) and kfree(user_ctx) (Song Liu) Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
086f9568 |
|
25-Jul-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf/flow_dissector: pass input flags to BPF flow dissector program C flow dissector supports input flags that tell it to customize parsing by either stopping early or trying to parse as deep as possible. Pass those flags to the BPF flow dissector so it can make the same decisions. In the next commits I'll add support for those flags to our reference bpf_flow.c v3: * Export copy of flow dissector flags instead of moving (Alexei Starovoitov) Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
25763b3c |
|
28-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 107 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528171438.615055994@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
6ac99e8f |
|
26-Apr-2019 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Introduce bpf sk local storage After allowing a bpf prog to - directly read the skb->sk ptr - get the fullsock bpf_sock by "bpf_sk_fullsock()" - get the bpf_tcp_sock by "bpf_tcp_sock()" - get the listener sock by "bpf_get_listener_sock()" - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running context. this patch is another effort to make bpf's network programming more intuitive to do (together with memory and performance benefit). When bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuples (src/dst ip/port) as the key. If multiple bpf progs require to store different sk data, multiple maps have to be defined. Hence, wasting memory to store the duplicated keys (i.e. 4 tuples here) in each of the bpf map. [ The smallest key could be the sk pointer itself which requires some enhancement in the verifier and it is a separate topic. ] Also, the bpf prog needs to clean up the elem when sk is freed. Otherwise, the bpf map will become full and un-usable quickly. The sk-free tracking currently could be done during sk state transition (e.g. BPF_SOCK_OPS_STATE_CB). The size of the map needs to be predefined which then usually ended-up with an over-provisioned map in production. Even the map was re-sizable, while the sk naturally come and go away already, this potential re-size operation is arguably redundant if the data can be directly connected to the sk itself instead of proxy-ing through a bpf map. This patch introduces sk->sk_bpf_storage to provide local storage space at sk for bpf prog to use. The space will be allocated when the first bpf prog has created data for this particular sk. The design optimizes the bpf prog's lookup (and then optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected. BPF_MAP_TYPE_SK_STORAGE: ----------------------- To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wise sk dump (e.g. "ss -ta") in the future. The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete a "sk-local-storage" data from a particular sk. Think of the map as a meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk. The main purposes of this map are mostly: 1. Define the size of a "sk-local-storage" type. 2. Provide a similar syscall userspace API as the map (e.g. lookup/update, map-id, map-btf...etc.) 3. Keep track of all sk's storages of this "type" and clean them up when the map is freed. sk->sk_bpf_storage: ------------------ The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search on the sk_storage->list. The "map" pointer is actually serving as the "type" of the "sk-local-storage" that is being requested. To allow very fast lookup, it should be as fast as looking up an array at a stable-offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "type" that the system can have. Hence, this patch takes a cache approach. The last search result from sk_storage->list is cached in sk_storage->cache[] which is a stable sized array. Each "sk-local-storage" type has a stable offset to the cache[] array. In the future, a map's flag could be introduced to do cache opt-out/enforcement if it became necessary. The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot. The bpf_prog should have already consolidated the existing sock-key-ed map usage to minimize the map lookup penalty. 16 has enough runway to grow. All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction. bpf_sk_storage_get() and bpf_sk_storage_delete(): ------------------------------------------------ Instead of using bpf_map_(lookup|update|delete)_elem(), the bpf prog needs to use the new helper bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to "create" new elem if one does not exist in the sk. It is done by the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE. The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, it has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch. Misc notes: ---------- 1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned-file or map-id. Since btf is enforced, the existing "ss" could be enhanced to pretty print the local-storage. Supporting a kernel defined btf with 4 tuples as the return key could be explored later also. 2. The sk->sk_lock cannot be acquired. Atomic operations is used instead. e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for the details in synchronization cases and considerations. 3. The mem is charged to the sk->sk_omem_alloc as the sk filter does. Benchmark: --------- Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested: One bpf prog with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key. (verifier is modified to support sk ptr as the key That should have shortened the key lookup time.) Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE. Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb and then bump the cnt. netperf is used to drive data with 4096 connected UDP sockets. BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run) 27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633 loaded_at 2019-04-15T13:46:39-0700 uid 0 xlated 344B jited 258B memlock 4096B map_ids 16 btf_id 5 BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run) 30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739 loaded_at 2019-04-15T13:47:54-0700 uid 0 xlated 168B jited 156B memlock 4096B map_ids 17 btf_id 6 Here is a high-level picture on how are the objects organized: sk ┌──────┐ │ │ │ │ │ │ │*sk_bpf_storage─────▶ bpf_sk_storage └──────┘ ┌───────┐ ┌───────────┤ list │ │ │ │ │ │ │ │ │ │ │ └───────┘ │ │ elem │ ┌────────┐ ├─▶│ snode │ │ ├────────┤ │ │ data │ bpf_map │ ├────────┤ ┌─────────┐ │ │map_node│◀─┬─────┤ list │ │ └────────┘ │ │ │ │ │ │ │ │ elem │ │ │ │ ┌────────┐ │ └─────────┘ └─▶│ snode │ │ ├────────┤ │ bpf_map │ data │ │ ┌─────────┐ ├────────┤ │ │ list ├───────▶│map_node│ │ │ │ └────────┘ │ │ │ │ │ │ elem │ └─────────┘ ┌────────┐ │ ┌─▶│ snode │ │ │ ├────────┤ │ │ │ data │ │ │ ├────────┤ │ │ │map_node│◀─┘ │ └────────┘ │ │ │ ┌───────┐ sk └──────────│ list │ ┌──────┐ │ │ │ │ │ │ │ │ │ │ │ │ └───────┘ │*sk_bpf_storage───────▶bpf_sk_storage └──────┘ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e950e843 |
|
26-Apr-2019 |
Matt Mullins <mmullins@fb.com> |
selftests: bpf: test writable buffers in raw tps This tests that: * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it uses either: * a variable offset to the tracepoint buffer, or * an offset beyond the size of the tracepoint buffer * a tracer can modify the buffer provided when attached to a writable tracepoint in bpf_prog_test_run Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
02ee0658 |
|
22-Apr-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf/flow_dissector: don't adjust nhoff by ETH_HLEN in BPF_PROG_TEST_RUN Now that we use skb-less flow dissector let's return true nhoff and thoff. We used to adjust them by ETH_HLEN because that's how it was done in the skb case. For VLAN tests that looks confusing: nhoff is pointing to vlan parts :-\ Warning, this is an API change for BPF_PROG_TEST_RUN! Feel free to drop if you think that it's too late at this point to fix it. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
7b8a1304 |
|
22-Apr-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Now that we have bpf_flow_dissect which can work on raw data, use it when doing BPF_PROG_TEST_RUN for flow dissector. Simplifies bpf_prog_test_run_flow_dissector and allows us to test no-skb mode. Note, that previously, with bpf_flow_dissect_skb we used to call eth_type_trans which pulled L2 (ETH_HLEN) header and we explicitly called skb_reset_network_header. That means flow_keys->nhoff would be initialized to 0 (skb_network_offset) in init_flow_keys. Now we call bpf_flow_dissect with nhoff set to ETH_HLEN and need to undo it once the dissection is done to preserve the existing behavior. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
089b19a9 |
|
22-Apr-2019 |
Stanislav Fomichev <sdf@google.com> |
flow_dissector: switch kernel context to struct bpf_flow_dissector struct bpf_flow_dissector has a small subset of sk_buff fields that flow dissector BPF program is allowed to access and an optional pointer to real skb. Real skb is used only in bpf_skb_load_bytes helper to read non-linear data. The real motivation for this is to be able to call flow dissector from eth_get_headlen context where we don't have an skb and need to dissect raw bytes. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
947e8b59 |
|
11-Apr-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case and be sure that nobody is erroneously setting ctx_{in,out}. Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
b0b9395d |
|
09-Apr-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: support input __sk_buff context in BPF_PROG_TEST_RUN Add new set of arguments to bpf_attr for BPF_PROG_TEST_RUN: * ctx_in/ctx_size_in - input context * ctx_out/ctx_size_out - output context The intended use case is to pass some meta data to the test runs that operate on skb (this has being brought up on recent LPC). For programs that use bpf_prog_test_run_skb, support __sk_buff input and output. Initially, from input __sk_buff, copy _only_ cb and priority into skb, all other non-zero fields are prohibited (with EINVAL). If the user has set ctx_out/ctx_size_out, copy the potentially modified __sk_buff back to the userspace. We require all fields of input __sk_buff except the ones we explicitly support to be set to zero. The expectation is that in the future we might add support for more fields and we want to fail explicitly if the user runs the program on the kernel where we don't yet support them. The API is intentionally vague (i.e. we don't explicitly add __sk_buff to bpf_attr, but ctx_in) to potentially let other test_run types use this interface in the future (this can be xdp_md for xdp types for example). v4: * don't copy more than allowed in bpf_ctx_init [Martin] v3: * handle case where ctx_in is NULL, but ctx_out is not [Martin] * convert size==0 checks to ptr==NULL checks and add some extra ptr checks [Martin] v2: * Addressed comments from Martin Lau Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
71b91a50 |
|
07-Mar-2019 |
Bo YU <tsu.yubo@gmail.com> |
bpf: fix warning about using plain integer as NULL Sparse warning below: sudo make C=2 CF=-D__CHECK_ENDIAN__ M=net/bpf/ CHECK net/bpf//test_run.c net/bpf//test_run.c:19:77: warning: Using plain integer as NULL pointer ./include/linux/bpf-cgroup.h:295:77: warning: Using plain integer as NULL pointer Fixes: 8bad74f9840f ("bpf: extend cgroup bpf core to allow multiple cgroup storage types") Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Bo YU <tsu.yubo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
a439184d |
|
19-Feb-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf/test_run: fix unkillable BPF_PROG_TEST_RUN for flow dissector Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff makes process unkillable. The problem is that when CONFIG_PREEMPT is enabled, we never see need_resched() return true. This is due to the fact that preempt_enable() (which we do in bpf_test_run_one on each iteration) now handles resched if it's needed. Let's disable preemption for the whole run, not per test. In this case we can properly see whether resched is needed. Let's also properly return -EINTR to the userspace in case of a signal interrupt. This is a follow up for a recently fixed issue in bpf_test_run, see commit df1a2cb7c74b ("bpf/test_run: fix unkillable BPF_PROG_TEST_RUN"). Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
df1a2cb7 |
|
12-Feb-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf/test_run: fix unkillable BPF_PROG_TEST_RUN Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff makes process unkillable. The problem is that when CONFIG_PREEMPT is enabled, we never see need_resched() return true. This is due to the fact that preempt_enable() (which we do in bpf_test_run_one on each iteration) now handles resched if it's needed. Let's disable preemption for the whole run, not per test. In this case we can properly see whether resched is needed. Let's also properly return -EINTR to the userspace in case of a signal interrupt. See recent discussion: http://lore.kernel.org/netdev/CAH3MdRWHr4N8jei8jxDppXjmw-Nw=puNDLbu1dQOFQHxfU2onA@mail.gmail.com I'll follow up with the same fix bpf_prog_test_run_flow_dissector in bpf-next. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
b7a1848e |
|
28-Jan-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: add BPF_PROG_TEST_RUN support for flow dissector The input is packet data, the output is struct bpf_flow_key. This should make it easy to test flow dissector programs without elaborate setup. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
b5a36b1e |
|
03-Dec-2018 |
Lorenz Bauer <lmb@cloudflare.com> |
bpf: respect size hint to BPF_PROG_TEST_RUN if present Use data_size_out as a size hint when copying test output to user space. ENOSPC is returned if the output buffer is too small. Callers which so far did not set data_size_out are not affected. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
dcb40590 |
|
01-Dec-2018 |
Roman Gushchin <guro@fb.com> |
bpf: refactor bpf_test_run() to separate own failures and test program result After commit f42ee093be29 ("bpf/test_run: support cgroup local storage") the bpf_test_run() function may fail with -ENOMEM, if it's not possible to allocate memory for a cgroup local storage. This error shouldn't be mixed with the return value of the testing program. Let's add an additional argument with a pointer where to store the testing program's result; and make bpf_test_run() return either 0 or -ENOMEM. Fixes: f42ee093be29 ("bpf/test_run: support cgroup local storage") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2cb494a3 |
|
19-Oct-2018 |
Song Liu <songliubraving@fb.com> |
bpf: add tests for direct packet access from CGROUP_SKB Tests are added to make sure CGROUP_SKB cannot access: tc_classid, data_meta, flow_keys and can read and write: mark, prority, and cb[0-4] and can read other fields. To make selftest with skb->sk work, a dummy sk is added in bpf_prog_test_run_skb(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
8bad74f9 |
|
28-Sep-2018 |
Roman Gushchin <guro@fb.com> |
bpf: extend cgroup bpf core to allow multiple cgroup storage types In order to introduce per-cpu cgroup storage, let's generalize bpf cgroup core to support multiple cgroup storage types. Potentially, per-node cgroup storage can be added later. This commit is mostly a formal change that replaces cgroup_storage pointer with a array of cgroup_storage pointers. It doesn't actually introduce a new storage type, it will be done later. Each bpf program is now able to have one cgroup storage of each type. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
f42ee093 |
|
02-Aug-2018 |
Roman Gushchin <guro@fb.com> |
bpf/test_run: support cgroup local storage Allocate a temporary cgroup storage to use for bpf program test runs. Because the test program is not actually attached to a cgroup, the storage is allocated manually just for the execution of the bpf program. If the program is executed multiple times, the storage is not zeroed on each run, emulating multiple runs of the program, attached to a real cgroup. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
6e6fddc7 |
|
11-Jul-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: fix panic due to oob in bpf_prog_test_run_skb sykzaller triggered several panics similar to the below: [...] [ 248.851531] BUG: KASAN: use-after-free in _copy_to_user+0x5c/0x90 [ 248.857656] Read of size 985 at addr ffff8808017ffff2 by task a.out/1425 [...] [ 248.865902] CPU: 1 PID: 1425 Comm: a.out Not tainted 4.18.0-rc4+ #13 [ 248.865903] Hardware name: Supermicro SYS-5039MS-H12TRF/X11SSE-F, BIOS 2.1a 03/08/2018 [ 248.865905] Call Trace: [ 248.865910] dump_stack+0xd6/0x185 [ 248.865911] ? show_regs_print_info+0xb/0xb [ 248.865913] ? printk+0x9c/0xc3 [ 248.865915] ? kmsg_dump_rewind_nolock+0xe4/0xe4 [ 248.865919] print_address_description+0x6f/0x270 [ 248.865920] kasan_report+0x25b/0x380 [ 248.865922] ? _copy_to_user+0x5c/0x90 [ 248.865924] check_memory_region+0x137/0x190 [ 248.865925] kasan_check_read+0x11/0x20 [ 248.865927] _copy_to_user+0x5c/0x90 [ 248.865930] bpf_test_finish.isra.8+0x4f/0xc0 [ 248.865932] bpf_prog_test_run_skb+0x6a0/0xba0 [...] After scrubbing the BPF prog a bit from the noise, turns out it called bpf_skb_change_head() for the lwt_xmit prog with headroom of 2. Nothing wrong in that, however, this was run with repeat >> 0 in bpf_prog_test_run_skb() and the same skb thus keeps changing until the pskb_expand_head() called from skb_cow() keeps bailing out in atomic alloc context with -ENOMEM. So upon return we'll basically have 0 headroom left yet blindly do the __skb_push() of 14 bytes and keep copying data from there in bpf_test_finish() out of bounds. Fix to check if we have enough headroom and if pskb_expand_head() fails, bail out with error. Another bug independent of this fix (but related in triggering above) is that BPF_PROG_TEST_RUN should be reworked to reset the skb/xdp buffer to it's original state from input as otherwise repeating the same test in a loop won't work for benchmarking when underlying input buffer is getting changed by the prog each time and reused for the next run leading to unexpected results. Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command") Reported-by: syzbot+709412e651e55ed96498@syzkaller.appspotmail.com Reported-by: syzbot+54f39d6ab58f39720a55@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
587b80cc |
|
17-Apr-2018 |
Nikita V. Shirokov <tehnerd@tehnerd.com> |
bpf: making bpf_prog_test run aware of possible data_end ptr change after introduction of bpf_xdp_adjust_tail helper packet length could be changed not only if xdp->data pointer has been changed but xdp->data_end as well. making bpf_prog_test_run aware of this possibility Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
65073a67 |
|
30-Jan-2018 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: fix null pointer deref in bpf_prog_test_run_xdp syzkaller was able to generate the following XDP program ... (18) r0 = 0x0 (61) r5 = *(u32 *)(r1 +12) (04) (u32) r0 += (u32) 0 (95) exit ... and trigger a NULL pointer dereference in ___bpf_prog_run() via bpf_prog_test_run_xdp() where this was attempted to run. Reason is that recent xdp_rxq_info addition to XDP programs updated all drivers, but not bpf_prog_test_run_xdp(), where xdp_buff is set up. Thus when context rewriter does the deref on the netdev it's NULL at runtime. Fix it by using xdp_rxq from loopback dev. __netif_get_rx_queue() helper can also be reused in various other locations later on. Fixes: 02dd3291b2f0 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs") Reported-by: syzbot+1eb094057b338eb1fc00@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
de8f3a83 |
|
24-Sep-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: add meta pointer for direct access This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out that we first point to data_hard_start, then data_meta directly prepended to data followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room. xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be multiple of 4 byte up to 32 byte large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail. The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from it's original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6aaae2b6 |
|
24-Sep-2017 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: rename bpf_compute_data_end into bpf_compute_data_pointers Just do the rename into bpf_compute_data_pointers() as we'll add one more pointer here to recompute. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
586f8525 |
|
02-May-2017 |
David Miller <davem@davemloft.net> |
bpf: Align packet data properly in program testing framework. Make sure we apply NET_IP_ALIGN when reserving headroom for SKB and XDP test runs, just like a real driver would. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
78e52272 |
|
02-May-2017 |
David Miller <davem@davemloft.net> |
bpf: Do not dereference user pointer in bpf_test_finish(). Instead, pass the kattr in which has a kernel side copy of this data structure from userspace already. Fix based upon a suggestion from Alexei Starovoitov. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
1cf1cae9 |
|
30-Mar-2017 |
Alexei Starovoitov <ast@kernel.org> |
bpf: introduce BPF_PROG_TEST_RUN command development and testing of networking bpf programs is quite cumbersome. Despite availability of user space bpf interpreters the kernel is the ultimate authority and execution environment. Current test frameworks for TC include creation of netns, veth, qdiscs and use of various packet generators just to test functionality of a bpf program. XDP testing is even more complicated, since qemu needs to be started with gro/gso disabled and precise queue configuration, transferring of xdp program from host into guest, attaching to virtio/eth0 and generating traffic from the host while capturing the results from the guest. Moreover analyzing performance bottlenecks in XDP program is impossible in virtio environment, since cost of running the program is tiny comparing to the overhead of virtio packet processing, so performance testing can only be done on physical nic with another server generating traffic. Furthermore ongoing changes to user space control plane of production applications cannot be run on the test servers leaving bpf programs stubbed out for testing. Last but not least, the upstream llvm changes are validated by the bpf backend testsuite which has no ability to test the code generated. To improve this situation introduce BPF_PROG_TEST_RUN command to test and performance benchmark bpf programs. Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|