#
2edc3de6 |
|
07-Mar-2024 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Recognize btf_decl_tag("arg: Arena") as PTR_TO_ARENA. In global bpf functions recognize btf_decl_tag("arg:arena") as PTR_TO_ARENA. Note, when the verifier sees: __weak void foo(struct bar *p) it recognizes 'p' as PTR_TO_MEM and 'struct bar' has to be a struct with scalars. Hence the only way to use arena pointers in global functions is to tag them with "arg:arena". Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20240308010812.89848-7-alexei.starovoitov@gmail.com
|
#
bd70a8fb |
|
05-Mar-2024 |
Eduard Zingerman <eddyz87@gmail.com> |
bpf: Allow all printable characters in BTF DATASEC names The intent is to allow libbpf to use SEC("?.struct_ops") to identify struct_ops maps that are optional, e.g. like in the following BPF code: SEC("?.struct_ops") struct test_ops optional_map = { ... }; Which yields the following BTF: ... [13] DATASEC '?.struct_ops' size=0 vlen=... ... To load such BTF libbpf rewrites DATASEC name before load. After this patch the rewrite won't be necessary. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240306104529.6453-15-eddyz87@gmail.com
|
#
879bbe7a |
|
12-Feb-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: don't infer PTR_TO_CTX for programs with unnamed context type For program types that don't have named context type name (e.g., BPF iterator programs or tracepoint programs), ctx_tname will be a non-NULL empty string. For such programs it shouldn't be possible to have PTR_TO_CTX argument for global subprogs based on type name alone. arg:ctx tag is the only way to have PTR_TO_CTX passed into global subprog for such program types. Fix this loophole, which currently would assume PTR_TO_CTX whenever user uses a pointer to anonymous struct as an argument to their global subprogs. This happens in practice with the following (quite common, in practice) approach: typedef struct { /* anonymous */ int x; } my_type_t; int my_subprog(my_type_t *arg) { ... } User's intent is to have PTR_TO_MEM argument for `arg`, but verifier will complain about expecting PTR_TO_CTX. This fix also closes unintended s390x-specific KPROBE handling of PTR_TO_CTX case. Selftest change is necessary to accommodate this. Fixes: 91cc1a99740e ("bpf: Annotate context types") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212233221.2575350-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
824c58fb |
|
12-Feb-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: handle bpf_user_pt_regs_t typedef explicitly for PTR_TO_CTX global arg Expected canonical argument type for global function arguments representing PTR_TO_CTX is `bpf_user_pt_regs_t *ctx`. This currently works on s390x by accident because kernel resolves such typedef to underlying struct (which is anonymous on s390x), and erroneously accepting it as expected context type. We are fixing this problem next, which would break s390x arch, so we need to handle `bpf_user_pt_regs_t` case explicitly for KPROBE programs. Fixes: 91cc1a99740e ("bpf: Annotate context types") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212233221.2575350-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fb5b86cf |
|
12-Feb-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: simplify btf_get_prog_ctx_type() into btf_is_prog_ctx_type() Return result of btf_get_prog_ctx_type() is never used and callers only check NULL vs non-NULL case to determine if given type matches expected PTR_TO_CTX type. So rename function to `btf_is_prog_ctx_type()` and return a simple true/false. We'll use this simpler interface to handle kprobe program type's special typedef case in the next patch. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212233221.2575350-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
16116035 |
|
08-Feb-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
bpf: Create argument information for nullable arguments. Collect argument information from the type information of stub functions to mark arguments of BPF struct_ops programs with PTR_MAYBE_NULL if they are nullable. A nullable argument is annotated by suffixing "__nullable" at the argument name of stub function. For nullable arguments, this patch sets a struct bpf_ctx_arg_aux to label their reg_type with PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL. This makes the verifier to check programs and ensure that they properly check the pointer. The programs should check if the pointer is null before accessing the pointed memory. The implementer of a struct_ops type should annotate the arguments that can be null. The implementer should define a stub function (empty) as a placeholder for each defined operator. The name of a stub function should be in the pattern "<st_op_type>__<operator name>". For example, for test_maybe_null of struct bpf_testmod_ops, it's stub function name should be "bpf_testmod_ops__test_maybe_null". You mark an argument nullable by suffixing the argument name with "__nullable" at the stub function. Since we already has stub functions for kCFI, we just reuse these stub functions with the naming convention mentioned earlier. These stub functions with the naming convention is only required if there are nullable arguments to annotate. For functions having not nullable arguments, stub functions are not necessary for the purpose of this patch. This patch will prepare a list of struct bpf_ctx_arg_aux, aka arg_info, for each member field of a struct_ops type. "arg_info" will be assigned to "prog->aux->ctx_arg_info" of BPF struct_ops programs in check_struct_ops_btf_id() so that it can be used by btf_ctx_access() later to set reg_type properly for the verifier. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
6115a0ae |
|
08-Feb-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
bpf: Move __kfunc_param_match_suffix() to btf.c. Move __kfunc_param_match_suffix() to btf.c and rename it as btf_param_match_suffix(). It can be reused by bpf_struct_ops later. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
77c0208e |
|
08-Feb-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
bpf: add btf pointer to struct bpf_ctx_arg_aux. Enable the providers to use types defined in a module instead of in the kernel (btf_vmlinux). Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
947e56f8 |
|
07-Feb-2024 |
Geliang Tang <tanggeliang@kylinos.cn> |
bpf, btf: Check btf for register_bpf_struct_ops Similar to the handling in the functions __register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs(), this patch uses the newly added helper check_btf_kconfigs() to handle module with its btf section stripped. While at it, the patch also adds the missed IS_ERR() check to fix the commit f6be98d19985 ("bpf, net: switch to dynamic registration") Fixes: f6be98d19985 ("bpf, net: switch to dynamic registration") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/69082b9835463fe36f9e354bddf2d0a97df39c2b.1707373307.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
9e60b0e0 |
|
07-Feb-2024 |
Geliang Tang <tanggeliang@kylinos.cn> |
bpf, btf: Add check_btf_kconfigs helper This patch extracts duplicate code on error path when btf_get_module_btf() returns NULL from the functions __register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs() into a new helper named check_btf_kconfigs() to check CONFIG_DEBUG_INFO_BTF and CONFIG_DEBUG_INFO_BTF_MODULES in it. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/fa5537fc55f1e4d0bfd686598c81b7ab9dbd82b7.1707373307.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
b9a395f0 |
|
07-Feb-2024 |
Geliang Tang <tanggeliang@kylinos.cn> |
bpf, btf: Fix return value of register_btf_id_dtor_kfuncs The same as __register_btf_kfunc_id_set(), to let the modules with stripped btf section loaded, this patch changes the return value of register_btf_id_dtor_kfuncs() too from -ENOENT to 0 when btf is NULL. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/eab65586d7fb0e72f2707d3747c7d4a5d60c823f.1707373307.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
1eb98674 |
|
02-Feb-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: don't emit warnings intended for global subprogs for static subprogs When btf_prepare_func_args() was generalized to handle both static and global subprogs, a few warnings/errors that are meant only for global subprog cases started to be emitted for static subprogs, where they are sort of expected and irrelavant. Stop polutting verifier logs with irrelevant scary-looking messages. Fixes: e26080d0da87 ("bpf: prepare btf_prepare_func_args() for handling static subprogs") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240202190529.2374377-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
6f3189f3 |
|
28-Jan-2024 |
Daniel Xu <dxu@dxuuu.xyz> |
bpf: treewide: Annotate BPF kfuncs in BTF This commit marks kfuncs as such inside the .BTF_ids section. The upshot of these annotations is that we'll be able to automatically generate kfunc prototypes for downstream users. The process is as follows: 1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs 2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for each function inside BTF_KFUNCS sets 3. At runtime, vmlinux or module BTF is made available in sysfs 4. At runtime, bpftool (or similar) can look at provided BTF and generate appropriate prototypes for functions with "bpf_kfunc" tag To ensure future kfunc are similarly tagged, we now also return error inside kfunc registration for untagged kfuncs. For vmlinux kfuncs, we also WARN(), as initcall machinery does not handle errors. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
8f2b44cd |
|
29-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: add arg:nullable tag to be combined with trusted pointers Add ability to mark arg:trusted arguments with optional arg:nullable tag to mark it as PTR_TO_BTF_ID_OR_NULL variant, which will allow callers to pass NULL, and subsequently will force global subprog's code to do NULL check. This allows to have "optional" PTR_TO_BTF_ID values passed into global subprogs. For now arg:nullable cannot be combined with anything else. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240130000648.2144827-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e2b3c4ff |
|
29-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: add __arg_trusted global func arg tag Add support for passing PTR_TO_BTF_ID registers to global subprogs. Currently only PTR_TRUSTED flavor of PTR_TO_BTF_ID is supported. Non-NULL semantics is assumed, so caller will be forced to prove PTR_TO_BTF_ID can't be NULL. Note, we disallow global subprogs to destroy passed in PTR_TO_BTF_ID arguments, even the trusted one. We achieve that by not setting ref_obj_id when validating subprog code. This basically enforces (in Rust terms) borrowing semantics vs move semantics. Borrowing semantics seems to be a better fit for isolated global subprog validation approach. Implementation-wise, we utilize existing logic for matching user-provided BTF type to kernel-side BTF type, used by BPF CO-RE logic and following same matching rules. We enforce a unique match for types. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240130000648.2144827-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
add9c58c |
|
25-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: move arg:ctx type enforcement check inside the main logic loop Now that bpf and bpf-next trees converged and we don't run the risk of merge conflicts, move btf_validate_prog_ctx_type() into its most logical place inside the main logic loop. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240125205510.3642094-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
7c81c249 |
|
19-Jan-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
bpf: export btf_ctx_access to modules. The module requires the use of btf_ctx_access() to invoke bpf_tracing_btf_ctx_access() from a module. This function is valuable for implementing validation functions that ensure proper access to ctx. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-14-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
f6be98d1 |
|
19-Jan-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
bpf, net: switch to dynamic registration Replace the static list of struct_ops types with per-btf struct_ops_tab to enable dynamic registration. Both bpf_dummy_ops and bpf_tcp_ca now utilize the registration function instead of being listed in bpf_struct_ops_types.h. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-12-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
e6199511 |
|
19-Jan-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
bpf: add struct_ops_tab to btf. Maintain a registry of registered struct_ops types in the per-btf (module) struct_ops_tab. This registry allows for easy lookup of struct_ops types that are registered by a specific module. It is a preparation work for supporting kernel module struct_ops in a latter patch. Each struct_ops will be registered under its own kernel module btf and will be stored in the newly added btf->struct_ops_tab. The bpf verifier and bpf syscall (e.g. prog and map cmd) can find the struct_ops and its btf type/size/id... information from btf->struct_ops_tab. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-5-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
3b1f89e7 |
|
19-Jan-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
bpf: refactory struct_ops type initialization to a function. Move the majority of the code to bpf_struct_ops_init_one(), which can then be utilized for the initialization of newly registered dynamically allocated struct_ops types in the following patches. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
522bb2c1 |
|
04-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: support multiple tags per argument Add ability to iterate multiple decl_tag types pointed to the same function argument. Use this to support multiple __arg_xxx tags per global subprog argument. We leave btf_find_decl_tag_value() intact, but change its implementation to use a new btf_find_next_decl_tag() which can be straightforwardly used to find next BTF type ID of a matching btf_decl_tag type. btf_prepare_func_args() is switched from btf_find_decl_tag_value() to btf_find_next_decl_tag() to gain multiple tags per argument support. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
54c11ec4 |
|
04-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: prepare btf_prepare_func_args() for multiple tags per argument Add btf_arg_tag flags enum to be able to record multiple tags per argument. Also streamline pointer argument processing some more. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
18810ad3 |
|
04-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: make sure scalar args don't accept __arg_nonnull tag Move scalar arg processing in btf_prepare_func_args() after all pointer arg processing is done. This makes it easier to do validation. One example of unintended behavior right now is ability to specify __arg_nonnull for integer/enum arguments. This patch fixes this. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240105000909.2818934-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
0ba97151 |
|
17-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: enforce types for __arg_ctx-tagged arguments in global subprogs Add enforcement of expected types for context arguments tagged with arg:ctx (__arg_ctx) tag. First, any program type will accept generic `void *` context type when combined with __arg_ctx tag. Besides accepting "canonical" struct names and `void *`, for a bunch of program types for which program context is actually a named struct, we allows a bunch of pragmatic exceptions to match real-world and expected usage: - for both kprobes and perf_event we allow `bpf_user_pt_regs_t *` as canonical context argument type, where `bpf_user_pt_regs_t` is a *typedef*, not a struct; - for kprobes, we also always accept `struct pt_regs *`, as that's what actually is passed as a context to any kprobe program; - for perf_event, we resolve typedefs (unless it's `bpf_user_pt_regs_t`) down to actual struct type and accept `struct pt_regs *`, or `struct user_pt_regs *`, or `struct user_regs_struct *`, depending on the actual struct type kernel architecture points `bpf_user_pt_regs_t` typedef to; otherwise, canonical `struct bpf_perf_event_data *` is expected; - for raw_tp/raw_tp.w programs, `u64/long *` are accepted, as that's what's expected with BPF_PROG() usage; otherwise, canonical `struct bpf_raw_tracepoint_args *` is expected; - tp_btf supports both `struct bpf_raw_tracepoint_args *` and `u64 *` formats, both are coded as expections as tp_btf is actually a TRACING program type, which has no canonical context type; - iterator programs accept `struct bpf_iter__xxx *` structs, currently with no further iterator-type specific enforcement; - fentry/fexit/fmod_ret/lsm/struct_ops all accept `u64 *`; - classic tracepoint programs, as well as syscall and freplace programs allow any user-provided type. In all other cases kernel will enforce exact match of struct name to expected canonical type. And if user-provided type doesn't match that expectation, verifier will emit helpful message with expected type name. Note a bit unnatural way the check is done after processing all the arguments. This is done to avoid conflict between bpf and bpf-next trees. Once trees converge, a small follow up patch will place a simple btf_validate_prog_ctx_type() check into a proper ARG_PTR_TO_CTX branch (which bpf-next tree patch refactored already), removing duplicated arg:ctx detection logic. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240118033143.3384355-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
66967a32 |
|
17-Jan-2024 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: extract bpf_ctx_convert_map logic and make it more reusable Refactor btf_get_prog_ctx_type() a bit to allow reuse of bpf_ctx_convert_map logic in more than one places. Simplify interface by returning btf_type instead of btf_member (field reference in BTF). To do the above we need to touch and start untangling btf_translate_to_vmlinux() implementation. We do the bare minimum to not regress anything for btf_translate_to_vmlinux(), but its implementation is very questionable for what it claims to be doing. Mapping kfunc argument types to kernel corresponding types conceptually is quite different from recognizing program context types. Fixing this is out of scope for this change though. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240118033143.3384355-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a64bfe61 |
|
14-Dec-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: add support for passing dynptr pointer to global subprog Add ability to pass a pointer to dynptr into global functions. This allows to have global subprogs that accept and work with generic dynptrs that are created by caller. Dynptr argument is detected based on the name of a struct type, if it's "bpf_dynptr", it's assumed to be a proper dynptr pointer. Both actual struct and forward struct declaration types are supported. This is conceptually exactly the same semantics as bpf_user_ringbuf_drain()'s use of dynptr to pass a variable-sized pointer to ringbuf record. So we heavily rely on CONST_PTR_TO_DYNPTR bits of already existing logic in the verifier. During global subprog validation, we mark such CONST_PTR_TO_DYNPTR as having LOCAL type, as that's the most unassuming type of dynptr and it doesn't have any special helpers that can try to free or acquire extra references (unlike skb, xdp, or ringbuf dynptr). So that seems like a safe "choice" to make from correctness standpoint. It's still possible to pass any type of dynptr to such subprog, though, because generic dynptr helpers, like getting data/slice pointers, read/write memory copying routines, dynptr adjustment and getter routines all work correctly with any type of dynptr. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
94e1c70a |
|
14-Dec-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: support 'arg:xxx' btf_decl_tag-based hints for global subprog args Add support for annotating global BPF subprog arguments to provide more information about expected semantics of the argument. Currently, verifier relies purely on argument's BTF type information, and supports three general use cases: scalar, pointer-to-context, and pointer-to-fixed-size-memory. Scalar and pointer-to-fixed-mem work well in practice and are quite natural to use. But pointer-to-context is a bit problematic, as typical BPF users don't realize that they need to use a special type name to signal to verifier that argument is not just some pointer, but actually a PTR_TO_CTX. Further, even if users do know which type to use, it is limiting in situations where the same BPF program logic is used across few different program types. Common case is kprobes, tracepoints, and perf_event programs having a helper to send some data over BPF perf buffer. bpf_perf_event_output() requires `ctx` argument, and so it's quite cumbersome to share such global subprog across few BPF programs of different types, necessitating extra static subprog that is context type-agnostic. Long story short, there is a need to go beyond types and allow users to add hints to global subprog arguments to define expectations. This patch adds such support for two initial special tags: - pointer to context; - non-null qualifier for generic pointer arguments. All of the above came up in practice already and seem generally useful additions. Non-null qualifier is an often requested feature, which currently has to be worked around by having unnecessary NULL checks inside subprogs even if we know that arguments are never NULL. Pointer to context was discussed earlier. As for implementation, we utilize btf_decl_tag attribute and set up an "arg:xxx" convention to specify argument hint. As such: - btf_decl_tag("arg:ctx") is a PTR_TO_CTX hint; - btf_decl_tag("arg:nonnull") marks pointer argument as not allowed to be NULL, making NULL check inside global subprog unnecessary. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c5a72447 |
|
14-Dec-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: move subprog call logic back to verifier.c Subprog call logic in btf_check_subprog_call() currently has both a lot of BTF parsing logic (which is, presumably, what justified putting it into btf.c), but also a bunch of register state checks, some of each utilize deep verifier logic helpers, necessarily exported from verifier.c: check_ptr_off_reg(), check_func_arg_reg_off(), and check_mem_reg(). Going forward, btf_check_subprog_call() will have a minimum of BTF-related logic, but will get more internal verifier logic related to register state manipulation. So move it into verifier.c to minimize amount of verifier-specific logic exposed to btf.c. We do this move before refactoring btf_check_func_arg_match() to preserve as much history post-refactoring as possible. No functional changes. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
e26080d0 |
|
14-Dec-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: prepare btf_prepare_func_args() for handling static subprogs Generalize btf_prepare_func_args() to support both global and static subprogs. We are going to utilize this property in the next patch, reusing btf_prepare_func_args() for subprog call logic instead of reparsing BTF information in a completely separate implementation. btf_prepare_func_args() now detects whether subprog is global or static makes slight logic adjustments for static func cases, like not failing fatally (-EFAULT) for conditions that are allowable for static subprogs. Somewhat subtle (but major!) difference is the handling of pointer arguments. Both global and static functions need to handle special context arguments (which are pointers to predefined type names), but static subprogs give up on any other pointers, falling back to marking subprog as "unreliable", disabling the use of BTF type information altogether. For global functions, though, we are assuming that such pointers to unrecognized types are just pointers to fixed-sized memory region (or error out if size cannot be established, like for `void *` pointers). This patch accommodates these small differences and sets up a stage for refactoring in the next patch, eliminating a separate BTF-based parsing logic in btf_check_func_arg_match(). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
5eccd2db |
|
14-Dec-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: reuse btf_prepare_func_args() check for main program BTF validation Instead of btf_check_subprog_arg_match(), use btf_prepare_func_args() logic to validate "trustworthiness" of main BPF program's BTF information, if it is present. We ignored results of original BTF check anyway, often times producing confusing and ominously-sounding "reg type unsupported for arg#0 function" message, which has no apparent effect on program correctness and verification process. All the -EFAULT returning sanity checks are already performed in check_btf_info_early(), so there is zero reason to have this duplication of logic between btf_check_subprog_call() and btf_check_subprog_arg_match(). Dropping btf_check_subprog_arg_match() simplifies btf_check_func_arg_match() further removing `bool processing_call` flag. One subtle bit that was done by btf_check_subprog_arg_match() was potentially marking main program's BTF as unreliable. We do this explicitly now with a dedicated simple check, preserving the original behavior, but now based on well factored btf_prepare_func_args() logic. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
4ba1d0f2 |
|
14-Dec-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: abstract away global subprog arg preparation logic from reg state setup btf_prepare_func_args() is used to understand expectations and restrictions on global subprog arguments. But current implementation is hard to extend, as it intermixes BTF-based func prototype parsing and interpretation logic with setting up register state at subprog entry. Worse still, those registers are not completely set up inside btf_prepare_func_args(), requiring some more logic later in do_check_common(). Like calling mark_reg_unknown() and similar initialization operations. This intermixing of BTF interpretation and register state setup is problematic. First, it causes duplication of BTF parsing logic for global subprog verification (to set up initial state of global subprog) and global subprog call sites analysis (when we need to check that whatever is being passed into global subprog matches expectations), performed in btf_check_subprog_call(). Given we want to extend global func argument with tags later, this duplication is problematic. So refactor btf_prepare_func_args() to do only BTF-based func proto and args parsing, returning high-level argument "expectations" only, with no regard to specifics of register state. I.e., if it's a context argument, instead of setting register state to PTR_TO_CTX, we return ARG_PTR_TO_CTX enum for that argument as "an argument specification" for further processing inside do_check_common(). Similarly for SCALAR arguments, PTR_TO_MEM, etc. This allows to reuse btf_prepare_func_args() in following patches at global subprog call site analysis time. It also keeps register setup code consistently in one place, do_check_common(). Besides all this, we cache this argument specs information inside env->subprog_info, eliminating the need to redo these potentially expensive BTF traversals, especially if BPF program's BTF is big and/or there are lots of global subprog calls. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231215011334.2307144-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
1a1ad782 |
|
04-Dec-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: tidy up exception callback management a bit Use the fact that we are passing subprog index around and have a corresponding struct bpf_subprog_info in bpf_verifier_env for each subprogram. We don't need to separately pass around a flag whether subprog is exception callback or not, each relevant verifier function can determine this using provided subprog index if we maintain bpf_subprog_info properly. Also move out exception callback-specific logic from btf_prepare_func_args(), keeping it generic. We can enforce all these restriction right before exception callback verification pass. We add out parameter, arg_cnt, for now, but this will be unnecessary with subsequent refactoring and will be removed. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20231204233931.49758-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
790ce3cf |
|
07-Nov-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Move GRAPH_{ROOT,NODE}_MASK macros into btf_field_type enum This refactoring patch removes the unused BPF_GRAPH_NODE_OR_ROOT btf_field_type and moves BPF_GRAPH_{NODE,ROOT} macros into the btf_field_type enum. Further patches in the series will use BPF_GRAPH_NODE, so let's move this useful definition out of btf.c. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20231107085639.3016113-5-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
53e380d2 |
|
11-Oct-2023 |
Daan De Meyer <daan.j.demeyer@gmail.com> |
bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf As prep for adding unix socket support to the cgroup sockaddr hooks, let's add a kfunc bpf_sock_addr_set_sun_path() that allows modifying a unix sockaddr from bpf. While this is already possible for AF_INET and AF_INET6, we'll need this kfunc when we add unix socket support since modifying the address for those requires modifying both the address and the sockaddr length. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-4-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
b9ae0c9d |
|
12-Sep-2023 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Add support for custom exception callbacks By default, the subprog generated by the verifier to handle a thrown exception hardcodes a return value of 0. To allow user-defined logic and modification of the return value when an exception is thrown, introduce the 'exception_callback:' declaration tag, which marks a callback as the default exception handler for the program. The format of the declaration tag is 'exception_callback:<value>', where <value> is the name of the exception callback. Each main program can be tagged using this BTF declaratiion tag to associate it with an exception callback. In case the tag is absent, the default callback is used. As such, the exception callback cannot be modified at runtime, only set during verification. Allowing modification of the callback for the current program execution at runtime leads to issues when the programs begin to nest, as any per-CPU state maintaing this information will have to be saved and restored. We don't want it to stay in bpf_prog_aux as this takes a global effect for all programs. An alternative solution is spilling the callback pointer at a known location on the program stack on entry, and then passing this location to bpf_throw as a parameter. However, since exceptions are geared more towards a use case where they are ideally never invoked, optimizing for this use case and adding to the complexity has diminishing returns. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230912233214.1518551-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
55db92f4 |
|
27-Aug-2023 |
Yonghong Song <yonghong.song@linux.dev> |
bpf: Add BPF_KPTR_PERCPU as a field type BPF_KPTR_PERCPU represents a percpu field type like below struct val_t { ... fields ... }; struct t { ... struct val_t __percpu_kptr *percpu_data_ptr; ... }; where #define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr"))) While BPF_KPTR_REF points to a trusted kernel object or a trusted local object, BPF_KPTR_PERCPU points to a trusted local percpu object. This patch added basic support for BPF_KPTR_PERCPU related to percpu_kptr field parsing, recording and free operations. BPF_KPTR_PERCPU also supports the same map types as BPF_KPTR_REF does. Note that unlike a local kptr, it is possible that a BPF_KTPR_PERCPU struct may not contain any special fields like other kptr, bpf_spin_lock, bpf_list_head, etc. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230827152739.1996391-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a8f12572 |
|
08-Sep-2023 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
bpf: Fix a erroneous check after snprintf() snprintf() does not return negative error code on error, it returns the number of characters which *would* be generated for the given input. Fix the error handling check. Fixes: 57539b1c0ac2 ("bpf: Enable annotating trusted nested pointers") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/393bdebc87b22563c08ace094defa7160eb7a6c0.1694190795.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b1d1e904 |
|
22-Aug-2023 |
Masami Hiramatsu (Google) <mhiramat@kernel.org> |
tracing/probes: Support BTF argument on module functions Since the btf returned from bpf_get_btf_vmlinux() only covers functions in the vmlinux, BTF argument is not available on the functions in the modules. Use bpf_find_btf_id() instead of bpf_get_btf_vmlinux()+btf_find_name_kind() so that BTF argument can find the correct struct btf and btf_type in it. With this fix, fprobe events can use `$arg*` on module functions as below # grep nf_log_ip_packet /proc/kallsyms ffffffffa0005c00 t nf_log_ip_packet [nf_log_syslog] ffffffffa0005bf0 t __pfx_nf_log_ip_packet [nf_log_syslog] # echo 'f nf_log_ip_packet $arg*' > dynamic_events # cat dynamic_events f:fprobes/nf_log_ip_packet__entry nf_log_ip_packet net=net pf=pf hooknum=hooknum skb=skb in=in out=out loginfo=loginfo prefix=prefix To support the module's btf which is removable, the struct btf needs to be ref-counted. So this also records the btf in the traceprobe_parse_context and returns the refcount when the parse has done. Link: https://lore.kernel.org/all/169272154223.160970.3507930084247934031.stgit@devnote2/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
#
680ee045 |
|
02-Aug-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: invert the netdevice.h vs xdp.h dependency xdp.h is far more specific and is included in only 67 other files vs netdevice.h's 1538 include sites. Make xdp.h include netdevice.h, instead of the other way around. This decreases the incremental allmodconfig builds size when xdp.h is touched from 5947 to 662 objects. Move bpf_prog_run_xdp() to xdp.h, seems appropriate and filter.h is a mega-header in its own right so it's nice to avoid xdp.h getting included there as well. The only unfortunate part is that the typedef for xdp_features_t has to move to netdevice.h, since its embedded in struct netdevice. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230803010230.1755386-4-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
33937607 |
|
12-Jul-2023 |
Yafang Shao <laoar.shao@gmail.com> |
bpf: Fix an error in verifying a field in a union We are utilizing BPF LSM to monitor BPF operations within our container environment. When we add support for raw_tracepoint, it hits below error. ; (const void *)attr->raw_tracepoint.name); 27: (79) r3 = *(u64 *)(r2 +0) access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8 It can be reproduced with below BPF prog. SEC("lsm/bpf") int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size) { switch (cmd) { case BPF_RAW_TRACEPOINT_OPEN: bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name); break; default: break; } return 0; } The reason is that when accessing a field in a union, such as bpf_attr, if the field is located within a nested struct that is not the first member of the union, it can result in incorrect field verification. union bpf_attr { struct { __u32 map_type; <<<< Actually it will find that field. __u32 key_size; __u32 value_size; ... }; ... struct { __u64 name; <<<< We want to verify this field. __u32 prog_fd; } raw_tracepoint; }; Considering the potential deep nesting levels, finding a perfect solution to address this issue has proven challenging. Therefore, I propose a solution where we simply skip the verification process if the field in question is located within a union. Fixes: 7e3617a72df3 ("bpf: Add array support to btf_struct_access") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230713025642.27477-4-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
7ce4dc3e |
|
12-Jul-2023 |
Yafang Shao <laoar.shao@gmail.com> |
bpf: Fix an error around PTR_UNTRUSTED Per discussion with Alexei, the PTR_UNTRUSTED flag should not been cleared when we start to walk a new struct, because the struct in question may be a struct nested in a union. We should also check and set this flag before we walk its each member, in case itself is a union. We will clear this flag if the field is BTF_TYPE_SAFE_RCU_OR_NULL. Fixes: 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230713025642.27477-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
819d4342 |
|
26-Jun-2023 |
Stanislav Fomichev <sdf@google.com> |
bpf: Resolve modifiers when walking structs It is impossible to use skb_frag_t in the tracing program. Resolve typedefs when walking structs. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230626212522.2414485-1-sdf@google.com
|
#
3de4d22c |
|
01-Jul-2023 |
SeongJae Park <sj@kernel.org> |
bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set() __register_btf_kfunc_id_set() assumes .BTF to be part of the module's .ko file if CONFIG_DEBUG_INFO_BTF is enabled. If that's not the case, the function prints an error message and return an error. As a result, such modules cannot be loaded. However, the section could be stripped out during a build process. It would be better to let the modules loaded, because their basic functionalities have no problem [0], though the BTF functionalities will not be supported. Make the function to lower the level of the message from error to warn, and return no error. [0] https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion Fixes: c446fdacb10d ("bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF") Reported-by: Alexander Egorenkov <Alexander.Egorenkov@ibm.com> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/87y228q66f.fsf@oc8242746057.ibm.com Link: https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion Link: https://lore.kernel.org/bpf/20230701171447.56464-1-sj@kernel.org
|
#
9724160b |
|
15-Jun-2023 |
Florent Revest <revest@chromium.org> |
bpf/btf: Accept function names that contain dots When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn, pahole creates BTF_KIND_FUNC entries for these and this makes the BTF metadata validation fail because they contain a dot. In a dramatic turn of event, this BTF verification failure can cause the netfilter_bpf initialization to fail, causing netfilter_core to free the netfilter_helper hashmap and netfilter_ftp to trigger a use-after-free. The risk of u-a-f in netfilter will be addressed separately but the existence of "asan.module_ctor" debug info under some build conditions sounds like a good enough reason to accept functions that contain dots in BTF. Although using only LLVM=1 is the recommended way to compile clang-based kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still try to support that combination according to Nick. To clarify: - > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended, but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue - <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in which case GNU as will be used Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Yonghong Song <yhs@meta.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org
|
#
e6c2f594 |
|
30-May-2023 |
Yonghong Song <yhs@fb.com> |
bpf: Silence a warning in btf_type_id_size() syzbot reported a warning in [1] with the following stacktrace: WARNING: CPU: 0 PID: 5005 at kernel/bpf/btf.c:1988 btf_type_id_size+0x2d9/0x9d0 kernel/bpf/btf.c:1988 ... RIP: 0010:btf_type_id_size+0x2d9/0x9d0 kernel/bpf/btf.c:1988 ... Call Trace: <TASK> map_check_btf kernel/bpf/syscall.c:1024 [inline] map_create+0x1157/0x1860 kernel/bpf/syscall.c:1198 __sys_bpf+0x127f/0x5420 kernel/bpf/syscall.c:5040 __do_sys_bpf kernel/bpf/syscall.c:5162 [inline] __se_sys_bpf kernel/bpf/syscall.c:5160 [inline] __x64_sys_bpf+0x79/0xc0 kernel/bpf/syscall.c:5160 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd With the following btf [1] DECL_TAG 'a' type_id=4 component_idx=-1 [2] PTR '(anon)' type_id=0 [3] TYPE_TAG 'a' type_id=2 [4] VAR 'a' type_id=3, linkage=static and when the bpf_attr.btf_key_type_id = 1 (DECL_TAG), the following WARN_ON_ONCE in btf_type_id_size() is triggered: if (WARN_ON_ONCE(!btf_type_is_modifier(size_type) && !btf_type_is_var(size_type))) return NULL; Note that 'return NULL' is the correct behavior as we don't want a DECL_TAG type to be used as a btf_{key,value}_type_id even for the case like 'DECL_TAG -> STRUCT'. So there is no correctness issue here, we just want to silence warning. To silence the warning, I added DECL_TAG as one of kinds in btf_type_nosize() which will cause btf_type_id_size() returning NULL earlier without the warning. [1] https://lore.kernel.org/bpf/000000000000e0df8d05fc75ba86@google.com/ Reported-by: syzbot+958967f249155967d42a@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230530205029.264910-1-yhs@fb.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
e924e80e |
|
19-May-2023 |
Aditi Ghag <aditi.ghag@isovalent.com> |
bpf: Add kfunc filter function to 'struct btf_kfunc_id_set' This commit adds the ability to filter kfuncs to certain BPF program types. This is required to limit bpf_sock_destroy kfunc implemented in follow-up commits to programs with attach type 'BPF_TRACE_ITER'. The commit adds a callback filter to 'struct btf_kfunc_id_set'. The filter has access to the `bpf_prog` construct including its properties such as `expected_attached_type`. Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-7-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
fd9c663b |
|
21-Apr-2023 |
Florian Westphal <fw@strlen.de> |
bpf: minimal support for programs hooked into netfilter framework This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs that will be invoked via the NF_HOOK() points in the ip stack. Invocation incurs an indirect call. This is not a necessity: Its possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the program invocation with the same method already done for xdp progs. This isn't done here to keep the size of this chunk down. Verifier restricts verdicts to either DROP or ACCEPT. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
acf1c3d6 |
|
20-Apr-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Fix race between btf_put and btf_idr walk. Florian and Eduard reported hard dead lock: [ 58.433327] _raw_spin_lock_irqsave+0x40/0x50 [ 58.433334] btf_put+0x43/0x90 [ 58.433338] bpf_find_btf_id+0x157/0x240 [ 58.433353] btf_parse_fields+0x921/0x11c0 This happens since btf->refcount can be 1 at the time of btf_put() and btf_put() will call btf_free_id() which will try to grab btf_idr_lock and will dead lock. Avoid the issue by doing btf_put() without locking. Fixes: 3d78417b60fb ("bpf: Add bpf_btf_find_by_name_kind() helper.") Fixes: 1e89106da253 ("bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().") Reported-by: Florian Westphal <fw@strlen.de> Reported-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20230421014901.70908-1-alexei.starovoitov@gmail.com
|
#
2569c7b8 |
|
19-Apr-2023 |
Feng Zhou <zhoufeng.zf@bytedance.com> |
bpf: support access variable length array of integer type After this commit: bpf: Support variable length array in tracing programs (9c5f8a1008a1) Trace programs can access variable length array, but for structure type. This patch adds support for integer type. Example: Hook load_balance struct sched_domain { ... unsigned long span[]; } The access: sd->span[0]. Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Link: https://lore.kernel.org/r/20230420032735.27760-2-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
404ad75a |
|
15-Apr-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Migrate bpf_rbtree_remove to possibly fail This patch modifies bpf_rbtree_remove to account for possible failure due to the input rb_node already not being in any collection. The function can now return NULL, and does when the aforementioned scenario occurs. As before, on successful removal an owning reference to the removed node is returned. Adding KF_RET_NULL to bpf_rbtree_remove's kfunc flags - now KF_RET_NULL | KF_ACQUIRE - provides the desired verifier semantics: * retval must be checked for NULL before use * if NULL, retval's ref_obj_id is released * retval is a "maybe acquired" owning ref, not a non-owning ref, so it will live past end of critical section (bpf_spin_unlock), and thus can be checked for NULL after the end of the CS BPF programs must add checks ============================ This does change bpf_rbtree_remove's verifier behavior. BPF program writers will need to add NULL checks to their programs, but the resulting UX looks natural: bpf_spin_lock(&glock); n = bpf_rbtree_first(&ghead); if (!n) { /* ... */} res = bpf_rbtree_remove(&ghead, &n->node); bpf_spin_unlock(&glock); if (!res) /* Newly-added check after this patch */ return 1; n = container_of(res, /* ... */); /* Do something else with n */ bpf_obj_drop(n); return 0; The "if (!res)" check above is the only addition necessary for the above program to pass verification after this patch. bpf_rbtree_remove no longer clobbers non-owning refs ==================================================== An issue arises when bpf_rbtree_remove fails, though. Consider this example: struct node_data { long key; struct bpf_list_node l; struct bpf_rb_node r; struct bpf_refcount ref; }; long failed_sum; void bpf_prog() { struct node_data *n = bpf_obj_new(/* ... */); struct bpf_rb_node *res; n->key = 10; bpf_spin_lock(&glock); bpf_list_push_back(&some_list, &n->l); /* n is now a non-owning ref */ res = bpf_rbtree_remove(&some_tree, &n->r, /* ... */); if (!res) failed_sum += n->key; /* not possible */ bpf_spin_unlock(&glock); /* if (res) { do something useful and drop } ... */ } The bpf_rbtree_remove in this example will always fail. Similarly to bpf_spin_unlock, bpf_rbtree_remove is a non-owning reference invalidation point. The verifier clobbers all non-owning refs after a bpf_rbtree_remove call, so the "failed_sum += n->key" line will fail verification, and in fact there's no good way to get information about the node which failed to add after the invalidation. This patch removes non-owning reference invalidation from bpf_rbtree_remove to allow the above usecase to pass verification. The logic for why this is now possible is as follows: Before this series, bpf_rbtree_add couldn't fail and thus assumed that its input, a non-owning reference, was in the tree. But it's easy to construct an example where two non-owning references pointing to the same underlying memory are acquired and passed to rbtree_remove one after another (see rbtree_api_release_aliasing in selftests/bpf/progs/rbtree_fail.c). So it was necessary to clobber non-owning refs to prevent this case and, more generally, to enforce "non-owning ref is definitely in some collection" invariant. This series removes that invariant and the failure / runtime checking added in this patch provide a clean way to deal with the aliasing issue - just fail to remove. Because the aliasing issue prevented by clobbering non-owning refs is no longer an issue, this patch removes the invalidate_non_owning_refs call from verifier handling of bpf_rbtree_remove. Note that bpf_spin_unlock - the other caller of invalidate_non_owning_refs - clobbers non-owning refs for a different reason, so its clobbering behavior remains unchanged. No BPF program changes are necessary for programs to remain valid as a result of this clobbering change. A valid program before this patch passed verification with its non-owning refs having shorter (or equal) lifetimes due to more aggressive clobbering. Also, update existing tests to check bpf_rbtree_remove retval for NULL where necessary, and move rbtree_api_release_aliasing from progs/rbtree_fail.c to progs/rbtree.c since it's now expected to pass verification. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230415201811.343116-8-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
d54730b5 |
|
15-Apr-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Introduce opaque bpf_refcount struct and add btf_record plumbing A 'struct bpf_refcount' is added to the set of opaque uapi/bpf.h types meant for use in BPF programs. Similarly to other opaque types like bpf_spin_lock and bpf_rbtree_node, the verifier needs to know where in user-defined struct types a bpf_refcount can be located, so necessary btf_record plumbing is added to enable this. bpf_refcount is sized to hold a refcount_t. Similarly to bpf_spin_lock, the offset of a bpf_refcount is cached in btf_record as refcount_off in addition to being in the field array. Caching refcount_off makes sense for this field because further patches in the series will modify functions that take local kptrs (e.g. bpf_obj_drop) to change their behavior if the type they're operating on is refcounted. So enabling fast "is this type refcounted?" checks is desirable. No such verifier behavior changes are introduced in this patch, just logic to recognize 'struct bpf_refcount' in btf_record. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230415201811.343116-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
cd2a8079 |
|
15-Apr-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Remove btf_field_offs, use btf_record's fields instead The btf_field_offs struct contains (offset, size) for btf_record fields, sorted by offset. btf_field_offs is always used in conjunction with btf_record, which has btf_field 'fields' array with (offset, type), the latter of which btf_field_offs' size is derived from via btf_field_type_size. This patch adds a size field to struct btf_field and sorts btf_record's fields by offset, making it possible to get rid of btf_field_offs. Less data duplication and less code complexity results. Since btf_field_offs' lifetime closely followed the btf_record used to populate it, most complexity wins are from removal of initialization code like: if (btf_record_successfully_initialized) { foffs = btf_parse_field_offs(rec); if (IS_ERR_OR_NULL(foffs)) // free the btf_record and return err } Other changes in this patch are pretty mechanical: * foffs->field_off[i] -> rec->fields[i].offset * foffs->field_sz[i] -> rec->fields[i].size * Sort rec->fields in btf_parse_fields before returning * It's possible that this is necessary independently of other changes in this patch. btf_record_find in syscall.c expects btf_record's fields to be sorted by offset, yet there's no explicit sorting of them before this patch, record's fields are populated in the order they're read from BTF struct definition. BTF docs don't say anything about the sortedness of struct fields. * All functions taking struct btf_field_offs * input now instead take struct btf_record *. All callsites of these functions already have access to the correct btf_record. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230415201811.343116-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
91f2dc68 |
|
10-Apr-2023 |
Feng Zhou <zhoufeng.zf@bytedance.com> |
bpf/btf: Fix is_int_ptr() When tracing a kernel function with arg type is u32*, btf_ctx_access() would report error: arg2 type INT is not a struct. The commit bb6728d75611 ("bpf: Allow access to int pointer arguments in tracing programs") added support for int pointer, but did not skip modifiers before checking it's type. This patch fixes it. Fixes: bb6728d75611 ("bpf: Allow access to int pointer arguments in tracing programs") Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230410085908.98493-2-zhoufeng.zf@bytedance.com
|
#
bdcab414 |
|
06-Apr-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Simplify internal verifier log interface Simplify internal verifier log API down to bpf_vlog_init() and bpf_vlog_finalize(). The former handles input arguments validation in one place and makes it easier to change it. The latter subsumes -ENOSPC (truncation) and -EFAULT handling and simplifies both caller's code (bpf_check() and btf_parse()). For btf_parse(), this patch also makes sure that verifier log finalization happens even if there is some error condition during BTF verification process prior to normal finalization step. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/bpf/20230406234205.323208-14-andrii@kernel.org
|
#
47a71c1f |
|
06-Apr-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Add log_true_size output field to return necessary log buffer size Add output-only log_true_size and btf_log_true_size field to BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. It will return the size of log buffer necessary to fit in all the log contents at specified log_level. This is very useful for BPF loader libraries like libbpf to be able to size log buffer correctly, but could be used by users directly, if necessary, as well. This patch plumbs all this through the code, taking into account actual bpf_attr size provided by user to determine if these new fields are expected by users. And if they are, set them from kernel on return. We refactory btf_parse() function to accommodate this, moving attr and uattr handling inside it. The rest is very straightforward code, which is split from the logging accounting changes in the previous patch to make it simpler to review logic vs UAPI changes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/bpf/20230406234205.323208-13-andrii@kernel.org
|
#
8a6ca6bc |
|
06-Apr-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Simplify logging-related error conditions handling Move log->level == 0 check into bpf_vlog_truncated() instead of doing it explicitly. Also remove unnecessary goto in kernel/bpf/verifier.c. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/bpf/20230406234205.323208-11-andrii@kernel.org
|
#
971fb505 |
|
06-Apr-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Fix missing -EFAULT return on user log buf error in btf_parse() btf_parse() is missing -EFAULT error return if log->ubuf was NULL-ed out due to error while copying data into user-provided buffer. Add it, but handle a special case of BPF_LOG_KERNEL in which log->ubuf is always NULL. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/bpf/20230406234205.323208-9-andrii@kernel.org
|
#
12166409 |
|
06-Apr-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Switch BPF verifier log to be a rotating log by default Currently, if user-supplied log buffer to collect BPF verifier log turns out to be too small to contain full log, bpf() syscall returns -ENOSPC, fails BPF program verification/load, and preserves first N-1 bytes of the verifier log (where N is the size of user-supplied buffer). This is problematic in a bunch of common scenarios, especially when working with real-world BPF programs that tend to be pretty complex as far as verification goes and require big log buffers. Typically, it's when debugging tricky cases at log level 2 (verbose). Also, when BPF program is successfully validated, log level 2 is the only way to actually see verifier state progression and all the important details. Even with log level 1, it's possible to get -ENOSPC even if the final verifier log fits in log buffer, if there is a code path that's deep enough to fill up entire log, even if normally it would be reset later on (there is a logic to chop off successfully validated portions of BPF verifier log). In short, it's not always possible to pre-size log buffer. Also, what's worse, in practice, the end of the log most often is way more important than the beginning, but verifier stops emitting log as soon as initial log buffer is filled up. This patch switches BPF verifier log behavior to effectively behave as rotating log. That is, if user-supplied log buffer turns out to be too short, verifier will keep overwriting previously written log, effectively treating user's log buffer as a ring buffer. -ENOSPC is still going to be returned at the end, to notify user that log contents was truncated, but the important last N bytes of the log would be returned, which might be all that user really needs. This consistent -ENOSPC behavior, regardless of rotating or fixed log behavior, allows to prevent backwards compatibility breakage. The only user-visible change is which portion of verifier log user ends up seeing *if buffer is too small*. Given contents of verifier log itself is not an ABI, there is no breakage due to this behavior change. Specialized tools that rely on specific contents of verifier log in -ENOSPC scenario are expected to be easily adapted to accommodate old and new behaviors. Importantly, though, to preserve good user experience and not require every user-space application to adopt to this new behavior, before exiting to user-space verifier will rotate log (in place) to make it start at the very beginning of user buffer as a continuous zero-terminated string. The contents will be a chopped off N-1 last bytes of full verifier log, of course. Given beginning of log is sometimes important as well, we add BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows tools like veristat to request first part of verifier log, if necessary. BPF_LOG_FIXED flag is also a simple and straightforward way to check if BPF verifier supports rotating behavior. On the implementation side, conceptually, it's all simple. We maintain 64-bit logical start and end positions. If we need to truncate the log, start position will be adjusted accordingly to lag end position by N bytes. We then use those logical positions to calculate their matching actual positions in user buffer and handle wrap around the end of the buffer properly. Finally, right before returning from bpf_check(), we rotate user log buffer contents in-place as necessary, to make log contents contiguous. See comments in relevant functions for details. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/bpf/20230406234205.323208-4-andrii@kernel.org
|
#
63260df1 |
|
03-Apr-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Refactor btf_nested_type_is_trusted(). btf_nested_type_is_trusted() tries to find a struct member at corresponding offset. It works for flat structures and falls apart in more complex structs with nested structs. The offset->member search is already performed by btf_struct_walk() including nested structs. Reuse this work and pass {field name, field btf id} into btf_nested_type_is_trusted() instead of offset to make BTF_TYPE_SAFE*() logic more robust. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230404045029.82870-4-alexei.starovoitov@gmail.com
|
#
9e36a204 |
|
13-Mar-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Disable migration when freeing stashed local kptr using obj drop When a local kptr is stashed in a map and freed when the map goes away, currently an error like the below appears: [ 39.195695] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u32:15/2875 [ 39.196549] caller is bpf_mem_free+0x56/0xc0 [ 39.196958] CPU: 15 PID: 2875 Comm: kworker/u32:15 Tainted: G O 6.2.0-13016-g22df776a9a86 #4477 [ 39.197897] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 39.198949] Workqueue: events_unbound bpf_map_free_deferred [ 39.199470] Call Trace: [ 39.199703] <TASK> [ 39.199911] dump_stack_lvl+0x60/0x70 [ 39.200267] check_preemption_disabled+0xbf/0xe0 [ 39.200704] bpf_mem_free+0x56/0xc0 [ 39.201032] ? bpf_obj_new_impl+0xa0/0xa0 [ 39.201430] bpf_obj_free_fields+0x1cd/0x200 [ 39.201838] array_map_free+0xad/0x220 [ 39.202193] ? finish_task_switch+0xe5/0x3c0 [ 39.202614] bpf_map_free_deferred+0xea/0x210 [ 39.203006] ? lockdep_hardirqs_on_prepare+0xe/0x220 [ 39.203460] process_one_work+0x64f/0xbe0 [ 39.203822] ? pwq_dec_nr_in_flight+0x110/0x110 [ 39.204264] ? do_raw_spin_lock+0x107/0x1c0 [ 39.204662] ? lockdep_hardirqs_on_prepare+0xe/0x220 [ 39.205107] worker_thread+0x74/0x7a0 [ 39.205451] ? process_one_work+0xbe0/0xbe0 [ 39.205818] kthread+0x171/0x1a0 [ 39.206111] ? kthread_complete_and_exit+0x20/0x20 [ 39.206552] ret_from_fork+0x1f/0x30 [ 39.206886] </TASK> This happens because the call to __bpf_obj_drop_impl I added in the patch adding support for stashing local kptrs doesn't disable migration. Prior to that patch, __bpf_obj_drop_impl logic only ran when called by a BPF progarm, whereas now it can be called from map free path, so it's necessary to explicitly disable migration. Also, refactor a bit to just call __bpf_obj_drop_impl directly instead of bothering w/ dtor union and setting pointer-to-obj_drop. Fixes: c8e187540914 ("bpf: Support __kptr to local kptrs") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230313214641.3731908-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c8e18754 |
|
10-Mar-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Support __kptr to local kptrs If a PTR_TO_BTF_ID type comes from program BTF - not vmlinux or module BTF - it must have been allocated by bpf_obj_new and therefore must be free'd with bpf_obj_drop. Such a PTR_TO_BTF_ID is considered a "local kptr" and is tagged with MEM_ALLOC type tag by bpf_obj_new. This patch adds support for treating __kptr-tagged pointers to "local kptrs" as having an implicit bpf_obj_drop destructor for referenced kptr acquire / release semantics. Consider the following example: struct node_data { long key; long data; struct bpf_rb_node node; }; struct map_value { struct node_data __kptr *node; }; struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(key, int); __type(value, struct map_value); __uint(max_entries, 1); } some_nodes SEC(".maps"); If struct node_data had a matching definition in kernel BTF, the verifier would expect a destructor for the type to be registered. Since struct node_data does not match any type in kernel BTF, the verifier knows that there is no kfunc that provides a PTR_TO_BTF_ID to this type, and that such a PTR_TO_BTF_ID can only come from bpf_obj_new. So instead of searching for a registered dtor, a bpf_obj_drop dtor can be assumed. This allows the runtime to properly destruct such kptrs in bpf_obj_free_fields, which enables maps to clean up map_vals w/ such kptrs when going away. Implementation notes: * "kernel_btf" variable is renamed to "kptr_btf" in btf_parse_kptr. Before this patch, the variable would only ever point to vmlinux or module BTFs, but now it can point to some program BTF for local kptr type. It's later used to populate the (btf, btf_id) pair in kptr btf field. * It's necessary to btf_get the program BTF when populating btf_field for local kptr. btf_record_free later does a btf_put. * Behavior for non-local referenced kptrs is not modified, as bpf_find_btf_id helper only searches vmlinux and module BTFs for matching BTF type. If such a type is found, btf_field_kptr's btf will pass btf_is_kernel check, and the associated release function is some one-argument dtor. If btf_is_kernel check fails, associated release function is two-arg bpf_obj_drop_impl. Before this patch only btf_field_kptr's w/ kernel or module BTFs were created. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230310230743.2320707-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a4aa38897 |
|
09-Mar-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: btf: Remove unused btf_field_info_type enum This enum was added and used in commit aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record"). Later refactoring in commit db559117828d ("bpf: Consolidate spin_lock, timer management into btf_record") resulted in the enum values no longer being used anywhere. Let's remove them. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230309180111.1618459-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
215bf496 |
|
08-Mar-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: add iterator kfuncs registration and validation logic Add ability to register kfuncs that implement BPF open-coded iterator contract and enforce naming and function proto convention. Enforcement happens at the time of kfunc registration and significantly simplifies the rest of iterators logic in the verifier. More details follow in subsequent patches, but we enforce the following conditions. All kfuncs (constructor, next, destructor) have to be named consistenly as bpf_iter_<type>_{new,next,destroy}(), respectively. <type> represents iterator type, and iterator state should be represented as a matching `struct bpf_iter_<type>` state type. Also, all iter kfuncs should have a pointer to this `struct bpf_iter_<type>` as the very first argument. Additionally: - Constructor, i.e., bpf_iter_<type>_new(), can have arbitrary extra number of arguments. Return type is not enforced either. - Next method, i.e., bpf_iter_<type>_next(), has to return a pointer type and should have exactly one argument: `struct bpf_iter_<type> *` (const/volatile/restrict and typedefs are ignored). - Destructor, i.e., bpf_iter_<type>_destroy(), should return void and should have exactly one argument, similar to the next method. - struct bpf_iter_<type> size is enforced to be positive and a multiple of 8 bytes (to fit stack slots correctly). Such strictness and consistency allows to build generic helpers abstracting important, but boilerplate, details to be able to use open-coded iterators effectively and ergonomically (see bpf_for_each() in subsequent patches). It also simplifies the verifier logic in some places. At the same time, this doesn't hurt generality of possible iterator implementations. Win-win. Constructor kfunc is marked with a new KF_ITER_NEW flags, next method is marked with KF_ITER_NEXT (and should also have KF_RET_NULL, of course), while destructor kfunc is marked as KF_ITER_DESTROY. Additionally, we add a trivial kfunc name validation: it should be a valid non-NULL and non-empty string. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
6fcd486b |
|
02-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Refactor RCU enforcement in the verifier. bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack of such key mechanism makes it impossible for sleepable bpf programs to use RCU pointers. Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't support btf_type_tag yet) and allowlist certain field dereferences in important data structures like tast_struct, cgroup, socket that are used by sleepable programs either as RCU pointer or full trusted pointer (which is valid outside of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such tagging. They will be removed once GCC supports btf_type_tag. With that refactor check_ptr_to_btf_access(). Make it strict in enforcing PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without modifier flags. There is a chance that this strict enforcement might break existing programs (especially on GCC compiled kernels), but this cleanup has to start sooner than later. Note PTR_TO_CTX access still yields old deprecated PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0. Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
|
#
03b77e17 |
|
02-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted. __kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps. The concept felt useful, but didn't get much traction, since bpf_rdonly_cast() was added soon after and bpf programs received a simpler way to access PTR_UNTRUSTED kernel pointers without going through restrictive __kptr usage. Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate its intended usage. The main goal of __kptr_untrusted was to read/write such pointers directly while bpf_kptr_xchg was a mechanism to access refcnted kernel pointers. The next patch will allow RCU protected __kptr access with direct read. At that point __kptr_untrusted will be deprecated. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
|
#
b5964b96 |
|
01-Mar-2023 |
Joanne Koong <joannelkoong@gmail.com> |
bpf: Add skb dynptrs Add skb dynptrs, which are dynptrs whose underlying pointer points to a skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For bpf prog types that don't support writes on skb data, the dynptr is read-only (bpf_dynptr_write() will return an error) For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write() interfaces, reading and writing from/to data in the head as well as from/to non-linear paged buffers is supported. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used. For examples of how skb dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2f464393 |
|
01-Mar-2023 |
Joanne Koong <joannelkoong@gmail.com> |
bpf: Support "sk_buff" and "xdp_buff" as valid kfunc arg types The bpf mirror of the in-kernel sk_buff and xdp_buff data structures are __sk_buff and xdp_md. Currently, when we pass in the program ctx to a kfunc where the program ctx is a skb or xdp buffer, we reject the program if the in-kernel definition is sk_buff/xdp_buff instead of __sk_buff/xdp_md. This change allows "sk_buff <--> __sk_buff" and "xdp_buff <--> xdp_md" to be recognized as valid matches. The user program may pass in their program ctx as a __sk_buff or xdp_md, and the in-kernel definition of the kfunc may define this arg as a sk_buff or xdp_buff. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-2-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
9b459804 |
|
06-Mar-2023 |
Lorenz Bauer <lorenz.bauer@isovalent.com> |
btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR btf_datasec_resolve contains a bug that causes the following BTF to fail loading: [1] DATASEC a size=2 vlen=2 type_id=4 offset=0 size=1 type_id=7 offset=1 size=1 [2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none) [3] PTR (anon) type_id=2 [4] VAR a type_id=3 linkage=0 [5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none) [6] TYPEDEF td type_id=5 [7] VAR b type_id=6 linkage=0 This error message is printed during btf_check_all_types: [1] DATASEC a size=2 vlen=2 type_id=7 offset=1 size=1 Invalid type By tracing btf_*_resolve we can pinpoint the problem: btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0 btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0 btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0 btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0 btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22 The last invocation of btf_datasec_resolve should invoke btf_var_resolve by means of env_stack_push, instead it returns EINVAL. The reason is that env_stack_push is never executed for the second VAR. if (!env_type_is_resolve_sink(env, var_type) && !env_type_is_resolved(env, var_type_id)) { env_stack_set_next_member(env, i + 1); return env_stack_push(env, var_type, var_type_id); } env_type_is_resolve_sink() changes its behaviour based on resolve_mode. For RESOLVE_PTR, we can simplify the if condition to the following: (btf_type_is_modifier() || btf_type_is_ptr) && !env_type_is_resolved() Since we're dealing with a VAR the clause evaluates to false. This is not sufficient to trigger the bug however. The log output and EINVAL are only generated if btf_type_id_size() fails. if (!btf_type_id_size(btf, &type_id, &type_size)) { btf_verifier_log_vsi(env, v->t, vsi, "Invalid type"); return -EINVAL; } Most types are sized, so for example a VAR referring to an INT is not a problem. The bug is only triggered if a VAR points at a modifier. Since we skipped btf_var_resolve that modifier was also never resolved, which means that btf_resolved_type_id returns 0 aka VOID for the modifier. This in turn causes btf_type_id_size to return NULL, triggering EINVAL. To summarise, the following conditions are necessary: - VAR pointing at PTR, STRUCT, UNION or ARRAY - Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or TYPE_TAG The fix is to reset resolve_mode to RESOLVE_TBD before attempting to resolve a VAR from a DATASEC. Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
d384dce2 |
|
15-Feb-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Fix global subprog context argument resolution logic KPROBE program's user-facing context type is defined as typedef bpf_user_pt_regs_t. This leads to a problem when trying to passing kprobe/uprobe/usdt context argument into global subprog, as kernel always strip away mods and typedefs of user-supplied type, but takes expected type from bpf_ctx_convert as is, which causes mismatch. Current way to work around this is to define a fake struct with the same name as expected typedef: struct bpf_user_pt_regs_t {}; __noinline my_global_subprog(struct bpf_user_pt_regs_t *ctx) { ... } This patch fixes the issue by resolving expected type, if it's not a struct. It still leaves the above work-around working for backwards compatibility. Fixes: 91cc1a99740e ("bpf: Annotate context types") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230216045954.3002473-2-andrii@kernel.org
|
#
a40d3632 |
|
13-Feb-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Special verifier handling for bpf_rbtree_{remove, first} Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties that require handling in the verifier: * both bpf_rbtree_remove and bpf_rbtree_first return the type containing the bpf_rb_node field, with the offset set to that field's offset, instead of a struct bpf_rb_node * * mark_reg_graph_node helper added in previous patch generalizes this logic, use it * bpf_rbtree_remove's node input is a node that's been inserted in the tree - a non-owning reference. * bpf_rbtree_remove must invalidate non-owning references in order to avoid aliasing issue. Use previously-added invalidate_non_owning_refs helper to mark this function as a non-owning ref invalidation point. * Unlike other functions, which convert one of their input arg regs to non-owning reference, bpf_rbtree_first takes no arguments and just returns a non-owning reference (possibly null) * For now verifier logic for this is special-cased instead of adding new kfunc flag. This patch, along with the previous one, complete special verifier handling for all rbtree API functions added in this series. With functional verifier handling of rbtree_remove, under current non-owning reference scheme, a node type with both bpf_{list,rb}_node fields could cause the verifier to accept programs which remove such nodes from collections they haven't been added to. In order to prevent this, this patch adds a check to btf_parse_fields which rejects structs with both bpf_{list,rb}_node fields. This is a temporary measure that can be removed after "collection identity" followup. See comment added in btf_parse_fields. A linked_list BTF test exercising the new check is added in this patch as well. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-6-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
9c395c1b |
|
13-Feb-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Add basic bpf_rb_{root,node} support This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new types, and adds bpf_rb_root_free function for freeing bpf_rb_root in map_values. structs bpf_rb_root and bpf_rb_node are opaque types meant to obscure structs rb_root_cached rb_node, respectively. btf_struct_access will prevent BPF programs from touching these special fields automatically now that they're recognized. btf_check_and_fixup_fields now groups list_head and rb_root together as "graph root" fields and {list,rb}_node as "graph node", and does same ownership cycle checking as before. Note that this function does _not_ prevent ownership type mixups (e.g. rb_root owning list_node) - that's handled by btf_parse_graph_root. After this patch, a bpf program can have a struct bpf_rb_root in a map_value, but not add anything to nor do anything useful with it. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
49f67f39 |
|
27-Jan-2023 |
Ilya Leoshkevich <iii@linux.ibm.com> |
bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag s390x eBPF JIT needs to know whether a function return value is signed and which function arguments are signed, in order to generate code compliant with the s390x ABI. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-26-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b613d335 |
|
20-Jan-2023 |
David Vernet <void@manifault.com> |
bpf: Allow trusted args to walk struct when checking BTF IDs When validating BTF types for KF_TRUSTED_ARGS kfuncs, the verifier currently enforces that the top-level type must match when calling the kfunc. In other words, the verifier does not allow the BPF program to pass a bitwise equivalent struct, despite it being allowed according to the C standard. For example, if you have the following type: struct nf_conn___init { struct nf_conn ct; }; The C standard stipulates that it would be safe to pass a struct nf_conn___init to a kfunc expecting a struct nf_conn. The verifier currently disallows this, however, as semantically kfuncs may want to enforce that structs that have equivalent types according to the C standard, but have different BTF IDs, are not able to be passed to kfuncs expecting one or the other. For example, struct nf_conn___init may not be queried / looked up, as it is allocated but may not yet be fully initialized. On the other hand, being able to pass types that are equivalent according to the C standard will be useful for other types of kfunc / kptrs enabled by BPF. For example, in a follow-on patch, a series of kfuncs will be added which allow programs to do bitwise queries on cpumasks that are either allocated by the program (in which case they'll be a 'struct bpf_cpumask' type that wraps a cpumask_t as its first element), or a cpumask that was allocated by the main kernel (in which case it will just be a straight cpumask_t, as in task->cpus_ptr). Having the two types of cpumasks allows us to distinguish between the two for when a cpumask is read-only vs. mutatable. A struct bpf_cpumask can be mutated by e.g. bpf_cpumask_clear(), whereas a regular cpumask_t cannot be. On the other hand, a struct bpf_cpumask can of course be queried in the exact same manner as a cpumask_t, with e.g. bpf_cpumask_test_cpu(). If we were to enforce that top level types match, then a user that's passing a struct bpf_cpumask to a read-only cpumask_t argument would have to cast with something like bpf_cast_to_kern_ctx() (which itself would need to be updated to expect the alias, and currently it only accommodates a single alias per prog type). Additionally, not specifying KF_TRUSTED_ARGS is not an option, as some kfuncs take one argument as a struct bpf_cpumask *, and another as a struct cpumask * (i.e. cpumask_t). In order to enable this, this patch relaxes the constraint that a KF_TRUSTED_ARGS kfunc must have strict type matching, and instead only enforces strict type matching if a type is observed to be a "no-cast alias" (i.e., that the type names are equivalent, but one is suffixed with ___init). Additionally, in order to try and be conservative and match existing behavior / expectations, this patch also enforces strict type checking for acquire kfuncs. We were already enforcing it for release kfuncs, so this should also improve the consistency of the semantics for kfuncs. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230120192523.3650503-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
57539b1c |
|
20-Jan-2023 |
David Vernet <void@manifault.com> |
bpf: Enable annotating trusted nested pointers In kfuncs, a "trusted" pointer is a pointer that the kfunc can assume is safe, and which the verifier will allow to be passed to a KF_TRUSTED_ARGS kfunc. Currently, a KF_TRUSTED_ARGS kfunc disallows any pointer to be passed at a nonzero offset, but sometimes this is in fact safe if the "nested" pointer's lifetime is inherited from its parent. For example, the const cpumask_t *cpus_ptr field in a struct task_struct will remain valid until the task itself is destroyed, and thus would also be safe to pass to a KF_TRUSTED_ARGS kfunc. While it would be conceptually simple to enable this by using BTF tags, gcc unfortunately does not yet support this. In the interim, this patch enables support for this by using a type-naming convention. A new BTF_TYPE_SAFE_NESTED macro is defined in verifier.c which allows a developer to specify the nested fields of a type which are considered trusted if its parent is also trusted. The verifier is also updated to account for this. A patch with selftests will be added in a follow-on change, along with documentation for this feature. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230120192523.3650503-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
9cb61e50 |
|
06-Jan-2023 |
Connor O'Brien <connoro@google.com> |
bpf: btf: limit logging of ignored BTF mismatches Enabling CONFIG_MODULE_ALLOW_BTF_MISMATCH is an indication that BTF mismatches are expected and module loading should proceed anyway. Logging with pr_warn() on every one of these "benign" mismatches creates unnecessary noise when many such modules are loaded. Instead, handle this case with a single log warning that BTF info may be unavailable. Mismatches also result in calls to __btf_verifier_log() via __btf_verifier_log_type() or btf_verifier_log_member(), adding several additional lines of logging per mismatched module. Add checks to these paths to skip logging for module BTF mismatches in the "allow mismatch" case. All existing logging behavior is preserved in the default CONFIG_MODULE_ALLOW_BTF_MISMATCH=n case. Signed-off-by: Connor O'Brien <connoro@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230107025331.3240536-1-connoro@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
30465003 |
|
17-Dec-2022 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: rename list_head -> graph_root in field info types Many of the structs recently added to track field info for linked-list head are useful as-is for rbtree root. So let's do a mechanical renaming of list_head-related types and fields: include/linux/bpf.h: struct btf_field_list_head -> struct btf_field_graph_root list_head -> graph_root in struct btf_field union kernel/bpf/btf.c: list_head -> graph_root in struct btf_field_info This is a nonfunctional change, functionality to actually use these fields for rbtree will be added in further patches. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20221217082506.1570898-5-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
74bc3a5a |
|
20-Jan-2023 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Add missing btf_put to register_btf_id_dtor_kfuncs We take the BTF reference before we register dtors and we need to put it back when it's done. We probably won't se a problem with kernel BTF, but module BTF would stay loaded (because of the extra ref) even when its module is removed. Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Fixes: 5ce937d613a4 ("bpf: Populate pairs of btf_id and destructor kfunc in btf") Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230120122148.1522359-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
5b481aca |
|
06-Dec-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret The current way of expressing that a non-bpf kernel component is willing to accept that bpf programs can be attached to it and that they can change the return value is to abuse ALLOW_ERROR_INJECTION. This is debated in the link below, and the result is that it is not a reasonable thing to do. Reuse the kfunc declaration structure to also tag the kernel functions we want to be fmodret. This way we can control from any subsystem which functions are being modified by bpf without touching the verifier. Link: https://lore.kernel.org/all/20221121104403.1545f9b5@gandalf.local.home/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20221206145936.922196-2-benjamin.tissoires@redhat.com
|
#
c0c852dd |
|
03-Dec-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Do not mark certain LSM hook arguments as trusted Martin mentioned that the verifier cannot assume arguments from LSM hook sk_alloc_security being trusted since after the hook is called, the sk ref_count is set to 1. This will overwrite the ref_count changed by the bpf program and may cause ref_count underflow later on. I then further checked some other hooks. For example, for bpf_lsm_file_alloc() hook in fs/file_table.c, f->f_cred = get_cred(cred); error = security_file_alloc(f); if (unlikely(error)) { file_free_rcu(&f->f_rcuhead); return ERR_PTR(error); } atomic_long_set(&f->f_count, 1); The input parameter 'f' to security_file_alloc() cannot be trusted as well. Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other hooks should not be considered as trusted. This may not be a complete list, but it covers common usage for sk and task. Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c6b0337f |
|
24-Nov-2022 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Don't mark arguments to fentry/fexit programs as trusted. The PTR_TRUSTED flag should only be applied to pointers where the verifier can guarantee that such pointers are valid. The fentry/fexit/fmod_ret programs are not in this category. Only arguments of SEC("tp_btf") and SEC("iter") programs are trusted (which have BPF_TRACE_RAW_TP and BPF_TRACE_ITER attach_type correspondingly) This bug was masked because convert_ctx_accesses() was converting trusted loads into BPF_PROBE_MEM loads. Fix it as well. The loads from trusted pointers don't need exception handling. Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221124215314.55890-1-alexei.starovoitov@gmail.com
|
#
9bb00b28 |
|
23-Nov-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Add kfunc bpf_rcu_read_lock/unlock() Add two kfunc's bpf_rcu_read_lock() and bpf_rcu_read_unlock(). These two kfunc's can be used for all program types. The following is an example about how rcu pointer are used w.r.t. bpf_rcu_read_lock()/bpf_rcu_read_unlock(). struct task_struct { ... struct task_struct *last_wakee; struct task_struct __rcu *real_parent; ... }; Let us say prog does 'task = bpf_get_current_task_btf()' to get a 'task' pointer. The basic rules are: - 'real_parent = task->real_parent' should be inside bpf_rcu_read_lock region. This is to simulate rcu_dereference() operation. The 'real_parent' is marked as MEM_RCU only if (1). task->real_parent is inside bpf_rcu_read_lock region, and (2). task is a trusted ptr. So MEM_RCU marked ptr can be 'trusted' inside the bpf_rcu_read_lock region. - 'last_wakee = real_parent->last_wakee' should be inside bpf_rcu_read_lock region since it tries to access rcu protected memory. - the ptr 'last_wakee' will be marked as PTR_UNTRUSTED since in general it is not clear whether the object pointed by 'last_wakee' is valid or not even inside bpf_rcu_read_lock region. The verifier will reset all rcu pointer register states to untrusted at bpf_rcu_read_unlock() kfunc call site, so any such rcu pointer won't be trusted any more outside the bpf_rcu_read_lock() region. The current implementation does not support nested rcu read lock region in the prog. Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221124053217.2373910-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
5bad3587 |
|
23-Nov-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: Unify and simplify btf_func_proto_check error handling Replace 'err = x; break;' with 'return x;'. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221124002838.2700179-1-sdf@google.com
|
#
f17472d4 |
|
22-Nov-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: Prevent decl_tag from being referenced in func_proto arg Syzkaller managed to hit another decl_tag issue: btf_func_proto_check kernel/bpf/btf.c:4506 [inline] btf_check_all_types kernel/bpf/btf.c:4734 [inline] btf_parse_type_sec+0x1175/0x1980 kernel/bpf/btf.c:4763 btf_parse kernel/bpf/btf.c:5042 [inline] btf_new_fd+0x65a/0xb00 kernel/bpf/btf.c:6709 bpf_btf_load+0x6f/0x90 kernel/bpf/syscall.c:4342 __sys_bpf+0x50a/0x6c0 kernel/bpf/syscall.c:5034 __do_sys_bpf kernel/bpf/syscall.c:5093 [inline] __se_sys_bpf kernel/bpf/syscall.c:5091 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5091 do_syscall_64+0x54/0x70 arch/x86/entry/common.c:48 This seems similar to commit ea68376c8bed ("bpf: prevent decl_tag from being referenced in func_proto") but for the argument. Reported-by: syzbot+8dd0551dda6020944c5d@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221123035422.872531-2-sdf@google.com
|
#
fd264ca0 |
|
20-Nov-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx Implement bpf_cast_to_kern_ctx() kfunc which does a type cast of a uapi ctx object to the corresponding kernel ctx. Previously if users want to access some data available in kctx but not in uapi ctx, bpf_probe_read_kernel() helper is needed. The introduction of bpf_cast_to_kern_ctx() allows direct memory access which makes code simpler and easier to understand. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221120195432.3113982-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
cfe14564 |
|
20-Nov-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Add support for kfunc set with common btf_ids Later on, we will introduce kfuncs bpf_cast_to_kern_ctx() and bpf_rdonly_cast() which apply to all program types. Currently kfunc set only supports individual prog types. This patch added support for kfunc applying to all program types. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221120195426.3113828-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
3f00c523 |
|
19-Nov-2022 |
David Vernet <void@manifault.com> |
bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal to the verifier that it should enforce that a BPF program passes it a "safe", trusted pointer. Currently, "safe" means that the pointer is either PTR_TO_CTX, or is refcounted. There may be cases, however, where the kernel passes a BPF program a safe / trusted pointer to an object that the BPF program wishes to use as a kptr, but because the object does not yet have a ref_obj_id from the perspective of the verifier, the program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS kfunc. The solution is to expand the set of pointers that are considered trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs with these pointers without getting rejected by the verifier. There is already a PTR_UNTRUSTED flag that is set in some scenarios, such as when a BPF program reads a kptr directly from a map without performing a bpf_kptr_xchg() call. These pointers of course can and should be rejected by the verifier. Unfortunately, however, PTR_UNTRUSTED does not cover all the cases for safety that need to be addressed to adequately protect kfuncs. Specifically, pointers obtained by a BPF program "walking" a struct are _not_ considered PTR_UNTRUSTED according to BPF. For example, say that we were to add a kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal that a task was unsafe to pass to a kfunc, the verifier would mistakenly allow the following unsafe BPF program to be loaded: SEC("tp_btf/task_newtask") int BPF_PROG(unsafe_acquire_task, struct task_struct *task, u64 clone_flags) { struct task_struct *acquired, *nested; nested = task->last_wakee; /* Would not be rejected by the verifier. */ acquired = bpf_task_acquire(nested); if (!acquired) return 0; bpf_task_release(acquired); return 0; } To address this, this patch defines a new type flag called PTR_TRUSTED which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are passed directly from the kernel as a tracepoint or struct_ops callback argument. Any nested pointer that is obtained from walking a PTR_TRUSTED pointer is no longer PTR_TRUSTED. From the example above, the struct task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer obtained from 'task->last_wakee' is not PTR_TRUSTED. A subsequent patch will add kfuncs for storing a task kfunc as a kptr, and then another patch will add selftests to validate. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221120051004.3605026-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c22dfdd2 |
|
17-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Add comments for map BTF matching requirement for bpf_list_head The old behavior of bpf_map_meta_equal was that it compared timer_off to be equal (but not spin_lock_off, because that was not allowed), and did memcmp of kptr_off_tab. Now, we memcmp the btf_record of two bpf_map structs, which has all fields. We preserve backwards compat as we kzalloc the array, so if only spin lock and timer exist in map, we only compare offset while the rest of unused members in the btf_field struct are zeroed out. In case of kptr, btf and everything else is of vmlinux or module, so as long type is same it will match, since kernel btf, module, dtor pointer will be same across maps. Now with list_head in the mix, things are a bit complicated. We implicitly add a requirement that both BTFs are same, because struct btf_field_list_head has btf and value_rec members. We obviously shouldn't force BTFs to be equal by default, as that breaks backwards compatibility. Currently it is only implicitly required due to list_head matching struct btf and value_rec member. value_rec points back into a btf_record stashed in the map BTF (btf member of btf_field_list_head). So that pointer and btf member has to match exactly. Document all these subtle details so that things don't break in the future when touching this code. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-19-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
00b85860 |
|
17-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Rewrite kfunc argument handling As we continue to add more features, argument types, kfunc flags, and different extensions to kfuncs, the code to verify the correctness of the kfunc prototype wrt the passed in registers has become ad-hoc and ugly to read. To make life easier, and make a very clear split between different stages of argument processing, move all the code into verifier.c and refactor into easier to read helpers and functions. This also makes sharing code within the verifier easier with kfunc argument processing. This will be more and more useful in later patches as we are now moving to implement very core BPF helpers as kfuncs, to keep them experimental before baking into UAPI. Remove all kfunc related bits now from btf_check_func_arg_match, as users have been converted away to refactored kfunc argument handling. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-12-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
865ce09a |
|
17-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Verify ownership relationships for user BTF types Ensure that there can be no ownership cycles among different types by way of having owning objects that can hold some other type as their element. For instance, a map value can only hold allocated objects, but these are allowed to have another bpf_list_head. To prevent unbounded recursion while freeing resources, elements of bpf_list_head in local kptrs can never have a bpf_list_head which are part of list in a map value. Later patches will verify this by having dedicated BTF selftests. Also, to make runtime destruction easier, once btf_struct_metas is fully populated, we can stash the metadata of the value type directly in the metadata of the list_head fields, as that allows easier access to the value type's layout to destruct it at runtime from the btf_field entry of the list head itself. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
8ffa5cc1 |
|
17-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Recognize lock and list fields in allocated objects Allow specifying bpf_spin_lock, bpf_list_head, bpf_list_node fields in a allocated object. Also update btf_struct_access to reject direct access to these special fields. A bpf_list_head allows implementing map-in-map style use cases, where an allocated object with bpf_list_head is linked into a list in a map value. This would require embedding a bpf_list_node, support for which is also included. The bpf_spin_lock is used to protect the bpf_list_head and other data. While we strictly don't require to hold a bpf_spin_lock while touching the bpf_list_head in such objects, as when have access to it, we have complete ownership of the object, the locking constraint is still kept and may be conditionally lifted in the future. Note that the specification of such types can be done just like map values, e.g.: struct bar { struct bpf_list_node node; }; struct foo { struct bpf_spin_lock lock; struct bpf_list_head head __contains(bar, node); struct bpf_list_node node; }; struct map_value { struct bpf_spin_lock lock; struct bpf_list_head head __contains(foo, node); }; To recognize such types in user BTF, we build a btf_struct_metas array of metadata items corresponding to each BTF ID. This is done once during the btf_parse stage to avoid having to do it each time during the verification process's requirement to inspect the metadata. Moreover, the computed metadata needs to be passed to some helpers in future patches which requires allocating them and storing them in the BTF that is pinned by the program itself, so that valid access can be assumed to such data during program runtime. A key thing to note is that once a btf_struct_meta is available for a type, both the btf_record and btf_field_offs should be available. It is critical that btf_field_offs is available in case special fields are present, as we extensively rely on special fields being zeroed out in map values and allocated objects in later patches. The code ensures that by bailing out in case of errors and ensuring both are available together. If the record is not available, the special fields won't be recognized, so not having both is also fine (in terms of being a verification error and not a runtime bug). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
282de143 |
|
17-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Introduce allocated objects support Introduce support for representing pointers to objects allocated by the BPF program, i.e. PTR_TO_BTF_ID that point to a type in program BTF. This is indicated by the presence of MEM_ALLOC type flag in reg->type to avoid having to check btf_is_kernel when trying to match argument types in helpers. Whenever walking such types, any pointers being walked will always yield a SCALAR instead of pointer. In the future we might permit kptr inside such allocated objects (either kernel or program allocated), and it will then form a PTR_TO_BTF_ID of the respective type. For now, such allocated objects will always be referenced in verifier context, hence ref_obj_id == 0 for them is a bug. It is allowed to write to such objects, as long fields that are special are not touched (support for which will be added in subsequent patches). Note that once such a pointer is marked PTR_UNTRUSTED, it is no longer allowed to write to it. No PROBE_MEM handling is therefore done for loads into this type unless PTR_UNTRUSTED is part of the register type, since they can never be in an undefined state, and their lifetime will always be valid. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
6728aea7 |
|
14-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Refactor btf_struct_access Instead of having to pass multiple arguments that describe the register, pass the bpf_reg_state into the btf_struct_access callback. Currently, all call sites simply reuse the btf and btf_id of the reg they want to check the access of. The only exception to this pattern is the callsite in check_ptr_to_map_access, hence for that case create a dummy reg to simulate PTR_TO_BTF_ID access. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221114191547.1694267-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
f0c5941f |
|
14-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Support bpf_list_head in map values Add the support on the map side to parse, recognize, verify, and build metadata table for a new special field of the type struct bpf_list_head. To parameterize the bpf_list_head for a certain value type and the list_node member it will accept in that value type, we use BTF declaration tags. The definition of bpf_list_head in a map value will be done as follows: struct foo { struct bpf_list_node node; int data; }; struct map_value { struct bpf_list_head head __contains(foo, node); }; Then, the bpf_list_head only allows adding to the list 'head' using the bpf_list_node 'node' for the type struct foo. The 'contains' annotation is a BTF declaration tag composed of four parts, "contains:name:node" where the name is then used to look up the type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the lookup. The node defines name of the member in this type that has the type struct bpf_list_node, which is actually used for linking into the linked list. For now, 'kind' part is hardcoded as struct. This allows building intrusive linked lists in BPF, using container_of to obtain pointer to entry, while being completely type safe from the perspective of the verifier. The verifier knows exactly the type of the nodes, and knows that list helpers return that type at some fixed offset where the bpf_list_node member used for this list exists. The verifier also uses this information to disallow adding types that are not accepted by a certain list. For now, no elements can be added to such lists. Support for that is coming in future patches, hence draining and freeing items is done with a TODO that will be resolved in a future patch. Note that the bpf_list_head_free function moves the list out to a local variable under the lock and releases it, doing the actual draining of the list items outside the lock. While this helps with not holding the lock for too long pessimizing other concurrent list operations, it is also necessary for deadlock prevention: unless every function called in the critical section would be notrace, a fentry/fexit program could attach and call bpf_map_update_elem again on the map, leading to the same lock being acquired if the key matches and lead to a deadlock. While this requires some special effort on part of the BPF programmer to trigger and is highly unlikely to occur in practice, it is always better if we can avoid such a condition. While notrace would prevent this, doing the draining outside the lock has advantages of its own, hence it is used to also fix the deadlock related problem. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2d577252 |
|
14-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Remove BPF_MAP_OFF_ARR_MAX In f71b2f64177a ("bpf: Refactor map->off_arr handling"), map->off_arr was refactored to be btf_field_offs. The number of field offsets is equal to maximum possible fields limited by BTF_FIELDS_MAX. Hence, reuse BTF_FIELDS_MAX as spin_lock and timer no longer are to be handled specially for offset sorting, fix the comment, and remove incorrect WARN_ON as its rec->cnt can never exceed this value. The reason to keep separate constant was the it was always more 2 more than total kptrs. This is no longer the case. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221114191547.1694267-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
f71b2f64 |
|
03-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Refactor map->off_arr handling Refactor map->off_arr handling into generic functions that can work on their own without hardcoding map specific code. The btf_fields_offs structure is now returned from btf_parse_field_offs, which can be reused later for types in program BTF. All functions like copy_map_value, zero_map_value call generic underlying functions so that they can also be reused later for copying to values allocated in programs which encode specific fields. Later, some helper functions will also require access to this btf_field_offs structure to be able to skip over special fields at runtime. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
db559117 |
|
03-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Consolidate spin_lock, timer management into btf_record Now that kptr_off_tab has been refactored into btf_record, and can hold more than one specific field type, accomodate bpf_spin_lock and bpf_timer as well. While they don't require any more metadata than offset, having all special fields in one place allows us to share the same code for allocated user defined types and handle both map values and these allocated objects in a similar fashion. As an optimization, we still keep spin_lock_off and timer_off offsets in the btf_record structure, just to avoid having to find the btf_field struct each time their offset is needed. This is mostly needed to manipulate such objects in a map value at runtime. It's ok to hardcode just one offset as more than one field is disallowed. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
aa3496ac |
|
03-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Refactor kptr_off_tab into btf_record To prepare the BPF verifier to handle special fields in both map values and program allocated types coming from program BTF, we need to refactor the kptr_off_tab handling code into something more generic and reusable across both cases to avoid code duplication. Later patches also require passing this data to helpers at runtime, so that they can work on user defined types, initialize them, destruct them, etc. The main observation is that both map values and such allocated types point to a type in program BTF, hence they can be handled similarly. We can prepare a field metadata table for both cases and store them in struct bpf_map or struct btf depending on the use case. Hence, refactor the code into generic btf_record and btf_field member structs. The btf_record represents the fields of a specific btf_type in user BTF. The cnt indicates the number of special fields we successfully recognized, and field_mask is a bitmask of fields that were found, to enable quick determination of availability of a certain field. Subsequently, refactor the rest of the code to work with these generic types, remove assumptions about kptr and kptr_off_tab, rename variables to more meaningful names, etc. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
23da464d |
|
03-Nov-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Allow specifying volatile type modifier for kptrs This is useful in particular to mark the pointer as volatile, so that compiler treats each load and store to the field as a volatile access. The alternative is having to define and use READ_ONCE and WRITE_ONCE in the BPF program. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221103191013.1236066-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
ea68376c |
|
14-Oct-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: prevent decl_tag from being referenced in func_proto Syzkaller was able to hit the following issue: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3609 at kernel/bpf/btf.c:1946 btf_type_id_size+0x2d5/0x9d0 kernel/bpf/btf.c:1946 Modules linked in: CPU: 0 PID: 3609 Comm: syz-executor361 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 RIP: 0010:btf_type_id_size+0x2d5/0x9d0 kernel/bpf/btf.c:1946 Code: ef e8 7f 8e e4 ff 41 83 ff 0b 77 28 f6 44 24 10 18 75 3f e8 6d 91 e4 ff 44 89 fe bf 0e 00 00 00 e8 20 8e e4 ff e8 5b 91 e4 ff <0f> 0b 45 31 f6 e9 98 02 00 00 41 83 ff 12 74 18 e8 46 91 e4 ff 44 RSP: 0018:ffffc90003cefb40 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: ffff8880259c0000 RSI: ffffffff81968415 RDI: 0000000000000005 RBP: ffff88801270ca00 R08: 0000000000000005 R09: 000000000000000e R10: 0000000000000011 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000011 R14: ffff888026ee6424 R15: 0000000000000011 FS: 000055555641b300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000f2e258 CR3: 000000007110e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btf_func_proto_check kernel/bpf/btf.c:4447 [inline] btf_check_all_types kernel/bpf/btf.c:4723 [inline] btf_parse_type_sec kernel/bpf/btf.c:4752 [inline] btf_parse kernel/bpf/btf.c:5026 [inline] btf_new_fd+0x1926/0x1e70 kernel/bpf/btf.c:6892 bpf_btf_load kernel/bpf/syscall.c:4324 [inline] __sys_bpf+0xb7d/0x4cf0 kernel/bpf/syscall.c:5010 __do_sys_bpf kernel/bpf/syscall.c:5069 [inline] __se_sys_bpf kernel/bpf/syscall.c:5067 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:5067 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f0fbae41c69 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc8aeb6228 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fbae41c69 RDX: 0000000000000020 RSI: 0000000020000140 RDI: 0000000000000012 RBP: 00007f0fbae05e10 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000246 R12: 00007f0fbae05ea0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Looks like it tries to create a func_proto which return type is decl_tag. For the details, see Martin's spot on analysis in [0]. 0: https://lore.kernel.org/bpf/CAKH8qBuQDLva_hHxxBuZzyAcYNO4ejhovz6TQeVSk8HY-2SO6g@mail.gmail.com/T/#mea6524b3fcd6298347432226e81b1e6155efc62c Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Fixes: bd16dee66ae4 ("bpf: Add BTF_KIND_DECL_TAG typedef support") Reported-by: syzbot+d8bd751aef7c6b39a344@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221015002444.2680969-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
eed807f6 |
|
21-Sep-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Tweak definition of KF_TRUSTED_ARGS Instead of forcing all arguments to be referenced pointers with non-zero reg->ref_obj_id, tweak the definition of KF_TRUSTED_ARGS to mean that only PTR_TO_BTF_ID (and socket types translated to PTR_TO_BTF_ID) have that constraint, and require their offset to be set to 0. The rest of pointer types are also accomodated in this definition of trusted pointers, but with more relaxed rules regarding offsets. The inherent meaning of setting this flag is that all kfunc pointer arguments have a guranteed lifetime, and kernel object pointers (PTR_TO_BTF_ID, PTR_TO_CTX) are passed in their unmodified form (with offset 0). In general, this is not true for PTR_TO_BTF_ID as it can be obtained using pointer walks. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/cdede0043c47ed7a357f0a915d16f9ce06a1d589.1663778601.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b8d31762 |
|
20-Sep-2022 |
Roberto Sassu <roberto.sassu@huawei.com> |
btf: Allow dynamic pointer parameters in kfuncs Allow dynamic pointers (struct bpf_dynptr_kern *) to be specified as parameters in kfuncs. Also, ensure that dynamic pointers passed as argument are valid and initialized, are a pointer to the stack, and of the type local. More dynamic pointer types can be supported in the future. To properly detect whether a parameter is of the desired type, introduce the stringify_struct() macro to compare the returned structure name with the desired name. In addition, protect against structure renames, by halting the build with BUILD_BUG_ON(), so that developers have to revisit the code. To check if a dynamic pointer passed to the kfunc is valid and initialized, and if its type is local, export the existing functions is_dynptr_reg_valid_init() and is_dynptr_type_expected(). Cc: Joanne Koong <joannelkoong@gmail.com> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220920075951.929132-5-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
d15bf150 |
|
20-Sep-2022 |
KP Singh <kpsingh@kernel.org> |
bpf: Allow kfuncs to be used in LSM programs In preparation for the addition of new kfuncs, allow kfuncs defined in the tracing subsystem to be used in LSM programs by mapping the LSM program type to the TRACING hook. Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220920075951.929132-2-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
3a74904c |
|
17-Sep-2022 |
William Dean <williamsukatube@163.com> |
bpf: simplify code in btf_parse_hdr It could directly return 'btf_check_sec_info' to simplify code. Signed-off-by: William Dean <williamsukatube@163.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220917084248.3649-1-williamsukatube@163.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
571f9738 |
|
16-Sep-2022 |
Peilin Ye <peilin.ye@bytedance.com> |
bpf/btf: Use btf_type_str() whenever possible We have btf_type_str(). Use it whenever possible in btf.c, instead of "btf_kind_str[BTF_INFO_KIND(t->info)]". Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Link: https://lore.kernel.org/r/20220916202800.31421-1-yepeilin.cs@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
a37a3258 |
|
10-Sep-2022 |
Lorenz Bauer <oss@lmb.io> |
bpf: btf: fix truncated last_member_type_id in btf_struct_resolve When trying to finish resolving a struct member, btf_struct_resolve saves the member type id in a u16 temporary variable. This truncates the 32 bit type id value if it exceeds UINT16_MAX. As a result, structs that have members with type ids > UINT16_MAX and which need resolution will fail with a message like this: [67414] STRUCT ff_device size=120 vlen=12 effect_owners type_id=67434 bits_offset=960 Member exceeds struct_size Fix this by changing the type of last_member_type_id to u32. Fixes: a0791f0df7d2 ("bpf: fix BTF limits") Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Lorenz Bauer <oss@lmb.io> Link: https://lore.kernel.org/r/20220910110120.339242-1-oss@lmb.io Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
84c6ac41 |
|
07-Sep-2022 |
Daniel Xu <dxu@dxuuu.xyz> |
bpf: Export btf_type_by_id() and bpf_log() These symbols will be used in nf_conntrack.ko to support direct writes to `nf_conn`. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/3c98c19dc50d3b18ea5eca135b4fc3a5db036060.1662568410.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
eb1f7f71 |
|
06-Sep-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
bpf/verifier: allow kfunc to return an allocated mem For drivers (outside of network), the incoming data is not statically defined in a struct. Most of the time the data buffer is kzalloc-ed and thus we can not rely on eBPF and BTF to explore the data. This commit allows to return an arbitrary memory, previously allocated by the driver. An interesting extra point is that the kfunc can mark the exported memory region as read only or read/write. So, when a kfunc is not returning a pointer to a struct but to a plain type, we can consider it is a valid allocated memory assuming that: - one of the arguments is either called rdonly_buf_size or rdwr_buf_size - and this argument is a const from the caller point of view We can then use this parameter as the size of the allocated memory. The memory is either read-only or read-write based on the name of the size parameter. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220906151303.2780789-7-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
f9b34818 |
|
06-Sep-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
bpf/btf: bump BTF_KFUNC_SET_MAX_CNT net/bpf/test_run.c is already presenting 20 kfuncs. net/netfilter/nf_conntrack_bpf.c is also presenting an extra 10 kfuncs. Given that all the kfuncs are regrouped into one unique set, having only 2 space left prevent us to add more selftests. Bump it to 256. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220906151303.2780789-6-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
95f2f26f |
|
06-Sep-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
bpf: split btf_check_subprog_arg_match in two btf_check_subprog_arg_match() was used twice in verifier.c: - when checking for the type mismatches between a (sub)prog declaration and BTF - when checking the call of a subprog to see if the provided arguments are correct and valid This is problematic when we check if the first argument of a program (pointer to ctx) is correctly accessed: To be able to ensure we access a valid memory in the ctx, the verifier assumes the pointer to context is not null. This has the side effect of marking the program accessing the entire context, even if the context is never dereferenced. For example, by checking the context access with the current code, the following eBPF program would fail with -EINVAL if the ctx is set to null from the userspace: ``` SEC("syscall") int prog(struct my_ctx *args) { return 0; } ``` In that particular case, we do not want to actually check that the memory is correct while checking for the BTF validity, but we just want to ensure that the (sub)prog definition matches the BTF we have. So split btf_check_subprog_arg_match() in two so we can actually check for the memory used when in a call, and ignore that part when not. Note that a further patch is in preparation to disentangled btf_check_func_arg_match() from these two purposes, and so right now we just add a new hack around that by adding a boolean to this function. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220906151303.2780789-3-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
720e6a43 |
|
31-Aug-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Allow struct argument in trampoline based programs Allow struct argument in trampoline based programs where the struct size should be <= 16 bytes. In such cases, the argument will be put into up to 2 registers for bpf, x86_64 and arm64 architectures. To support arch-specific trampoline manipulation, add arg_flags for additional struct information about arguments in btf_func_model. Such information will be used in arch specific function arch_prepare_bpf_trampoline() to prepare argument access properly in trampoline. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220831152646.2078089-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a00ed843 |
|
07-Aug-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Always return corresponding btf_type in __get_type_size() Currently in funciton __get_type_size(), the corresponding btf_type is returned only in invalid cases. Let us always return btf_type regardless of valid or invalid cases. Such a new functionality will be used in subsequent patches. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220807175116.4179242-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fa96b242 |
|
05-Aug-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
btf: Add a new kfunc flag which allows to mark a function to be sleepable This allows to declare a kfunc as sleepable and prevents its use in a non sleepable program. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Co-developed-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220805214821.1058337-2-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
58250ae3 |
|
11-Jul-2022 |
Fedor Tokarev <ftokarev@gmail.com> |
bpf: btf: Fix vsnprintf return value check vsnprintf returns the number of characters which would have been written if enough space had been available, excluding the terminating null byte. Thus, the return value of 'len_left' means that the last character has been dropped. Signed-off-by: Fedor Tokarev <ftokarev@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220711211317.GA1143610@laptop
|
#
56e948ff |
|
21-Jul-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Add support for forcing kfunc args to be trusted Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which means each pointer argument must be trusted, which we define as a pointer that is referenced (has non-zero ref_obj_id) and also needs to have its offset unchanged, similar to how release functions expect their argument. This allows a kfunc to receive pointer arguments unchanged from the result of the acquire kfunc. This is required to ensure that kfunc that operate on some object only work on acquired pointers and not normal PTR_TO_BTF_ID with same type which can be obtained by pointer walking. The restrictions applied to release arguments also apply to trusted arguments. This implies that strict type matching (not deducing type by recursively following members at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are used for trusted pointer arguments. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a4703e31 |
|
21-Jul-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Switch to new kfunc flags infrastructure Instead of populating multiple sets to indicate some attribute and then researching the same BTF ID in them, prepare a single unified BTF set which indicates whether a kfunc is allowed to be called, and also its attributes if any at the same time. Now, only one call is needed to perform the lookup for both kfunc availability and its attributes. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
a2a5580f |
|
14-Jul-2022 |
Ben Dooks <ben.dooks@sifive.com> |
bpf: Fix check against plain integer v 'NULL' When checking with sparse, btf_show_type_value() is causing a warning about checking integer vs NULL when the macro is passed a pointer, due to the 'value != 0' check. Stop sparse complaining about any type-casting by adding a cast to the typeof(value). This fixes the following sparse warnings: kernel/bpf/btf.c:2579:17: warning: Using plain integer as NULL pointer kernel/bpf/btf.c:2581:17: warning: Using plain integer as NULL pointer kernel/bpf/btf.c:3407:17: warning: Using plain integer as NULL pointer kernel/bpf/btf.c:3758:9: warning: Using plain integer as NULL pointer Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220714100322.260467-1-ben.dooks@sifive.com
|
#
ec6209c8 |
|
28-Jun-2022 |
Daniel Müller <deso@posteo.net> |
bpf, libbpf: Add type match support This patch adds support for the proposed type match relation to relo_core where it is shared between userspace and kernel. It plumbs through both kernel-side and libbpf-side support. The matching relation is defined as follows (copy from source): - modifiers and typedefs are stripped (and, hence, effectively ignored) - generally speaking types need to be of same kind (struct vs. struct, union vs. union, etc.) - exceptions are struct/union behind a pointer which could also match a forward declaration of a struct or union, respectively, and enum vs. enum64 (see below) Then, depending on type: - integers: - match if size and signedness match - arrays & pointers: - target types are recursively matched - structs & unions: - local members need to exist in target with the same name - for each member we recursively check match unless it is already behind a pointer, in which case we only check matching names and compatible kind - enums: - local variants have to have a match in target by symbolic name (but not numeric value) - size has to match (but enum may match enum64 and vice versa) - function pointers: - number and position of arguments in local type has to match target - for each argument and the return value we recursively check match Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-5-deso@posteo.net
|
#
c0e19f2c |
|
28-Jun-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: minimize number of allocated lsm slots per program Previous patch adds 1:1 mapping between all 211 LSM hooks and bpf_cgroup program array. Instead of reserving a slot per possible hook, reserve 10 slots per cgroup for lsm programs. Those slots are dynamically allocated on demand and reclaimed. struct cgroup_bpf { struct bpf_prog_array * effective[33]; /* 0 264 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ struct hlist_head progs[33]; /* 264 264 */ /* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */ u8 flags[33]; /* 528 33 */ /* XXX 7 bytes hole, try to pack */ struct list_head storages; /* 568 16 */ /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */ struct bpf_prog_array * inactive; /* 584 8 */ struct percpu_ref refcnt; /* 592 16 */ struct work_struct release_work; /* 608 72 */ /* size: 680, cachelines: 11, members: 7 */ /* sum members: 673, holes: 1, sum holes: 7 */ /* last cacheline: 40 bytes */ }; Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
69fd337a |
|
28-Jun-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: per-cgroup lsm flavor Allow attaching to lsm hooks in the cgroup context. Attaching to per-cgroup LSM works exactly like attaching to other per-cgroup hooks. New BPF_LSM_CGROUP is added to trigger new mode; the actual lsm hook we attach to is signaled via existing attach_btf_id. For the hooks that have 'struct socket' or 'struct sock' as its first argument, we use the cgroup associated with that socket. For the rest, we use 'current' cgroup (this is all on default hierarchy == v2 only). Note that for some hooks that work on 'struct sock' we still take the cgroup from 'current' because some of them work on the socket that hasn't been properly initialized yet. Behind the scenes, we allocate a shim program that is attached to the trampoline and runs cgroup effective BPF programs array. This shim has some rudimentary ref counting and can be shared between several programs attaching to the same lsm hook from different cgroups. Note that this patch bloats cgroup size because we add 211 cgroup_bpf_attach_type(s) for simplicity sake. This will be addressed in the subsequent patch. Also note that we only add non-sleepable flavor for now. To enable sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu, shim programs have to be freed via trace rcu, cgroup_bpf.effective should be also trace-rcu-managed + maybe some other changes that I'm not aware of. Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
fd75733d |
|
23-Jun-2022 |
Daniel Müller <deso@posteo.net> |
bpf: Merge "types_are_compat" logic into relo_core.c BPF type compatibility checks (bpf_core_types_are_compat()) are currently duplicated between kernel and user space. That's a historical artifact more than intentional doing and can lead to subtle bugs where one implementation is adjusted but another is forgotten. That happened with the enum64 work, for example, where the libbpf side was changed (commit 23b2a3a8f63a ("libbpf: Add enum64 relocation support")) to use the btf_kind_core_compat() helper function but the kernel side was not (commit 6089fb325cf7 ("bpf: Add btf enum64 support")). This patch addresses both the duplication issue, by merging both implementations and moving them into relo_core.c, and fixes the alluded to kind check (by giving preference to libbpf's already adjusted logic). For discussion of the topic, please refer to: https://lore.kernel.org/bpf/CAADnVQKbWR7oarBdewgOBZUPzryhRYvEbkhyPJQHHuxq=0K1gw@mail.gmail.com/T/#mcc99f4a33ad9a322afaf1b9276fb1f0b7add9665 Changelog: v1 -> v2: - limited libbpf recursion limit to 32 - changed name to __bpf_core_types_are_compat - included warning previously present in libbpf version - merged kernel and user space changes into a single patch Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220623182934.2582827-1-deso@posteo.net
|
#
6089fb32 |
|
07-Jun-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Add btf enum64 support Currently, BTF only supports upto 32bit enum value with BTF_KIND_ENUM. But in kernel, some enum indeed has 64bit values, e.g., in uapi bpf.h, we have enum { BPF_F_INDEX_MASK = 0xffffffffULL, BPF_F_CURRENT_CPU = BPF_F_INDEX_MASK, BPF_F_CTXLEN_MASK = (0xfffffULL << 32), }; In this case, BTF_KIND_ENUM will encode the value of BPF_F_CTXLEN_MASK as 0, which certainly is incorrect. This patch added a new btf kind, BTF_KIND_ENUM64, which permits 64bit value to cover the above use case. The BTF_KIND_ENUM64 has the following three fields followed by the common type: struct bpf_enum64 { __u32 nume_off; __u32 val_lo32; __u32 val_hi32; }; Currently, btf type section has an alignment of 4 as all element types are u32. Representing the value with __u64 will introduce a pad for bpf_enum64 and may also introduce misalignment for the 64bit value. Hence, two members of val_hi32 and val_lo32 are chosen to avoid these issues. The kflag is also introduced for BTF_KIND_ENUM and BTF_KIND_ENUM64 to indicate whether the value is signed or unsigned. The kflag intends to provide consistent output of BTF C fortmat with the original source code. For example, the original BTF_KIND_ENUM bit value is 0xffffffff. The format C has two choices, printing out 0xffffffff or -1 and current libbpf prints out as unsigned value. But if the signedness is preserved in btf, the value can be printed the same as the original source code. The kflag value 0 means unsigned values, which is consistent to the default by libbpf and should also cover most cases as well. The new BTF_KIND_ENUM64 is intended to support the enum value represented as 64bit value. But it can represent all BTF_KIND_ENUM values as well. The compiler ([1]) and pahole will generate BTF_KIND_ENUM64 only if the value has to be represented with 64 bits. In addition, a static inline function btf_kind_core_compat() is introduced which will be used later when libbpf relo_core.c changed. Here the kernel shares the same relo_core.c with libbpf. [1] https://reviews.llvm.org/D124641 Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220607062600.3716578-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
d1a374a1 |
|
14-Jun-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Limit maximum modifier chain length in btf_check_type_tags On processing a module BTF of module built for an older kernel, we might sometimes find that some type points to itself forming a loop. If such a type is a modifier, btf_check_type_tags's while loop following modifier chain will be caught in an infinite loop. Fix this by defining a maximum chain length and bailing out if we spin any longer than that. Fixes: eb596b090558 ("bpf: Ensure type tags precede modifiers in BTF") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220615042151.2266537-1-memxor@gmail.com
|
#
f858c2b2 |
|
06-Jun-2022 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs The verifier allows programs to call global functions as long as their argument types match, using BTF to check the function arguments. One of the allowed argument types to such global functions is PTR_TO_CTX; however the check for this fails on BPF_PROG_TYPE_EXT functions because the verifier uses the wrong type to fetch the vmlinux BTF ID for the program context type. This failure is seen when an XDP program is loaded using libxdp (which loads it as BPF_PROG_TYPE_EXT and attaches it to a global XDP type program). Fix the issue by passing in the target program type instead of the BPF_PROG_TYPE_EXT type to bpf_prog_get_ctx() when checking function argument compatibility. The first Fixes tag refers to the latest commit that touched the code in question, while the second one points to the code that first introduced the global function call verification. v2: - Use resolve_prog_type() Fixes: 3363bd0cfbb8 ("bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support") Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification") Reported-by: Simon Sundberg <simon.sundberg@kau.se> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220606075253.28422-1-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
97949767 |
|
18-May-2022 |
Benjamin Tissoires <benjamin.tissoires@redhat.com> |
bpf: Allow kfunc in tracing and syscall programs. Tracing and syscall BPF program types are very convenient to add BPF capabilities to subsystem otherwise not BPF capable. When we add kfuncs capabilities to those program types, we can add BPF features to subsystems without having to touch BPF core. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220518205924.399291-2-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c317ab71 |
|
25-Apr-2022 |
Menglong Dong <imagedong@tencent.com> |
bpf: Compute map_btf_id during build time For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map types are computed during vmlinux-btf init: btf_parse_vmlinux() -> btf_vmlinux_map_ids_init() It will lookup the btf_type according to the 'map_btf_name' field in 'struct bpf_map_ops'. This process can be done during build time, thanks to Jiri's resolve_btfids. selftest of map_ptr has passed: $96 map_ptr:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2ab3b380 |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Make BTF type match stricter for release arguments The current of behavior of btf_struct_ids_match for release arguments is that when type match fails, it retries with first member type again (recursively). Since the offset is already 0, this is akin to just casting the pointer in normal C, since if type matches it was just embedded inside parent sturct as an object. However, we want to reject cases for release function type matching, be it kfunc or BPF helpers. An example is the following: struct foo { struct bar b; }; struct foo *v = acq_foo(); rel_bar(&v->b); // btf_struct_ids_match fails btf_types_are_same, then // retries with first member type and succeeds, while // it should fail. Hence, don't walk the struct and only rely on btf_types_are_same for strict mode. All users of strict mode must be dealing with zero offset anyway, since otherwise they would want the struct to be walked. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-10-memxor@gmail.com
|
#
a1ef1959 |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Teach verifier about kptr_get kfunc helpers We introduce a new style of kfunc helpers, namely *_kptr_get, where they take pointer to the map value which points to a referenced kernel pointer contained in the map. Since this is referenced, only bpf_kptr_xchg from BPF side and xchg from kernel side is allowed to change the current value, and each pointer that resides in that location would be referenced, and RCU protected (this must be kept in mind while adding kernel types embeddable as reference kptr in BPF maps). This means that if do the load of the pointer value in an RCU read section, and find a live pointer, then as long as we hold RCU read lock, it won't be freed by a parallel xchg + release operation. This allows us to implement a safe refcount increment scheme. Hence, enforce that first argument of all such kfunc is a proper PTR_TO_MAP_VALUE pointing at the right offset to referenced pointer. For the rest of the arguments, they are subjected to typical kfunc argument checks, hence allowing some flexibility in passing more intent into how the reference should be taken. For instance, in case of struct nf_conn, it is not freed until RCU grace period ends, but can still be reused for another tuple once refcount has dropped to zero. Hence, a bpf_ct_kptr_get helper not only needs to call refcount_inc_not_zero, but also do a tuple match after incrementing the reference, and when it fails to match it, put the reference again and return NULL. This can be implemented easily if we allow passing additional parameters to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a tuple__sz pair. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-9-memxor@gmail.com
|
#
14a324f6 |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Wire up freeing of referenced kptr A destructor kfunc can be defined as void func(type *), where type may be void or any other pointer type as per convenience. In this patch, we ensure that the type is sane and capture the function pointer into off_desc of ptr_off_tab for the specific pointer offset, with the invariant that the dtor pointer is always set when 'kptr_ref' tag is applied to the pointer's pointee type, which is indicated by the flag BPF_MAP_VALUE_OFF_F_REF. Note that only BTF IDs whose destructor kfunc is registered, thus become the allowed BTF IDs for embedding as referenced kptr. Hence it serves the purpose of finding dtor kfunc BTF ID, as well acting as a check against the whitelist of allowed BTF IDs for this purpose. Finally, wire up the actual freeing of the referenced pointer if any at all available offsets, so that no references are leaked after the BPF map goes away and the BPF program previously moved the ownership a referenced pointer into it. The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem will free any existing referenced kptr. The same case is with LRU map's bpf_lru_push_free/htab_lru_push_free functions, which are extended to reset unreferenced and free referenced kptr. Note that unlike BPF timers, kptr is not reset or freed when map uref drops to zero. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-8-memxor@gmail.com
|
#
5ce937d6 |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Populate pairs of btf_id and destructor kfunc in btf To support storing referenced PTR_TO_BTF_ID in maps, we require associating a specific BTF ID with a 'destructor' kfunc. This is because we need to release a live referenced pointer at a certain offset in map value from the map destruction path, otherwise we end up leaking resources. Hence, introduce support for passing an array of btf_id, kfunc_btf_id pairs that denote a BTF ID and its associated release function. Then, add an accessor 'btf_find_dtor_kfunc' which can be used to look up the destructor kfunc of a certain BTF ID. If found, we can use it to free the object from the map free path. The registration of these pairs also serve as a whitelist of structures which are allowed as referenced PTR_TO_BTF_ID in a BPF map, because without finding the destructor kfunc, we will bail and return an error. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-7-memxor@gmail.com
|
#
c0a5a21c |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Allow storing referenced kptr in map Extending the code in previous commits, introduce referenced kptr support, which needs to be tagged using 'kptr_ref' tag instead. Unlike unreferenced kptr, referenced kptr have a lot more restrictions. In addition to the type matching, only a newly introduced bpf_kptr_xchg helper is allowed to modify the map value at that offset. This transfers the referenced pointer being stored into the map, releasing the references state for the program, and returning the old value and creating new reference state for the returned pointer. Similar to unreferenced pointer case, return value for this case will also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer must either be eventually released by calling the corresponding release function, otherwise it must be transferred into another map. It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear the value, and obtain the old value if any. BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future commit will permit using BPF_LDX for such pointers, but attempt at making it safe, since the lifetime of object won't be guaranteed. There are valid reasons to enforce the restriction of permitting only bpf_kptr_xchg to operate on referenced kptr. The pointer value must be consistent in face of concurrent modification, and any prior values contained in the map must also be released before a new one is moved into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg returns the old value, which the verifier would require the user to either free or move into another map, and releases the reference held for the pointer being moved in. In the future, direct BPF_XCHG instruction may also be permitted to work like bpf_kptr_xchg helper. Note that process_kptr_func doesn't have to call check_helper_mem_access, since we already disallow rdonly/wronly flags for map, which is what check_map_access_type checks, and we already ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc, so check_map_access is also not required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com
|
#
8f14852e |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Tag argument to be released in bpf_func_proto Add a new type flag for bpf_arg_type that when set tells verifier that for a release function, that argument's register will be the one for which meta.ref_obj_id will be set, and which will then be released using release_reference. To capture the regno, introduce a new field release_regno in bpf_call_arg_meta. This would be required in the next patch, where we may either pass NULL or a refcounted pointer as an argument to the release function bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not enough, as there is a case where the type of argument needed matches, but the ref_obj_id is set to 0. Hence, we must enforce that whenever meta.ref_obj_id is zero, the register that is to be released can only be NULL for a release function. Since we now indicate whether an argument is to be released in bpf_func_proto itself, is_release_function helper has lost its utitlity, hence refactor code to work without it, and just rely on meta.release_regno to know when to release state for a ref_obj_id. Still, the restriction of one release argument and only one ref_obj_id passed to BPF helper or kfunc remains. This may be lifted in the future. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-3-memxor@gmail.com
|
#
61df10c7 |
|
24-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Allow storing unreferenced kptr in map This commit introduces a new pointer type 'kptr' which can be embedded in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID register must have the same type as in the map value's BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID with the correct kernel BTF and BTF ID. Such kptr are unreferenced, i.e. by the time another invocation of the BPF program loads this pointer, the object which the pointer points to may not longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are patched to PROBE_MEM loads by the verifier, it would safe to allow user to still access such invalid pointer, but passing such pointers into BPF helpers and kfuncs should not be permitted. A future patch in this series will close this gap. The flexibility offered by allowing programs to dereference such invalid pointers while being safe at runtime frees the verifier from doing complex lifetime tracking. As long as the user may ensure that the object remains valid, it can ensure data read by it from the kernel object is valid. The user indicates that a certain pointer must be treated as kptr capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this information is recorded in the object BTF which will be passed into the kernel by way of map's BTF information. The name and kind from the map value BTF is used to look up the in-kernel type, and the actual BTF and BTF ID is recorded in the map struct in a new kptr_off_tab member. For now, only storing pointers to structs is permitted. An example of this specification is shown below: #define __kptr __attribute__((btf_type_tag("kptr"))) struct map_value { ... struct task_struct __kptr *task; ... }; Then, in a BPF program, user may store PTR_TO_BTF_ID with the type task_struct into the map, and then load it later. Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as the verifier cannot know whether the value is NULL or not statically, it must treat all potential loads at that map value offset as loading a possibly NULL pointer. Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL) are allowed instructions that can access such a pointer. On BPF_LDX, the destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX, it is checked whether the source register type is a PTR_TO_BTF_ID with same BTF type as specified in the map BTF. The access size must always be BPF_DW. For the map in map support, the kptr_off_tab for outer map is copied from the inner map's kptr_off_tab. It was chosen to do a deep copy instead of introducing a refcount to kptr_off_tab, because the copy only needs to be done when paramterizing using inner_map_fd in the map in map case, hence would be unnecessary for all other users. It is not permitted to use MAP_FREEZE command and mmap for BPF map having kptrs, similar to the bpf_timer case. A kptr also requires that BPF program has both read and write access to the map (hence both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed). Note that check_map_access must be called from both check_helper_mem_access and for the BPF instructions, hence the kptr check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src and reuse it for this purpose. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com
|
#
42ba1308 |
|
15-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Make btf_find_field more generic Next commit introduces field type 'kptr' whose kind will not be struct, but pointer, and it will not be limited to one offset, but multiple ones. Make existing btf_find_struct_field and btf_find_datasec_var functions amenable to use for finding kptrs in map value, by moving spin_lock and timer specific checks into their own function. The alignment, and name are checked before the function is called, so it is the last point where we can skip field or return an error before the next loop iteration happens. Size of the field and type is meant to be checked inside the function. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220415160354.1050687-2-memxor@gmail.com
|
#
eb596b09 |
|
19-Apr-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Ensure type tags precede modifiers in BTF It is guaranteed that for modifiers, clang always places type tags before other modifiers, and then the base type. We would like to rely on this guarantee inside the kernel to make it simple to parse type tags from BTF. However, a user would be allowed to construct a BTF without such guarantees. Hence, add a pass to check that in modifier chains, type tags only occur at the head of the chain, and then don't occur later in the chain. If we see a type tag, we can have one or more type tags preceding other modifiers that then never have another type tag. If we see other modifiers, all modifiers following them should never be a type tag. Instead of having to walk chains we verified previously, we can remember the last good modifier type ID which headed a good chain. At that point, we must have verified all other chains headed by type IDs less than it. This makes the verification process less costly, and it becomes a simple O(n) pass. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220419164608.1990559-2-memxor@gmail.com
|
#
c29a4920 |
|
24-Mar-2022 |
Yuntao Wang <ytcoode@gmail.com> |
bpf: Fix maximum permitted number of arguments check Since the m->arg_size array can hold up to MAX_BPF_FUNC_ARGS argument sizes, it's ok that nargs is equal to MAX_BPF_FUNC_ARGS. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220324164238.1274915-1-ytcoode@gmail.com
|
#
583669ab |
|
20-Mar-2022 |
Yuntao Wang <ytcoode@gmail.com> |
bpf: Simplify check in btf_parse_hdr() Replace offsetof(hdr_len) + sizeof(hdr_len) with offsetofend(hdr_len) to simplify the check for correctness of btf_data_size in btf_parse_hdr() Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220320075240.1001728-1-ytcoode@gmail.com
|
#
7ada3787 |
|
20-Mar-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Check for NULL return from bpf_get_btf_vmlinux When CONFIG_DEBUG_INFO_BTF is disabled, bpf_get_btf_vmlinux can return a NULL pointer. Check for it in btf_get_module_btf to prevent a NULL pointer dereference. While kernel test robot only complained about this specific case, let's also check for NULL in other call sites of bpf_get_btf_vmlinux. Fixes: 9492450fd287 ("bpf: Always raise reference in btf_get_module_btf") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220320143003.589540-1-memxor@gmail.com
|
#
9492450f |
|
17-Mar-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Always raise reference in btf_get_module_btf Align it with helpers like bpf_find_btf_id, so all functions returning BTF in out parameter follow the same rule of raising reference consistently, regardless of module or vmlinux BTF. Adjust existing callers to handle the change accordinly. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220317115957.3193097-10-memxor@gmail.com
|
#
edc3ec09 |
|
17-Mar-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Factor out fd returning from bpf_btf_find_by_name_kind In next few patches, we need a helper that searches all kernel BTFs (vmlinux and module BTFs), and finds the type denoted by 'name' and 'kind'. Turns out bpf_btf_find_by_name_kind already does the same thing, but it instead returns a BTF ID and optionally fd (if module BTF). This is used for relocating ksyms in BPF loader code (bpftool gen skel -L). We extract the core code out into a new helper bpf_find_btf_id, which returns the BTF ID in the return value, and BTF pointer in an out parameter. The reference for the returned BTF pointer is always raised, hence user must either transfer it (e.g. to a fd), or release it after use. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220317115957.3193097-2-memxor@gmail.com
|
#
5844101a |
|
04-Mar-2022 |
Hao Luo <haoluo@google.com> |
bpf: Reject programs that try to load __percpu memory. With the introduction of the btf_type_tag "percpu", we can add a MEM_PERCPU to identify those pointers that point to percpu memory. The ability of differetiating percpu pointers from regular memory pointers have two benefits: 1. It forbids unexpected use of percpu pointers, such as direct loads. In kernel, there are special functions used for accessing percpu memory. Directly loading percpu memory is meaningless. We already have BPF helpers like bpf_per_cpu_ptr() and bpf_this_cpu_ptr() that wrap the kernel percpu functions. So we can now convert percpu pointers into regular pointers in a safe way. 2. Previously, bpf_per_cpu_ptr() and bpf_this_cpu_ptr() only work on PTR_TO_PERCPU_BTF_ID, a special reg_type which describes static percpu variables in kernel (we rely on pahole to encode them into vmlinux BTF). Now, since we can identify __percpu tagged pointers, we can also identify dynamically allocated percpu memory as well. It means we can use bpf_xxx_cpu_ptr() on dynamic percpu memory. This would be very convenient when accessing fields like "cgroup->rstat_cpu". Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304191657.981240-4-haoluo@google.com
|
#
24d5bb80 |
|
04-Mar-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Harden register offset checks for release helpers and kfuncs Let's ensure that the PTR_TO_BTF_ID reg being passed in to release BPF helpers and kfuncs always has its offset set to 0. While not a real problem now, there's a very real possibility this will become a problem when more and more kfuncs are exposed, and more BPF helpers are added which can release PTR_TO_BTF_ID. Previous commits already protected against non-zero var_off. One of the case we are concerned about now is when we have a type that can be returned by e.g. an acquire kfunc: struct foo { int a; int b; struct bar b; }; ... and struct bar is also a type that can be returned by another acquire kfunc. Then, doing the following sequence: struct foo *f = bpf_get_foo(); // acquire kfunc if (!f) return 0; bpf_put_bar(&f->b); // release kfunc ... would work with the current code, since the btf_struct_ids_match takes reg->off into account for matching pointer type with release kfunc argument type, but would obviously be incorrect, and most likely lead to a kernel crash. A test has been included later to prevent regressions in this area. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-5-memxor@gmail.com
|
#
655efe50 |
|
04-Mar-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Fix PTR_TO_BTF_ID var_off check When kfunc support was added, check_ctx_reg was called for PTR_TO_CTX register, but no offset checks were made for PTR_TO_BTF_ID. Only reg->off was taken into account by btf_struct_ids_match, which protected against type mismatch due to non-zero reg->off, but when reg->off was zero, a user could set the variable offset of the register and allow it to be passed to kfunc, leading to bad pointer being passed into the kernel. Fix this by reusing the extracted helper check_func_arg_reg_off from previous commit, and make one call before checking all supported register types. Since the list is maintained, any future changes will be taken into account by updating check_func_arg_reg_off. This function prevents non-zero var_off to be set for PTR_TO_BTF_ID, but still allows a fixed non-zero reg->off, which is needed for type matching to work correctly when using pointer arithmetic. ARG_DONTCARE is passed as arg_type, since kfunc doesn't support accepting a ARG_PTR_TO_ALLOC_MEM without relying on size of parameter type from BTF (in case of pointer), or using a mem, len pair. The forcing of offset check for ARG_PTR_TO_ALLOC_MEM is done because ringbuf helpers obtain the size from the header located at the beginning of the memory region, hence any changes to the original pointer shouldn't be allowed. In case of kfunc, size is always known, either at verification time, or using the length parameter, hence this forcing is not required. Since this check will happen once already for PTR_TO_CTX, remove the check_ptr_off_reg call inside its block. Fixes: e6ac2450d6de ("bpf: Support bpf program calling kernel function") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220304224645.3677453-3-memxor@gmail.com
|
#
5e214f2e |
|
22-Feb-2022 |
Connor O'Brien <connoro@google.com> |
bpf: Add config to allow loading modules with BTF mismatches BTF mismatch can occur for a separately-built module even when the ABI is otherwise compatible and nothing else would prevent successfully loading. Add a new Kconfig to control how mismatches are handled. By default, preserve the current behavior of refusing to load the module. If MODULE_ALLOW_BTF_MISMATCH is enabled, load the module but ignore its BTF information. Suggested-by: Yonghong Song <yhs@fb.com> Suggested-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/CAADnVQJ+OVPnBz8z3vNu8gKXX42jCUqfuvhWAyCQDu8N_yqqwQ@mail.gmail.com Link: https://lore.kernel.org/bpf/20220223012814.1898677-1-connoro@google.com
|
#
c561d110 |
|
20-Feb-2022 |
Tom Rix <trix@redhat.com> |
bpf: Cleanup comments Add leading space to spdx tag Use // for spdx c file comment Replacements resereved to reserved inbetween to in between everytime to every time intutivie to intuitive currenct to current encontered to encountered referenceing to referencing upto to up to exectuted to executed Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220220184055.3608317-1-trix@redhat.com
|
#
d0b38229 |
|
19-Feb-2022 |
Souptick Joarder (HPE) <jrdr.linux@gmail.com> |
bpf: Initialize ret to 0 inside btf_populate_kfunc_set() Kernel test robot reported below error -> kernel/bpf/btf.c:6718 btf_populate_kfunc_set() error: uninitialized symbol 'ret'. Initialize ret to 0. Fixes: dee872e124e8 ("bpf: Populate kfunc BTF ID sets in struct btf") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20220219163915.125770-1-jrdr.linux@gmail.com
|
#
adb8fa19 |
|
15-Feb-2022 |
Mauricio Vásquez <mauricio@kinvolk.io> |
libbpf: Split bpf_core_apply_relo() BTFGen needs to run the core relocation logic in order to understand what are the types involved in a given relocation. Currently bpf_core_apply_relo() calculates and **applies** a relocation to an instruction. Having both operations in the same function makes it difficult to only calculate the relocation without patching the instruction. This commit splits that logic in two different phases: (1) calculate the relocation and (2) patch the instruction. For the first phase bpf_core_apply_relo() is renamed to bpf_core_calc_relo_insn() who is now only on charge of calculating the relocation, the second phase uses the already existing bpf_core_patch_insn(). bpf_object__relocate_core() uses both of them and the BTFGen will use only bpf_core_calc_relo_insn(). Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-2-mauricio@kinvolk.io
|
#
e70e13e7 |
|
03-Feb-2022 |
Matteo Croce <mcroce@microsoft.com> |
bpf: Implement bpf_core_types_are_compat(). Adopt libbpf's bpf_core_types_are_compat() for kernel duty by adding explicit recursion limit of 2 which is enough to handle 2 levels of function prototypes. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220204005519.60361-2-mcroce@linux.microsoft.com
|
#
d7e7b42f |
|
03-Feb-2022 |
Yonghong Song <yhs@fb.com> |
bpf: Fix a btf decl_tag bug when tagging a function syzbot reported a btf decl_tag bug with stack trace below: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 3592 Comm: syz-executor914 Not tainted 5.16.0-syzkaller-11424-gb7892f7d5cb2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:btf_type_vlen include/linux/btf.h:231 [inline] RIP: 0010:btf_decl_tag_resolve+0x83e/0xaa0 kernel/bpf/btf.c:3910 ... Call Trace: <TASK> btf_resolve+0x251/0x1020 kernel/bpf/btf.c:4198 btf_check_all_types kernel/bpf/btf.c:4239 [inline] btf_parse_type_sec kernel/bpf/btf.c:4280 [inline] btf_parse kernel/bpf/btf.c:4513 [inline] btf_new_fd+0x19fe/0x2370 kernel/bpf/btf.c:6047 bpf_btf_load kernel/bpf/syscall.c:4039 [inline] __sys_bpf+0x1cbb/0x5970 kernel/bpf/syscall.c:4679 __do_sys_bpf kernel/bpf/syscall.c:4738 [inline] __se_sys_bpf kernel/bpf/syscall.c:4736 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4736 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The kasan error is triggered with an illegal BTF like below: type 0: void type 1: int type 2: decl_tag to func type 3 type 3: func to func_proto type 8 The total number of types is 4 and the type 3 is illegal since its func_proto type is out of range. Currently, the target type of decl_tag can be struct/union, var or func. Both struct/union and var implemented their own 'resolve' callback functions and hence handled properly in kernel. But func type doesn't have 'resolve' callback function. When btf_decl_tag_resolve() tries to check func type, it tries to get vlen of its func_proto type, which triggered the above kasan error. To fix the issue, btf_decl_tag_resolve() needs to do btf_func_check() before trying to accessing func_proto type. In the current implementation, func type is checked with btf_func_check() in the main checking function btf_check_all_types(). To fix the above kasan issue, let us implement 'resolve' callback func type properly. The 'resolve' callback will be also called in btf_check_all_types() for func types. Fixes: b5ea834dde6b ("bpf: Support for new btf kind BTF_KIND_TAG") Reported-by: syzbot+53619be9444215e785ed@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220203191727.741862-1-yhs@fb.com
|
#
c6f1bfe8 |
|
27-Jan-2022 |
Yonghong Song <yhs@fb.com> |
bpf: reject program if a __user tagged memory accessed in kernel way BPF verifier supports direct memory access for BPF_PROG_TYPE_TRACING type of bpf programs, e.g., a->b. If "a" is a pointer pointing to kernel memory, bpf verifier will allow user to write code in C like a->b and the verifier will translate it to a kernel load properly. If "a" is a pointer to user memory, it is expected that bpf developer should be bpf_probe_read_user() helper to get the value a->b. Without utilizing BTF __user tagging information, current verifier will assume that a->b is a kernel memory access and this may generate incorrect result. Now BTF contains __user information, it can check whether the pointer points to a user memory or not. If it is, the verifier can reject the program and force users to use bpf_probe_read_user() helper explicitly. In the future, we can easily extend btf_add_space for other address space tagging, for example, rcu/percpu etc. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220127154606.654961-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c446fdac |
|
25-Jan-2022 |
Stanislav Fomichev <sdf@google.com> |
bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF Commit dee872e124e8 ("bpf: Populate kfunc BTF ID sets in struct btf") breaks loading of some modules when CONFIG_DEBUG_INFO_BTF is not set. register_btf_kfunc_id_set returns -ENOENT to the callers when there is no module btf. Let's return 0 (success) instead to let those modules work in !CONFIG_DEBUG_INFO_BTF cases. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Fixes: dee872e124e8 ("bpf: Populate kfunc BTF ID sets in struct btf") Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220126001340.1573649-1-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
5c073f26 |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Add reference tracking support to kfunc This patch adds verifier support for PTR_TO_BTF_ID return type of kfunc to be a reference, by reusing acquire_reference_state/release_reference support for existing in-kernel bpf helpers. We make use of the three kfunc types: - BTF_KFUNC_TYPE_ACQUIRE Return true if kfunc_btf_id is an acquire kfunc. This will acquire_reference_state for the returned PTR_TO_BTF_ID (this is the only allow return value). Note that acquire kfunc must always return a PTR_TO_BTF_ID{_OR_NULL}, otherwise the program is rejected. - BTF_KFUNC_TYPE_RELEASE Return true if kfunc_btf_id is a release kfunc. This will release the reference to the passed in PTR_TO_BTF_ID which has a reference state (from earlier acquire kfunc). The btf_check_func_arg_match returns the regno (of argument register, hence > 0) if the kfunc is a release kfunc, and a proper referenced PTR_TO_BTF_ID is being passed to it. This is similar to how helper call check uses bpf_call_arg_meta to store the ref_obj_id that is later used to release the reference. Similar to in-kernel helper, we only allow passing one referenced PTR_TO_BTF_ID as an argument. It can also be passed in to normal kfunc, but in case of release kfunc there must always be one PTR_TO_BTF_ID argument that is referenced. - BTF_KFUNC_TYPE_RET_NULL For kfunc returning PTR_TO_BTF_ID, tells if it can be NULL, hence force caller to mark the pointer not null (using check) before accessing it. Note that taking into account the case fixed by commit 93c230e3f5bd ("bpf: Enforce id generation for all may-be-null register type") we assign a non-zero id for mark_ptr_or_null_reg logic. Later, if more return types are supported by kfunc, which have a _OR_NULL variant, it might be better to move this id generation under a common reg_type_may_be_null check, similar to the case in the commit. Referenced PTR_TO_BTF_ID is currently only limited to kfunc, but can be extended in the future to other BPF helpers as well. For now, we can rely on the btf_struct_ids_match check to ensure we get the pointer to the expected struct type. In the future, care needs to be taken to avoid ambiguity for reference PTR_TO_BTF_ID passed to release function, in case multiple candidates can release same BTF ID. e.g. there might be two release kfuncs (or kfunc and helper): foo(struct abc *p); bar(struct abc *p); ... such that both release a PTR_TO_BTF_ID with btf_id of struct abc. In this case we would need to track the acquire function corresponding to the release function to avoid type confusion, and store this information in the register state so that an incorrect program can be rejected. This is not a problem right now, hence it is left as an exercise for the future patch introducing such a case in the kernel. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
d583691c |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Introduce mem, size argument pair support for kfunc BPF helpers can associate two adjacent arguments together to pass memory of certain size, using ARG_PTR_TO_MEM and ARG_CONST_SIZE arguments. Since we don't use bpf_func_proto for kfunc, we need to leverage BTF to implement similar support. The ARG_CONST_SIZE processing for helpers is refactored into a common check_mem_size_reg helper that is shared with kfunc as well. kfunc ptr_to_mem support follows logic similar to global functions, where verification is done as if pointer is not null, even when it may be null. This leads to a simple to follow rule for writing kfunc: always check the argument pointer for NULL, except when it is PTR_TO_CTX. Also, the PTR_TO_CTX case is also only safe when the helper expecting pointer to program ctx is not exposed to other programs where same struct is not ctx type. In that case, the type check will fall through to other cases and would permit passing other types of pointers, possibly NULL at runtime. Currently, we require the size argument to be suffixed with "__sz" in the parameter name. This information is then recorded in kernel BTF and verified during function argument checking. In the future we can use BTF tagging instead, and modify the kernel function definitions. This will be a purely kernel-side change. This allows us to have some form of backwards compatibility for structures that are passed in to the kernel function with their size, and allow variable length structures to be passed in if they are accompanied by a size parameter. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b202d844 |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Remove check_kfunc_call callback and old kfunc BTF ID API Completely remove the old code for check_kfunc_call to help it work with modules, and also the callback itself. The previous commit adds infrastructure to register all sets and put them in vmlinux or module BTF, and concatenates all related sets organized by the hook and the type. Once populated, these sets remain immutable for the lifetime of the struct btf. Also, since we don't need the 'owner' module anywhere when doing check_kfunc_call, drop the 'btf_modp' module parameter from find_kfunc_desc_btf. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
dee872e1 |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Populate kfunc BTF ID sets in struct btf This patch prepares the kernel to support putting all kinds of kfunc BTF ID sets in the struct btf itself. The various kernel subsystems will make register_btf_kfunc_id_set call in the initcalls (for built-in code and modules). The 'hook' is one of the many program types, e.g. XDP and TC/SCHED_CLS, STRUCT_OPS, and 'types' are check (allowed or not), acquire, release, and ret_null (with PTR_TO_BTF_ID_OR_NULL return type). A maximum of BTF_KFUNC_SET_MAX_CNT (32) kfunc BTF IDs are permitted in a set of certain hook and type for vmlinux sets, since they are allocated on demand, and otherwise set as NULL. Module sets can only be registered once per hook and type, hence they are directly assigned. A new btf_kfunc_id_set_contains function is exposed for use in verifier, this new method is faster than the existing list searching method, and is also automatic. It also lets other code not care whether the set is unallocated or not. Note that module code can only do single register_btf_kfunc_id_set call per hook. This is why sorting is only done for in-kernel vmlinux sets, because there might be multiple sets for the same hook and type that must be concatenated, hence sorting them is required to ensure bsearch in btf_id_set_contains continues to work correctly. Next commit will update the kernel users to make use of this infrastructure. Finally, add __maybe_unused annotation for BTF ID macros for the !CONFIG_DEBUG_INFO_BTF case, so that they don't produce warnings during build time. The previous patch is also needed to provide synchronization against initialization for module BTF's kfunc_set_tab introduced here, as described below: The kfunc_set_tab pointer in struct btf is write-once (if we consider the registration phase (comprised of multiple register_btf_kfunc_id_set calls) as a single operation). In this sense, once it has been fully prepared, it isn't modified, only used for lookup (from the verifier context). For btf_vmlinux, it is initialized fully during the do_initcalls phase, which happens fairly early in the boot process, before any processes are present. This also eliminates the possibility of bpf_check being called at that point, thus relieving us of ensuring any synchronization between the registration and lookup function (btf_kfunc_id_set_contains). However, the case for module BTF is a bit tricky. The BTF is parsed, prepared, and published from the MODULE_STATE_COMING notifier callback. After this, the module initcalls are invoked, where our registration function will be called to populate the kfunc_set_tab for module BTF. At this point, BTF may be available to userspace while its corresponding module is still intializing. A BTF fd can then be passed to verifier using bpf syscall (e.g. for kfunc call insn). Hence, there is a race window where verifier may concurrently try to lookup the kfunc_set_tab. To prevent this race, we must ensure the operations are serialized, or waiting for the __init functions to complete. In the earlier registration API, this race was alleviated as verifier bpf_check_mod_kfunc_call didn't find the kfunc BTF ID until it was added by the registration function (called usually at the end of module __init function after all module resources have been initialized). If the verifier made the check_kfunc_call before kfunc BTF ID was added to the list, it would fail verification (saying call isn't allowed). The access to list was protected using a mutex. Now, it would still fail verification, but for a different reason (returning ENXIO due to the failed btf_try_get_module call in add_kfunc_call), because if the __init call is in progress the module will be in the middle of MODULE_STATE_COMING -> MODULE_STATE_LIVE transition, and the BTF_MODULE_LIVE flag for btf_module instance will not be set, so the btf_try_get_module call will fail. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
18688de2 |
|
14-Jan-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Fix UAF due to race between btf_try_get_module and load_module While working on code to populate kfunc BTF ID sets for module BTF from its initcall, I noticed that by the time the initcall is invoked, the module BTF can already be seen by userspace (and the BPF verifier). The existing btf_try_get_module calls try_module_get which only fails if mod->state == MODULE_STATE_GOING, i.e. it can increment module reference when module initcall is happening in parallel. Currently, BTF parsing happens from MODULE_STATE_COMING notifier callback. At this point, the module initcalls have not been invoked. The notifier callback parses and prepares the module BTF, allocates an ID, which publishes it to userspace, and then adds it to the btf_modules list allowing the kernel to invoke btf_try_get_module for the BTF. However, at this point, the module has not been fully initialized (i.e. its initcalls have not finished). The code in module.c can still fail and free the module, without caring for other users. However, nothing stops btf_try_get_module from succeeding between the state transition from MODULE_STATE_COMING to MODULE_STATE_LIVE. This leads to a use-after-free issue when BPF program loads successfully in the state transition, load_module's do_init_module call fails and frees the module, and BPF program fd on close calls module_put for the freed module. Future patch has test case to verify we don't regress in this area in future. There are multiple points after prepare_coming_module (in load_module) where failure can occur and module loading can return error. We illustrate and test for the race using the last point where it can practically occur (in module __init function). An illustration of the race: CPU 0 CPU 1 load_module notifier_call(MODULE_STATE_COMING) btf_parse_module btf_alloc_id // Published to userspace list_add(&btf_mod->list, btf_modules) mod->init(...) ... ^ bpf_check | check_pseudo_btf_id | btf_try_get_module | returns true | ... ... | module __init in progress return prog_fd | ... ... V if (ret < 0) free_module(mod) ... close(prog_fd) ... bpf_prog_free_deferred module_put(used_btf.mod) // use-after-free We fix this issue by setting a flag BTF_MODULE_F_LIVE, from the notifier callback when MODULE_STATE_LIVE state is reached for the module, so that we return NULL from btf_try_get_module for modules that are not fully formed. Since try_module_get already checks that module is not in MODULE_STATE_GOING state, and that is the only transition a live module can make before being removed from btf_modules list, this is enough to close the race and prevent the bug. A later selftest patch crafts the race condition artifically to verify that it has been fixed, and that verifier fails to load program (with ENXIO). Lastly, a couple of comments: 1. Even if this race didn't exist, it seems more appropriate to only access resources (ksyms and kfuncs) of a fully formed module which has been initialized completely. 2. This patch was born out of need for synchronization against module initcall for the next patch, so it is needed for correctness even without the aforementioned race condition. The BTF resources initialized by module initcall are set up once and then only looked up, so just waiting until the initcall has finished ensures correct behavior. Fixes: 541c3bad8dc5 ("bpf: Support BPF ksym variables in kernel modules") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
45ce4b4f |
|
16-Feb-2022 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Fix crash due to out of bounds access into reg2btf_ids. When commit e6ac2450d6de ("bpf: Support bpf program calling kernel function") added kfunc support, it defined reg2btf_ids as a cheap way to translate the verifier reg type to the appropriate btf_vmlinux BTF ID, however commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL") moved the __BPF_REG_TYPE_MAX from the last member of bpf_reg_type enum to after the base register types, and defined other variants using type flag composition. However, now, the direct usage of reg->type to index into reg2btf_ids may no longer fall into __BPF_REG_TYPE_MAX range, and hence lead to out of bounds access and kernel crash on dereference of bad pointer. Fixes: c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220216201943.624869-1-memxor@gmail.com
|
#
be80a1d3 |
|
10-Jan-2022 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: Generalize check_ctx_reg for reuse with other types Generalize the check_ctx_reg() helper function into a more generic named one so that it can be reused for other register types as well to check whether their offset is non-zero. No functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
|
#
3363bd0c |
|
16-Dec-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support Allow passing PTR_TO_CTX, if the kfunc expects a matching struct type, and punt to PTR_TO_MEM block if reg->type does not fall in one of PTR_TO_BTF_ID or PTR_TO_SOCK* types. This will be used by future commits to get access to XDP and TC PTR_TO_CTX, and pass various data (flags, l4proto, netns_id, etc.) encoded in opts struct passed as pointer to kfunc. For PTR_TO_MEM support, arguments are currently limited to pointer to scalar, or pointer to struct composed of scalars. This is done so that unsafe scenarios (like passing PTR_TO_MEM where PTR_TO_BTF_ID of in-kernel valid structure is expected, which may have pointers) are avoided. Since the argument checking happens basd on argument register type, it is not easy to ascertain what the expected type is. In the future, support for PTR_TO_MEM for kfunc can be extended to serve other usecases. The struct type whose pointer is passed in may have maximum nesting depth of 4, all recursively composed of scalars or struct with scalars. Future commits will add negative tests that check whether these restrictions imposed for kfunc arguments are duly rejected by BPF verifier or not. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217015031.1278167-4-memxor@gmail.com
|
#
216e3cd2 |
|
16-Dec-2021 |
Hao Luo <haoluo@google.com> |
bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem. Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
|
#
cf9f2f8d |
|
16-Dec-2021 |
Hao Luo <haoluo@google.com> |
bpf: Convert PTR_TO_MEM_OR_NULL to composable types. Remove PTR_TO_MEM_OR_NULL and replace it with PTR_TO_MEM combined with flag PTR_MAYBE_NULL. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-7-haoluo@google.com
|
#
20b2aff4 |
|
16-Dec-2021 |
Hao Luo <haoluo@google.com> |
bpf: Introduce MEM_RDONLY flag This patch introduce a flag MEM_RDONLY to tag a reg value pointing to read-only memory. It makes the following changes: 1. PTR_TO_RDWR_BUF -> PTR_TO_BUF 2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com
|
#
c25b2ae1 |
|
16-Dec-2021 |
Hao Luo <haoluo@google.com> |
bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL We have introduced a new type to make bpf_reg composable, by allocating bits in the type to represent flags. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. This patch switches the qualified reg_types to use this flag. The reg_types changed in this patch include: 1. PTR_TO_MAP_VALUE_OR_NULL 2. PTR_TO_SOCKET_OR_NULL 3. PTR_TO_SOCK_COMMON_OR_NULL 4. PTR_TO_TCP_SOCK_OR_NULL 5. PTR_TO_BTF_ID_OR_NULL 6. PTR_TO_MEM_OR_NULL 7. PTR_TO_RDONLY_BUF_OR_NULL 8. PTR_TO_RDWR_BUF_OR_NULL Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com
|
#
bb6728d7 |
|
08-Dec-2021 |
Jiri Olsa <jolsa@redhat.com> |
bpf: Allow access to int pointer arguments in tracing programs Adding support to access arguments with int pointer arguments in tracing programs. Currently we allow tracing programs to access only pointers to string (char pointer), void pointers and pointers to structs. If we try to access argument which is pointer to int, verifier will fail to load the program with; R1 type=ctx expected=fp ; int BPF_PROG(fmod_ret_test, int _a, __u64 _b, int _ret) 0: (bf) r6 = r1 ; int BPF_PROG(fmod_ret_test, int _a, __u64 _b, int _ret) 1: (79) r9 = *(u64 *)(r6 +8) func 'bpf_modify_return_test' arg1 type INT is not a struct There is no harm for the program to access int pointer argument. We are already doing that for string pointer, which is pointer to int with 1 byte size. Changing the is_string_ptr to generic integer check and renaming it to btf_type_is_int. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211208193245.172141-2-jolsa@kernel.org
|
#
f18a4997 |
|
11-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Silence coverity false positive warning. Coverity issued the following warning: 6685 cands = bpf_core_add_cands(cands, main_btf, 1); 6686 if (IS_ERR(cands)) >>> CID 1510300: (RETURN_LOCAL) >>> Returning pointer "cands" which points to local variable "local_cand". 6687 return cands; It's a false positive. Add ERR_CAST() to silence it. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
4674f210 |
|
08-Dec-2021 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
bpf: Use kmemdup() to replace kmalloc + memcpy Eliminate the follow coccicheck warning: ./kernel/bpf/btf.c:6537:13-20: WARNING opportunity for kmemdup. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1639030882-92383-1-git-send-email-jiapeng.chong@linux.alibaba.com
|
#
73b6eae5 |
|
07-Dec-2021 |
Colin Ian King <colin.king@intel.com> |
bpf: Remove redundant assignment to pointer t The pointer t is being initialized with a value that is never read. The pointer is re-assigned a value a littler later on, hence the initialization is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211207224718.59593-1-colin.i.king@gmail.com
|
#
29f2e5bd |
|
06-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Silence purge_cand_cache build warning. When CONFIG_DEBUG_INFO_BTF_MODULES is not set the following warning can be seen: kernel/bpf/btf.c:6588:13: warning: 'purge_cand_cache' defined but not used [-Wunused-function] Fix it. Fixes: 1e89106da253 ("bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211207014839.6976-1-alexei.starovoitov@gmail.com
|
#
866de407 |
|
02-Dec-2021 |
Hou Tao <houtao1@huawei.com> |
bpf: Disallow BPF_LOG_KERNEL log level for bpf(BPF_BTF_LOAD) BPF_LOG_KERNEL is only used internally, so disallow bpf_btf_load() to set log level as BPF_LOG_KERNEL. The same checking has already been done in bpf_check(), so factor out a helper to check the validity of log attributes and use it in both places. Fixes: 8580ac9404f6 ("bpf: Process in-kernel BTF") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211203053001.740945-1-houtao1@huawei.com
|
#
78c1f8d0 |
|
03-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
libbpf: Reduce bpf_core_apply_relo_insn() stack usage. Reduce bpf_core_apply_relo_insn() stack usage and bump BPF_CORE_SPEC_MAX_LEN limit back to 64. Fixes: 29db4bea1d10 ("bpf: Prepare relo_core.c for kernel duty.") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211203182836.16646-1-alexei.starovoitov@gmail.com
|
#
1e89106d |
|
01-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn(). Given BPF program's BTF root type name perform the following steps: . search in vmlinux candidate cache. . if (present in cache and candidate list >= 1) return candidate list. . do a linear search through kernel BTFs for possible candidates. . regardless of number of candidates found populate vmlinux cache. . if (candidate list >= 1) return candidate list. . search in module candidate cache. . if (present in cache) return candidate list (even if list is empty). . do a linear search through BTFs of all kernel modules collecting candidates from all of them. . regardless of number of candidates found populate module cache. . return candidate list. Then wire the result into bpf_core_apply_relo_insn(). When BPF program is trying to CO-RE relocate a type that doesn't exist in either vmlinux BTF or in modules BTFs these steps will perform 2 cache lookups when cache is hit. Note the cache doesn't prevent the abuse by the program that might have lots of relocations that cannot be resolved. Hence cond_resched(). CO-RE in the kernel requires CAP_BPF, since BTF loading requires it. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211201181040.23337-9-alexei.starovoitov@gmail.com
|
#
c5a2d43e |
|
01-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Adjust BTF log size limit. Make BTF log size limit to be the same as the verifier log size limit. Otherwise tools that progressively increase log size and use the same log for BTF loading and program loading will be hitting hard to debug EINVAL. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211201181040.23337-7-alexei.starovoitov@gmail.com
|
#
fbd94c7a |
|
01-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Pass a set of bpf_core_relo-s to prog_load command. struct bpf_core_relo is generated by llvm and processed by libbpf. It's a de-facto uapi. With CO-RE in the kernel the struct bpf_core_relo becomes uapi de-jure. Add an ability to pass a set of 'struct bpf_core_relo' to prog_load command and let the kernel perform CO-RE relocations. Note the struct bpf_line_info and struct bpf_func_info have the same layout when passed from LLVM to libbpf and from libbpf to the kernel except "insn_off" fields means "byte offset" when LLVM generates it. Then libbpf converts it to "insn index" to pass to the kernel. The struct bpf_core_relo's "insn_off" field is always "byte offset". Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211201181040.23337-6-alexei.starovoitov@gmail.com
|
#
29db4bea |
|
01-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Prepare relo_core.c for kernel duty. Make relo_core.c to be compiled for the kernel and for user space libbpf. Note the patch is reducing BPF_CORE_SPEC_MAX_LEN from 64 to 32. This is the maximum number of nested structs and arrays. For example: struct sample { int a; struct { int b[10]; }; }; struct sample *s = ...; int *y = &s->b[5]; This field access is encoded as "0:1:0:5" and spec len is 4. The follow up patch might bump it back to 64. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211201181040.23337-4-alexei.starovoitov@gmail.com
|
#
8293eb99 |
|
01-Dec-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Rename btf_member accessors. Rename btf_member_bit_offset() and btf_member_bitfield_size() to avoid conflicts with similarly named helpers in libbpf's btf.h. Rename the kernel helpers, since libbpf helpers are part of uapi. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211201181040.23337-3-alexei.starovoitov@gmail.com
|
#
d19ddb47 |
|
12-Nov-2021 |
Song Liu <songliubraving@fb.com> |
bpf: Introduce btf_tracing_ids Similar to btf_sock_ids, btf_tracing_ids provides btf ID for task_struct, file, and vm_area_struct via easy to understand format like btf_tracing_ids[BTF_TRACING_TYPE_[TASK|file|VMA]]. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211112150243.1270987-3-songliubraving@fb.com
|
#
9e2ad638 |
|
12-Nov-2021 |
Song Liu <songliubraving@fb.com> |
bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs syzbot reported the following BUG w/o CONFIG_DEBUG_INFO_BTF BUG: KASAN: global-out-of-bounds in task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661 Read of size 4 at addr ffffffff90297404 by task swapper/0/1 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-syzkaller #0 Hardware name: ... Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xf/0x309 mm/kasan/report.c:256 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661 do_one_initcall+0x103/0x650 init/main.c:1295 do_initcall_level init/main.c:1368 [inline] do_initcalls init/main.c:1384 [inline] do_basic_setup init/main.c:1403 [inline] kernel_init_freeable+0x6b1/0x73a init/main.c:1606 kernel_init+0x1a/0x1d0 init/main.c:1497 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> This is caused by hard-coded name[1] in BTF_ID_LIST_GLOBAL (w/o CONFIG_DEBUG_INFO_BTF). Fix this by adding a parameter n to BTF_ID_LIST_GLOBAL. This avoids ifdef CONFIG_DEBUG_INFO_BTF in btf.c and filter.c. Fixes: 7c7e3d31e785 ("bpf: Introduce helper bpf_find_vma") Reported-by: syzbot+e0d81ec552a21d9071aa@syzkaller.appspotmail.com Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211112150243.1270987-2-songliubraving@fb.com
|
#
8c42d2fa |
|
11-Nov-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Support BTF_KIND_TYPE_TAG for btf_type_tag attributes LLVM patches ([1] for clang, [2] and [3] for BPF backend) added support for btf_type_tag attributes. This patch added support for the kernel. The main motivation for btf_type_tag is to bring kernel annotations __user, __rcu etc. to btf. With such information available in btf, bpf verifier can detect mis-usages and reject the program. For example, for __user tagged pointer, developers can then use proper helper like bpf_probe_read_user() etc. to read the data. BTF_KIND_TYPE_TAG may also useful for other tracing facility where instead of to require user to specify kernel/user address type, the kernel can detect it by itself with btf. [1] https://reviews.llvm.org/D111199 [2] https://reviews.llvm.org/D113222 [3] https://reviews.llvm.org/D113496 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211112012609.1505032-1-yhs@fb.com
|
#
7c7e3d31 |
|
05-Nov-2021 |
Song Liu <songliubraving@fb.com> |
bpf: Introduce helper bpf_find_vma In some profiler use cases, it is necessary to map an address to the backing file, e.g., a shared library. bpf_find_vma helper provides a flexible way to achieve this. bpf_find_vma maps an address of a task to the vma (vm_area_struct) for this address, and feed the vma to an callback BPF function. The callback function is necessary here, as we need to ensure mmap_sem is unlocked. It is necessary to lock mmap_sem for find_vma. To lock and unlock mmap_sem safely when irqs are disable, we use the same mechanism as stackmap with build_id. Specifically, when irqs are disabled, the unlocked is postponed in an irq_work. Refactor stackmap.c so that the irq_work is shared among bpf_find_vma and stackmap helpers. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211105232330.1936330-2-songliubraving@fb.com
|
#
b12f0310 |
|
22-Nov-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Fix bpf_check_mod_kfunc_call for built-in modules When module registering its set is built-in, THIS_MODULE will be NULL, hence we cannot return early in case owner is NULL. Fixes: 14f267d95fe4 ("bpf: btf: Introduce helpers for dynamic BTF set registration") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211122144742.477787-3-memxor@gmail.com
|
#
d9847eb8 |
|
22-Nov-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL Vinicius Costa Gomes reported [0] that build fails when CONFIG_DEBUG_INFO_BTF is enabled and CONFIG_BPF_SYSCALL is disabled. This leads to btf.c not being compiled, and then no symbol being present in vmlinux for the declarations in btf.h. Since BTF is not useful without enabling BPF subsystem, disallow this combination. However, theoretically disabling both now could still fail, as the symbol for kfunc_btf_id_list variables is not available. This isn't a problem as the compiler usually optimizes the whole register/unregister call, but at lower optimization levels it can fail the build in linking stage. Fix that by adding dummy variables so that modules taking address of them still work, but the whole thing is a noop. [0]: https://lore.kernel.org/bpf/20211110205418.332403-1-vinicius.gomes@intel.com Fixes: 14f267d95fe4 ("bpf: btf: Introduce helpers for dynamic BTF set registration") Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211122144742.477787-2-memxor@gmail.com
|
#
bd16dee6 |
|
21-Oct-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Add BTF_KIND_DECL_TAG typedef support The llvm patches ([1], [2]) added support to attach btf_decl_tag attributes to typedef declarations. This patch added support in kernel. [1] https://reviews.llvm.org/D110127 [2] https://reviews.llvm.org/D112259 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211021195628.4018847-1-yhs@fb.com
|
#
223f903e |
|
12-Oct-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Rename BTF_KIND_TAG to BTF_KIND_DECL_TAG Patch set [1] introduced BTF_KIND_TAG to allow tagging declarations for struct/union, struct/union field, var, func and func arguments and these tags will be encoded into dwarf. They are also encoded to btf by llvm for the bpf target. After BTF_KIND_TAG is introduced, we intended to use it for kernel __user attributes. But kernel __user is actually a type attribute. Upstream and internal discussion showed it is not a good idea to mix declaration attribute and type attribute. So we proposed to introduce btf_type_tag as a type attribute and existing btf_tag renamed to btf_decl_tag ([2]). This patch renamed BTF_KIND_TAG to BTF_KIND_DECL_TAG and some other declarations with *_tag to *_decl_tag to make it clear the tag is for declaration. In the future, BTF_KIND_TYPE_TAG might be introduced per [3]. [1] https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/ [2] https://reviews.llvm.org/D111588 [3] https://reviews.llvm.org/D111199 Fixes: b5ea834dde6b ("bpf: Support for new btf kind BTF_KIND_TAG") Fixes: 5b84bd10363e ("libbpf: Add support for BTF_KIND_TAG") Fixes: 5c07f2fec003 ("bpftool: Add support for BTF_KIND_TAG") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com
|
#
c48e51c8 |
|
01-Oct-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: selftests: Add selftests for module kfunc support This adds selftests that tests the success and failure path for modules kfuncs (in presence of invalid kfunc calls) for both libbpf and gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can add module BTF ID set from bpf_testmod. This also introduces a couple of test cases to verifier selftests for validating whether we get an error or not depending on if invalid kfunc call remains after elimination of unreachable instructions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-10-memxor@gmail.com
|
#
0e32dfc8 |
|
01-Oct-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: Enable TCP congestion control kfunc from modules This commit moves BTF ID lookup into the newly added registration helper, in a way that the bbr, cubic, and dctcp implementation set up their sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not dependent on modules are looked up from the wrapper function. This lifts the restriction for them to be compiled as built in objects, and can be loaded as modules if required. Also modify Makefile.modfinal to call resolve_btfids for each module. Note that since kernel kfunc_ids never overlap with module kfunc_ids, we only match the owner for module btf id sets. See following commits for background on use of: CONFIG_X86 ifdef: 569c484f9995 (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86) CONFIG_DYNAMIC_FTRACE ifdef: 7aae231ac93b (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE) Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-6-memxor@gmail.com
|
#
14f267d9 |
|
01-Oct-2021 |
Kumar Kartikeya Dwivedi <memxor@gmail.com> |
bpf: btf: Introduce helpers for dynamic BTF set registration This adds helpers for registering btf_id_set from modules and the bpf_check_mod_kfunc_call callback that can be used to look them up. With in kernel sets, the way this is supposed to work is, in kernel callback looks up within the in-kernel kfunc whitelist, and then defers to the dynamic BTF set lookup if it doesn't find the BTF id. If there is no in-kernel BTF id set, this callback can be used directly. Also fix includes for btf.h and bpfptr.h so that they can included in isolation. This is in preparation for their usage in tcp_bbr, tcp_cubic and tcp_dctcp modules in the next patch. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211002011757.311265-4-memxor@gmail.com
|
#
b5ea834d |
|
14-Sep-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Support for new btf kind BTF_KIND_TAG LLVM14 added support for a new C attribute ([1]) __attribute__((btf_tag("arbitrary_str"))) This attribute will be emitted to dwarf ([2]) and pahole will convert it to BTF. Or for bpf target, this attribute will be emitted to BTF directly ([3], [4]). The attribute is intended to provide additional information for - struct/union type or struct/union member - static/global variables - static/global function or function parameter. For linux kernel, the btf_tag can be applied in various places to specify user pointer, function pre- or post- condition, function allow/deny in certain context, etc. Such information will be encoded in vmlinux BTF and can be used by verifier. The btf_tag can also be applied to bpf programs to help global verifiable functions, e.g., specifying preconditions, etc. This patch added basic parsing and checking support in kernel for new BTF_KIND_TAG kind. [1] https://reviews.llvm.org/D106614 [2] https://reviews.llvm.org/D106621 [3] https://reviews.llvm.org/D106622 [4] https://reviews.llvm.org/D109560 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210914223015.245546-1-yhs@fb.com
|
#
eb529c5b |
|
25-Aug-2021 |
Daniel Xu <dxu@dxuuu.xyz> |
bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS This commit fixes linker errors along the lines of: s390-linux-ld: task_iter.c:(.init.text+0xa4): undefined reference to `btf_task_struct_ids'` Fix by defining btf_task_struct_ids unconditionally in kernel/bpf/btf.c since there exists code that unconditionally uses btf_task_struct_ids. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/05d94748d9f4b3eecedc4fddd6875418a396e23c.1629942444.git.dxu@dxuuu.xyz
|
#
d3621642 |
|
28-Jul-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Emit better log message if bpf_iter ctx arg btf_id == 0 To avoid kernel build failure due to some missing .BTF-ids referenced functions/types, the patch ([1]) tries to fill btf_id 0 for these types. In bpf verifier, for percpu variable and helper returning btf_id cases, verifier already emitted proper warning with something like verbose(env, "Helper has invalid btf_id in R%d\n", regno); verbose(env, "invalid return type %d of func %s#%d\n", fn->ret_type, func_id_name(func_id), func_id); But this is not the case for bpf_iter context arguments. I hacked resolve_btfids to encode btf_id 0 for struct task_struct. With `./test_progs -n 7/5`, I got, 0: (79) r2 = *(u64 *)(r1 +0) func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta' ; struct seq_file *seq = ctx->meta->seq; 1: (79) r6 = *(u64 *)(r2 +0) ; struct task_struct *task = ctx->task; 2: (79) r7 = *(u64 *)(r1 +8) ; if (task == (void *)0) { 3: (55) if r7 != 0x0 goto pc+11 ... ; BPF_SEQ_PRINTF(seq, "%8d %8d\n", task->tgid, task->pid); 26: (61) r1 = *(u32 *)(r7 +1372) Type '(anon)' is not a struct Basically, verifier will return btf_id 0 for task_struct. Later on, when the code tries to access task->tgid, the verifier correctly complains the type is '(anon)' and it is not a struct. Users still need to backtrace to find out what is going on. Let us catch the invalid btf_id 0 earlier and provide better message indicating btf_id is wrong. The new error message looks like below: R1 type=ctx expected=fp ; struct seq_file *seq = ctx->meta->seq; 0: (79) r2 = *(u64 *)(r1 +0) func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta' ; struct seq_file *seq = ctx->meta->seq; 1: (79) r6 = *(u64 *)(r2 +0) ; struct task_struct *task = ctx->task; 2: (79) r7 = *(u64 *)(r1 +8) invalid btf_id for context argument offset 8 invalid bpf_context access off=8 size=8 [1] https://lore.kernel.org/bpf/20210727132532.2473636-1-hengqi.chen@gmail.com/ Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210728183025.1461750-1-yhs@fb.com
|
#
68134668 |
|
14-Jul-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Add map side support for bpf timers. Restrict bpf timers to array, hash (both preallocated and kmalloced), and lru map types. The per-cpu maps with timers don't make sense, since 'struct bpf_timer' is a part of map value. bpf timers in per-cpu maps would mean that the number of timers depends on number of possible cpus and timers would not be accessible from all cpus. lpm map support can be added in the future. The timers in inner maps are supported. The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free bpf_timer in a given map element. Similar to 'struct bpf_spin_lock' BTF is required and it is used to validate that map element indeed contains 'struct bpf_timer'. Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when map element data is reused in preallocated htab and lru maps. Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single map element. There could be one of each, but not more than one. Due to 'one bpf_timer in one element' restriction do not support timers in global data, since global data is a map of single element, but from bpf program side it's seen as many global variables and restriction of single global timer would be odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on maps with timers, since user space could have corrupted mmap element and crashed the kernel. The maps with timers cannot be readonly. Due to these restrictions search for bpf_timer in datasec BTF in case it was placed in the global data to report clear error. The previous patch allowed 'struct bpf_timer' as a first field in a map element only. Relax this restriction. Refactor lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free the timer when lru map deletes an element as a part of it eviction algorithm. Make sure that bpf program cannot access 'struct bpf_timer' via direct load/store. The timer operation are done through helpers only. This is similar to 'struct bpf_spin_lock'. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210715005417.78572-5-alexei.starovoitov@gmail.com
|
#
8fb33b60 |
|
24-May-2021 |
Zhen Lei <thunder.leizhen@huawei.com> |
bpf: Fix spelling mistakes Fix some spelling mistakes in comments: aother ==> another Netiher ==> Neither desribe ==> describe intializing ==> initializing funciton ==> function wont ==> won't and move the word 'the' at the end to the next line accross ==> across pathes ==> paths triggerred ==> triggered excute ==> execute ether ==> either conervative ==> conservative convetion ==> convention markes ==> marks interpeter ==> interpreter Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210525025659.8898-2-thunder.leizhen@huawei.com
|
#
3d78417b |
|
13-May-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Add bpf_btf_find_by_name_kind() helper. Add new helper: long bpf_btf_find_by_name_kind(char *name, int name_sz, u32 kind, int flags) Description Find BTF type with given name and kind in vmlinux BTF or in module's BTFs. Return Returns btf_id and btf_obj_fd in lower and upper 32 bits. It will be used by loader program to find btf_id to attach the program to and to find btf_ids of ksyms. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-10-alexei.starovoitov@gmail.com
|
#
c571bd75 |
|
13-May-2021 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Make btf_load command to be bpfptr_t compatible. Similar to prog_load make btf_load command to be availble to bpf_prog_type_syscall program. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210514003623.28033-7-alexei.starovoitov@gmail.com
|
#
31379397 |
|
05-May-2021 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Forbid trampoline attach for functions with variable arguments We can't currently allow to attach functions with variable arguments. The problem is that we should save all the registers for arguments, which is probably doable, but if caller uses more than 6 arguments, we need stack data, which will be wrong, because of the extra stack frame we do in bpf trampoline, so we could crash. Also currently there's malformed trampoline code generated for such functions at the moment as described in: https://lore.kernel.org/bpf/20210429212834.82621-1-jolsa@kernel.org/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210505132529.401047-1-jolsa@kernel.org
|
#
235fc0e3 |
|
26-Mar-2021 |
Colin Ian King <colin.king@canonical.com> |
bpf: Remove redundant assignment of variable id The variable id is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210326194348.623782-1-colin.king@canonical.com
|
#
e6ac2450 |
|
24-Mar-2021 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Support bpf program calling kernel function This patch adds support to BPF verifier to allow bpf program calling kernel function directly. The use case included in this set is to allow bpf-tcp-cc to directly call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()"). Those functions have already been used by some kernel tcp-cc implementations. This set will also allow the bpf-tcp-cc program to directly call the kernel tcp-cc implementation, For example, a bpf_dctcp may only want to implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly from the kernel tcp_dctcp.c instead of reimplementing (or copy-and-pasting) them. The tcp-cc kernel functions mentioned above will be white listed for the struct_ops bpf-tcp-cc programs to use in a later patch. The white listed functions are not bounded to a fixed ABI contract. Those functions have already been used by the existing kernel tcp-cc. If any of them has changed, both in-tree and out-of-tree kernel tcp-cc implementations have to be changed. The same goes for the struct_ops bpf-tcp-cc programs which have to be adjusted accordingly. This patch is to make the required changes in the bpf verifier. First change is in btf.c, it adds a case in "btf_check_func_arg_match()". When the passed in "btf->kernel_btf == true", it means matching the verifier regs' states with a kernel function. This will handle the PTR_TO_BTF_ID reg. It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET, and PTR_TO_TCP_SOCK to its kernel's btf_id. In the later libbpf patch, the insn calling a kernel function will look like: insn->code == (BPF_JMP | BPF_CALL) insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */ insn->imm == func_btf_id /* btf_id of the running kernel */ [ For the future calling function-in-kernel-module support, an array of module btf_fds can be passed at the load time and insn->off can be used to index into this array. ] At the early stage of verifier, the verifier will collect all kernel function calls into "struct bpf_kfunc_desc". Those descriptors are stored in "prog->aux->kfunc_tab" and will be available to the JIT. Since this "add" operation is similar to the current "add_subprog()" and looking for the same insn->code, they are done together in the new "add_subprog_and_kfunc()". In the "do_check()" stage, the new "check_kfunc_call()" is added to verify the kernel function call instruction: 1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE. A new bpf_verifier_ops "check_kfunc_call" is added to do that. The bpf-tcp-cc struct_ops program will implement this function in a later patch. 2. Call "btf_check_kfunc_args_match()" to ensure the regs can be used as the args of a kernel function. 3. Mark the regs' type, subreg_def, and zext_dst. At the later do_misc_fixups() stage, the new fixup_kfunc_call() will replace the insn->imm with the function address (relative to __bpf_call_base). If needed, the jit can find the btf_func_model by calling the new bpf_jit_find_kfunc_model(prog, insn). With the imm set to the function address, "bpftool prog dump xlated" will be able to display the kernel function calls the same way as it displays other bpf helper calls. gpl_compatible program is required to call kernel function. This feature currently requires JIT. The verifier selftests are adjusted because of the changes in the verbose log in add_subprog_and_kfunc(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
|
#
34747c41 |
|
24-Mar-2021 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Refactor btf_check_func_arg_match This patch moved the subprog specific logic from btf_check_func_arg_match() to the new btf_check_subprog_arg_match(). The core logic is left in btf_check_func_arg_match() which will be reused later to check the kernel function call. The "if (!btf_type_is_ptr(t))" is checked first to improve the indentation which will be useful for a later patch. Some of the "btf_kind_str[]" usages is replaced with the shortcut "btf_type_str(t)". Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210325015136.1544504-1-kafai@fb.com
|
#
b1828f0b |
|
26-Feb-2021 |
Ilya Leoshkevich <iii@linux.ibm.com> |
bpf: Add BTF_KIND_FLOAT support On the kernel side, introduce a new btf_kind_operations. It is similar to that of BTF_KIND_INT, however, it does not need to handle encodings and bit offsets. Do not implement printing, since the kernel does not know how to format floating-point values. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210226202256.116518-7-iii@linux.ibm.com
|
#
523a4cf4 |
|
25-Feb-2021 |
Dmitrii Banshchikov <me@ubique.spb.ru> |
bpf: Use MAX_BPF_FUNC_REG_ARGS macro Instead of using integer literal here and there use macro name for better context. Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210225202629.585485-1-me@ubique.spb.ru
|
#
f4eda8b6 |
|
23-Feb-2021 |
Dmitrii Banshchikov <me@ubique.spb.ru> |
bpf: Drop imprecise log message Now it is possible for global function to have a pointer argument that points to something different than struct. Drop the irrelevant log message and keep the logic same. Fixes: e5069b9c23b3 ("bpf: Support pointers in global func args") Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210223090416.333943-1-me@ubique.spb.ru
|
#
e5069b9c |
|
12-Feb-2021 |
Dmitrii Banshchikov <me@ubique.spb.ru> |
bpf: Support pointers in global func args Add an ability to pass a pointer to a type with known size in arguments of a global function. Such pointers may be used to overcome the limit on the maximum number of arguments, avoid expensive and tricky workarounds and to have multiple output arguments. A referenced type may contain pointers but indirect access through them isn't supported. The implementation consists of two parts. If a global function has an argument that is a pointer to a type with known size then: 1) In btf_check_func_arg_match(): check that the corresponding register points to NULL or to a valid memory region that is large enough to contain the expected argument's type. 2) In btf_prepare_func_args(): set the corresponding register type to PTR_TO_MEM_OR_NULL and its size to the size of the expected type. Only global functions are supported because allowance of pointers for static functions might break validation. Consider the following scenario. A static function has a pointer argument. A caller passes pointer to its stack memory. Because the callee can change referenced memory verifier cannot longer assume any particular slot type of the caller's stack memory hence the slot type is changed to SLOT_MISC. If there is an operation that relies on slot type other than SLOT_MISC then verifier won't be able to infer safety of the operation. When verifier sees a static function that has a pointer argument different from PTR_TO_CTX then it skips arguments check and continues with "inline" validation with more information available. The operation that relies on the particular slot type now succeeds. Because global functions were not allowed to have pointer arguments different from PTR_TO_CTX it's not possible to break existing and valid code. Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212205642.620788-4-me@ubique.spb.ru
|
#
feb4adfa |
|
12-Feb-2021 |
Dmitrii Banshchikov <me@ubique.spb.ru> |
bpf: Rename bpf_reg_state variables Using "reg" for an array of bpf_reg_state and "reg[i + 1]" for an individual bpf_reg_state is error-prone and verbose. Use "regs" for the former and "reg" for the latter as other code nearby does. Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210212205642.620788-2-me@ubique.spb.ru
|
#
13ca51d5 |
|
19-Jan-2021 |
Yonghong Song <yhs@fb.com> |
bpf: Permit size-0 datasec llvm patch https://reviews.llvm.org/D84002 permitted to emit empty rodata datasec if the elf .rodata section contains read-only data from local variables. These local variables will be not emitted as BTF_KIND_VARs since llvm converted these local variables as static variables with private linkage without debuginfo types. Such an empty rodata datasec will make skeleton code generation easy since for skeleton a rodata struct will be generated if there is a .rodata elf section. The existence of a rodata btf datasec is also consistent with the existence of a rodata map created by libbpf. The btf with such an empty rodata datasec will fail in the kernel though as kernel will reject a datasec with zero vlen and zero size. For example, for the below code, int sys_enter(void *ctx) { int fmt[6] = {1, 2, 3, 4, 5, 6}; int dst[6]; bpf_probe_read(dst, sizeof(dst), fmt); return 0; } We got the below btf (bpftool btf dump ./test.o): [1] PTR '(anon)' type_id=0 [2] FUNC_PROTO '(anon)' ret_type_id=3 vlen=1 'ctx' type_id=1 [3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED [4] FUNC 'sys_enter' type_id=2 linkage=global [5] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED [6] ARRAY '(anon)' type_id=5 index_type_id=7 nr_elems=4 [7] INT '__ARRAY_SIZE_TYPE__' size=4 bits_offset=0 nr_bits=32 encoding=(none) [8] VAR '_license' type_id=6, linkage=global-alloc [9] DATASEC '.rodata' size=0 vlen=0 [10] DATASEC 'license' size=0 vlen=1 type_id=8 offset=0 size=4 When loading the ./test.o to the kernel with bpftool, we see the following error: libbpf: Error loading BTF: Invalid argument(22) libbpf: magic: 0xeb9f ... [6] ARRAY (anon) type_id=5 index_type_id=7 nr_elems=4 [7] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none) [8] VAR _license type_id=6 linkage=1 [9] DATASEC .rodata size=24 vlen=0 vlen == 0 libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring. Basically, libbpf changed .rodata datasec size to 24 since elf .rodata section size is 24. The kernel then rejected the BTF since vlen = 0. Note that the above kernel verifier failure can be worked around with changing local variable "fmt" to a static or global, optionally const, variable. This patch permits a datasec with vlen = 0 in kernel. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210119153519.3901963-1-yhs@fb.com
|
#
541c3bad |
|
12-Jan-2021 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Support BPF ksym variables in kernel modules Add support for directly accessing kernel module variables from BPF programs using special ldimm64 instructions. This functionality builds upon vmlinux ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow specifying kernel module BTF's FD in insn[1].imm field. During BPF program load time, verifier will resolve FD to BTF object and will take reference on BTF object itself and, for module BTFs, corresponding module as well, to make sure it won't be unloaded from under running BPF program. The mechanism used is similar to how bpf_prog keeps track of used bpf_maps. One interesting change is also in how per-CPU variable is determined. The logic is to find .data..percpu data section in provided BTF, but both vmlinux and module each have their own .data..percpu entries in BTF. So for module's case, the search for DATASEC record needs to look at only module's added BTF types. This is implemented with custom search function. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-6-andrii@kernel.org
|
#
bcc5e616 |
|
10-Jan-2021 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Allow empty module BTFs Some modules don't declare any new types and end up with an empty BTF, containing only valid BTF header and no types or strings sections. This currently causes BTF validation error. There is nothing wrong with such BTF, so fix the issue by allowing module BTFs with no types or strings. Fixes: 36e68442d1af ("bpf: Load and verify kernel module BTFs") Reported-by: Christopher William Snowhill <chris@kode54.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210110070341.1380086-1-andrii@kernel.org
|
#
290248a5 |
|
03-Dec-2020 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Allow to specify kernel module BTFs when attaching BPF programs Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-11-andrii@kernel.org
|
#
22dc4a0f |
|
03-Dec-2020 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows to gradually add support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames btf_id() function to btf_obj_id() to minimize naming clash with using btf_id to denote BTF *type* ID, rather than BTF *object*'s ID. Also, altough btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-10-andrii@kernel.org
|
#
7112d127 |
|
10-Nov-2020 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Compile out btf_parse_module() if module BTF is not enabled Make sure btf_parse_module() is compiled out if module BTFs are not enabled. Fixes: 36e68442d1af ("bpf: Load and verify kernel module BTFs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201111040645.903494-1-andrii@kernel.org
|
#
36e68442 |
|
09-Nov-2020 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Load and verify kernel module BTFs Add kernel module listener that will load/validate and unload module BTF. Module BTFs gets ID generated for them, which makes it possible to iterate them with existing BTF iteration API. They are given their respective module's names, which will get reported through GET_OBJ_INFO API. They are also marked as in-kernel BTFs for tooling to distinguish them from user-provided BTFs. Also, similarly to vmlinux BTF, kernel module BTFs are exposed through sysfs as /sys/kernel/btf/<module-name>. This is convenient for user-space tools to inspect module BTF contents and dump their types with existing tools: [vmuser@archvm bpf]$ ls -la /sys/kernel/btf total 0 drwxr-xr-x 2 root root 0 Nov 4 19:46 . drwxr-xr-x 13 root root 0 Nov 4 19:46 .. ... -r--r--r-- 1 root root 888 Nov 4 19:46 irqbypass -r--r--r-- 1 root root 100225 Nov 4 19:46 kvm -r--r--r-- 1 root root 35401 Nov 4 19:46 kvm_intel -r--r--r-- 1 root root 120 Nov 4 19:46 pcspkr -r--r--r-- 1 root root 399 Nov 4 19:46 serio_raw -r--r--r-- 1 root root 4094095 Nov 4 19:46 vmlinux Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org
|
#
53297220 |
|
09-Nov-2020 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO Allocate ID for vmlinux BTF. This makes it visible when iterating over all BTF objects in the system. To allow distinguishing vmlinux BTF (and later kernel module BTF) from user-provided BTFs, expose extra kernel_btf flag, as well as BTF name ("vmlinux" for vmlinux BTF, will equal to module's name for module BTF). We might want to later allow specifying BTF name for user-provided BTFs as well, if that makes sense. But currently this is reserved only for in-kernel BTFs. Having in-kernel BTFs exposed IDs will allow to extend BPF APIs that require in-kernel BTF type with ability to specify BTF types from kernel modules, not just vmlinux BTF. This will be implemented in a follow up patch set for fentry/fexit/fmod_ret/lsm/etc. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201110011932.3201430-3-andrii@kernel.org
|
#
951bb646 |
|
09-Nov-2020 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: Add in-kernel split BTF support Adjust in-kernel BTF implementation to support a split BTF mode of operation. Changes are mostly mirroring libbpf split BTF changes, with the exception of start_id being 0 for in-kernel implementation due to simpler read-only mode. Otherwise, for split BTF logic, most of the logic of jumping to base BTF, where necessary, is encapsulated in few helper functions. Type numbering and string offset in a split BTF are logically continuing where base BTF ends, so most of the high-level logic is kept without changes. Type verification and size resolution is only doing an added resolution of new split BTF types and relies on already cached size and type resolution results in the base BTF. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201110011932.3201430-2-andrii@kernel.org
|
#
666475cc |
|
07-Nov-2020 |
Wang Qing <wangqing@vivo.com> |
bpf, btf: Remove the duplicate btf_ids.h include Remove duplicate btf_ids.h header which is included twice. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1604736650-11197-1-git-send-email-wangqing@vivo.com
|
#
eaa6bcb7 |
|
29-Sep-2020 |
Hao Luo <haoluo@google.com> |
bpf: Introduce bpf_per_cpu_ptr() Add bpf_per_cpu_ptr() to help bpf programs access percpu vars. bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel except that it may return NULL. This happens when the cpu parameter is out of range. So the caller must check the returned value. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
|
#
4976b718 |
|
29-Sep-2020 |
Hao Luo <haoluo@google.com> |
bpf: Introduce pseudo_btf_id Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a ksym so that further dereferences on the ksym can use the BTF info to validate accesses. Internally, when seeing a pseudo_btf_id ld insn, the verifier reads the btf_id stored in the insn[0]'s imm field and marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND, which is encoded in btf_vminux by pahole. If the VAR is not of a struct type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID and the mem_size is resolved to the size of the VAR's type. >From the VAR btf_id, the verifier can also read the address of the ksym's corresponding kernel var from kallsyms and use that to fill dst_reg. Therefore, the proper functionality of pseudo_btf_id depends on (1) kallsyms and (2) the encoding of kernel global VARs in pahole, which should be available since pahole v1.18. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com
|
#
43bc2874 |
|
29-Sep-2020 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf: Fix context type resolving for extension programs Eelco reported we can't properly access arguments if the tracing program is attached to extension program. Having following program: SEC("classifier/test_pkt_md_access") int test_pkt_md_access(struct __sk_buff *skb) with its extension: SEC("freplace/test_pkt_md_access") int test_pkt_md_access_new(struct __sk_buff *skb) and tracing that extension with: SEC("fentry/test_pkt_md_access_new") int BPF_PROG(fentry, struct sk_buff *skb) It's not possible to access skb argument in the fentry program, with following error from verifier: ; int BPF_PROG(fentry, struct sk_buff *skb) 0: (79) r1 = *(u64 *)(r1 +0) invalid bpf_context access off=0 size=8 The problem is that btf_ctx_access gets the context type for the traced program, which is in this case the extension. But when we trace extension program, we want to get the context type of the program that the extension is attached to, so we can access the argument properly in the trace program. This version of the patch is tweaked slightly from Jiri's original one, since the refactoring in the previous patches means we have to get the target prog type from the new variable in prog->aux instead of directly from the target prog. Reported-by: Eelco Chaudron <echaudro@redhat.com> Suggested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355278.48470.17057040257274725638.stgit@toke.dk
|
#
3aac1ead |
|
29-Sep-2020 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach In preparation for allowing multiple attachments of freplace programs, move the references to the target program and trampoline into the bpf_tracing_link structure when that is created. To do this atomically, introduce a new mutex in prog->aux to protect writing to the two pointers to target prog and trampoline, and rename the members to make it clear that they are related. With this change, it is no longer possible to attach the same tracing program multiple times (detaching in-between), since the reference from the tracing program to the target disappears on the first attach. However, since the next patch will let the caller supply an attach target, that will also make it possible to attach to the same place multiple times. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk
|
#
eb411377 |
|
27-Sep-2020 |
Alan Maguire <alan.maguire@oracle.com> |
bpf: Add bpf_seq_printf_btf helper A helper is added to allow seq file writing of kernel data structures using vmlinux BTF. Its signature is long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); Flags and struct btf_ptr definitions/use are identical to the bpf_snprintf_btf helper, and the helper returns 0 on success or a negative error value. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
|
#
31d0bc81 |
|
27-Sep-2020 |
Alan Maguire <alan.maguire@oracle.com> |
bpf: Move to generic BTF show support, apply it to seq files/strings generalize the "seq_show" seq file support in btf.c to support a generic show callback of which we support two instances; the current seq file show, and a show with snprintf() behaviour which instead writes the type data to a supplied string. Both classes of show function call btf_type_show() with different targets; the seq file or the string to be written. In the string case we need to track additional data - length left in string to write and length to return that we would have written (a la snprintf). By default show will display type information, field members and their types and values etc, and the information is indented based upon structure depth. Zeroed fields are omitted. Show however supports flags which modify its behaviour: BTF_SHOW_COMPACT - suppress newline/indent. BTF_SHOW_NONAME - suppress show of type and member names. BTF_SHOW_PTR_RAW - do not obfuscate pointer values. BTF_SHOW_UNSAFE - do not copy data to safe buffer before display. BTF_SHOW_ZERO - show zeroed values (by default they are not shown). Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com
|
#
efc68158 |
|
25-Sep-2020 |
Toke Høiland-Jørgensen <toke@redhat.com> |
bpf: change logging calls from verbose() to bpf_log() and use log pointer In preparation for moving code around, change a bunch of references to env->log (and the verbose() logging helper) to use bpf_log() and a direct pointer to struct bpf_verifier_log. While we're touching the function signature, mark the 'prog' argument to bpf_check_type_match() as const. Also enhance the bpf_verifier_log_needed() check to handle NULL pointers for the log struct so we can re-use the code with logging disabled. Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
9436ef6e |
|
21-Sep-2020 |
Lorenz Bauer <lmb@cloudflare.com> |
bpf: Allow specifying a BTF ID per argument in function protos Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of IDs, one for each argument. This array is only accessed up to the highest numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id is a function pointer that is called by the verifier if present. It gets the actual BTF ID of the register, and the argument number we're currently checking. It turns out that the only user check_arg_btf_id ignores the argument, and is simply used to check whether the BTF ID has a struct sock_common at it's start. Replace both of these mechanisms with an explicit BTF ID for each argument in a function proto. Thanks to btf_struct_ids_match this is very flexible: check_arg_btf_id can be replaced by requiring struct sock_common. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200921121227.255763-5-lmb@cloudflare.com
|
#
2af30f11 |
|
21-Sep-2020 |
Lorenz Bauer <lmb@cloudflare.com> |
btf: Make btf_set_contains take a const pointer bsearch doesn't modify the contents of the array, so we can take a const pointer. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200921121227.255763-2-lmb@cloudflare.com
|
#
eae2e83e |
|
25-Aug-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Add BTF_SET_START/END macros Adding support to define sorted set of BTF ID values. Following defines sorted set of BTF ID values: BTF_SET_START(btf_allowlist_d_path) BTF_ID(func, vfs_truncate) BTF_ID(func, vfs_fallocate) BTF_ID(func, dentry_open) BTF_ID(func, vfs_getattr) BTF_ID(func, filp_close) BTF_SET_END(btf_allowlist_d_path) It defines following 'struct btf_id_set' variable to access values and count: struct btf_id_set btf_allowlist_d_path; Adding 'allowed' callback to struct bpf_func_proto, to allow verifier the check on allowed callers. Adding btf_id_set_contains function, which will be used by allowed callbacks to verify the caller's BTF ID value is within allowed set. Also removing extra '\' in __BTF_ID_LIST macro. Added BTF_SET_START_GLOBAL macro for global sets. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-10-jolsa@kernel.org
|
#
faaf4a79 |
|
25-Aug-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Add btf_struct_ids_match function Adding btf_struct_ids_match function to check if given address provided by BTF object + offset is also address of another nested BTF object. This allows to pass an argument to helper, which is defined via parent BTF object + offset, like for bpf_d_path (added in following changes): SEC("fentry/filp_close") int BPF_PROG(prog_close, struct file *file, void *id) { ... ret = bpf_d_path(&file->f_path, ... The first bpf_d_path argument is hold by verifier as BTF file object plus offset of f_path member. The btf_struct_ids_match function will walk the struct file object and check if there's nested struct path object on the given offset. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-9-jolsa@kernel.org
|
#
1c6d28a6 |
|
25-Aug-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Factor btf_struct_access function Adding btf_struct_walk function that walks through the struct type + given offset and returns following values: enum bpf_struct_walk_result { /* < 0 error */ WALK_SCALAR = 0, WALK_PTR, WALK_STRUCT, }; WALK_SCALAR - when SCALAR_VALUE is found WALK_PTR - when pointer value is found, its ID is stored in 'next_btf_id' output param WALK_STRUCT - when nested struct object is found, its ID is stored in 'next_btf_id' output param It will be used in following patches to get all nested struct objects for given type and offset. The btf_struct_access now calls btf_struct_walk function, as long as it gets nested structs as return value. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-8-jolsa@kernel.org
|
#
dafe58fc |
|
25-Aug-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Remove recursion call in btf_struct_access Andrii suggested we can simply jump to again label instead of making recursion call. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-7-jolsa@kernel.org
|
#
887c31a3 |
|
25-Aug-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Add type_id pointer as argument to __btf_resolve_size Adding type_id pointer as argument to __btf_resolve_size to return also BTF ID of the resolved type. It will be used in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-6-jolsa@kernel.org
|
#
69ff3047 |
|
25-Aug-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Add elem_id pointer as argument to __btf_resolve_size If the resolved type is array, make btf_resolve_size return also ID of the elem type. It will be needed in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-5-jolsa@kernel.org
|
#
6298399b |
|
25-Aug-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Move btf_resolve_size into __btf_resolve_size Moving btf_resolve_size into __btf_resolve_size and keeping btf_resolve_size public with just first 3 arguments, because the rest of the arguments are not used by outside callers. Following changes are adding more arguments, which are not useful to outside callers. They will be added to the __btf_resolve_size function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-4-jolsa@kernel.org
|
#
afbf21dc |
|
23-Jul-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Support readonly/readwrite buffers in verifier Readonly and readwrite buffer register states are introduced. Totally four states, PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL] are supported. As suggested by their respective names, PTR_TO_RDONLY_BUF[_OR_NULL] are for readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL] for read/write buffers. These new register states will be used by later bpf map element iterator. New register states share some similarity to PTR_TO_TP_BUFFER as it will calculate accessed buffer size during verification time. The accessed buffer size will be later compared to other metrics during later attach/link_create time. Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at prog->aux->bpf_ctx_arg_aux, and bpf verifier will retrieve the values during btf_ctx_access(). Later bpf map element iterator implementation will show how such information will be assigned during target registeration time. The verifier is also enhanced such that PTR_TO_RDONLY_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or ARG_PTR_TO_UNINIT_MEM. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
|
#
951cf368 |
|
20-Jul-2020 |
Yonghong Song <yhs@fb.com> |
bpf: net: Use precomputed btf_id for bpf iterators One additional field btf_id is added to struct bpf_ctx_arg_aux to store the precomputed btf_ids. The btf_id is computed at build time with BTF_ID_LIST or BTF_ID_LIST_GLOBAL macro definitions. All existing bpf iterators are changed to used pre-compute btf_ids. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163403.1393551-1-yhs@fb.com
|
#
bc4f0548 |
|
20-Jul-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Compute bpf_skc_to_*() helper socket btf ids at build time Currently, socket types (struct tcp_sock, udp_sock, etc.) used by bpf_skc_to_*() helpers are computed when vmlinux_btf is first built in the kernel. Commit 5a2798ab32ba ("bpf: Add BTF_ID_LIST/BTF_ID/BTF_ID_UNUSED macros") implemented a mechanism to compute btf_ids at kernel build time which can simplify kernel implementation and reduce runtime overhead by removing in-kernel btf_id calculation. This patch did exactly this, removing in-kernel btf_id computation and utilizing build-time btf_id computation. If CONFIG_DEBUG_INFO_BTF is not defined, BTF_ID_LIST will define an array with size of 5, which is not enough for btf_sock_ids. So define its own static array if CONFIG_DEBUG_INFO_BTF is not defined. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200720163358.1393023-1-yhs@fb.com
|
#
5b801dfb |
|
14-Jul-2020 |
Peilin Ye <yepeilin.cs@gmail.com> |
bpf: Fix NULL pointer dereference in __btf_resolve_helper_id() Prevent __btf_resolve_helper_id() from dereferencing `btf_vmlinux` as NULL. This patch fixes the following syzbot bug: https://syzkaller.appspot.com/bug?id=f823224ada908fa5c207902a5a62065e53ca0fcc Reported-by: syzbot+ee09bda7017345f1fbe6@syzkaller.appspotmail.com Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200714180904.277512-1-yepeilin.cs@gmail.com
|
#
49f4e672 |
|
11-Jul-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Use BTF_ID to resolve bpf_ctx_convert struct This way the ID is resolved during compile time, and we can remove the runtime name search. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200711215329.41165-7-jolsa@kernel.org
|
#
138b9a05 |
|
11-Jul-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Remove btf_id helpers resolving Now when we moved the helpers btf_id arrays into .BTF_ids section, we can remove the code that resolve those IDs in runtime. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200711215329.41165-6-jolsa@kernel.org
|
#
a9b59159 |
|
24-Jun-2020 |
John Fastabend <john.fastabend@gmail.com> |
bpf: Do not allow btf_ctx_access with __int128 types To ensure btf_ctx_access() is safe the verifier checks that the BTF arg type is an int, enum, or pointer. When the function does the BTF arg lookup it uses the calculation 'arg = off / 8' using the fact that registers are 8B. This requires that the first arg is in the first reg, the second in the second, and so on. However, for __int128 the arg will consume two registers by default LLVM implementation. So this will cause the arg layout assumed by the 'arg = off / 8' calculation to be incorrect. Because __int128 is uncommon this patch applies the easiest fix and will force int types to be sizeof(u64) or smaller so that they will fit in a single register. v2: remove unneeded parens per Andrii's feedback Fixes: 9e15db66136a1 ("bpf: Implement accurate raw_tp context access via BTF") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/159303723962.11287.13309537171132420717.stgit@john-Precision-5820-Tower
|
#
af7ec138 |
|
23-Jun-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Add bpf_skc_to_tcp6_sock() helper The helper is used in tracing programs to cast a socket pointer to a tcp6_sock pointer. The return value could be NULL if the casting is illegal. A new helper return type RET_PTR_TO_BTF_ID_OR_NULL is added so the verifier is able to deduce proper return types for the helper. Different from the previous BTF_ID based helpers, the bpf_skc_to_tcp6_sock() argument can be several possible btf_ids. More specifically, all possible socket data structures with sock_common appearing in the first in the memory layout. This patch only added socket types related to tcp and udp. All possible argument btf_id and return value btf_id for helper bpf_skc_to_tcp6_sock() are pre-calculcated and cached. In the future, it is even possible to precompute these btf_id's at kernel build time. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230809.3988195-1-yhs@fb.com
|
#
41c48f3a |
|
19-Jun-2020 |
Andrey Ignatov <rdna@fb.com> |
bpf: Support access to bpf map fields There are multiple use-cases when it's convenient to have access to bpf map fields, both `struct bpf_map` and map type specific struct-s such as `struct bpf_array`, `struct bpf_htab`, etc. For example while working with sock arrays it can be necessary to calculate the key based on map->max_entries (some_hash % max_entries). Currently this is solved by communicating max_entries via "out-of-band" channel, e.g. via additional map with known key to get info about target map. That works, but is not very convenient and error-prone while working with many maps. In other cases necessary data is dynamic (i.e. unknown at loading time) and it's impossible to get it at all. For example while working with a hash table it can be convenient to know how much capacity is already used (bpf_htab.count.counter for BPF_F_NO_PREALLOC case). At the same time kernel knows this info and can provide it to bpf program. Fill this gap by adding support to access bpf map fields from bpf program for both `struct bpf_map` and map type specific fields. Support is implemented via btf_struct_access() so that a user can define their own `struct bpf_map` or map type specific struct in their program with only necessary fields and preserve_access_index attribute, cast a map to this struct and use a field. For example: struct bpf_map { __u32 max_entries; } __attribute__((preserve_access_index)); struct bpf_array { struct bpf_map map; __u32 elem_size; } __attribute__((preserve_access_index)); struct { __uint(type, BPF_MAP_TYPE_ARRAY); __uint(max_entries, 4); __type(key, __u32); __type(value, __u32); } m_array SEC(".maps"); SEC("cgroup_skb/egress") int cg_skb(void *ctx) { struct bpf_array *array = (struct bpf_array *)&m_array; struct bpf_map *map = (struct bpf_map *)&m_array; /* .. use map->max_entries or array->map.max_entries .. */ } Similarly to other btf_struct_access() use-cases (e.g. struct tcp_sock in net/ipv4/bpf_tcp_ca.c) the patch allows access to any fields of corresponding struct. Only reading from map fields is supported. For btf_struct_access() to work there should be a way to know btf id of a struct that corresponds to a map type. To get btf id there should be a way to get a stringified name of map-specific struct, such as "bpf_array", "bpf_htab", etc for a map type. Two new fields are added to `struct bpf_map_ops` to handle it: * .map_btf_name keeps a btf name of a struct returned by map_alloc(); * .map_btf_id is used to cache btf id of that struct. To make btf ids calculation cheaper they're calculated once while preparing btf_vmlinux and cached same way as it's done for btf_id field of `struct bpf_func_proto` While calculating btf ids, struct names are NOT checked for collision. Collisions will be checked as a part of the work to prepare btf ids used in verifier in compile time that should land soon. The only known collision for `struct bpf_htab` (kernel/bpf/hashtab.c vs net/core/sock_map.c) was fixed earlier. Both new fields .map_btf_name and .map_btf_id must be set for a map type for the feature to work. If neither is set for a map type, verifier will return ENOTSUPP on a try to access map_ptr of corresponding type. If just one of them set, it's verifier misconfiguration. Only `struct bpf_array` for BPF_MAP_TYPE_ARRAY and `struct bpf_htab` for BPF_MAP_TYPE_HASH are supported by this patch. Other map types will be supported separately. The feature is available only for CONFIG_DEBUG_INFO_BTF=y and gated by perfmon_capable() so that unpriv programs won't have access to bpf map fields. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/6479686a0cd1e9067993df57b4c3eef0e276fec9.1592600985.git.rdna@fb.com
|
#
a2d0d62f |
|
19-Jun-2020 |
Andrey Ignatov <rdna@fb.com> |
bpf: Switch btf_parse_vmlinux to btf_find_by_name_kind btf_parse_vmlinux() implements manual search for struct bpf_ctx_convert since at the time of implementing btf_find_by_name_kind() was not available. Later btf_find_by_name_kind() was introduced in 27ae7997a661 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS"). It provides similar search functionality and can be leveraged in btf_parse_vmlinux(). Do it. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/6e12d5c3e8a3d552925913ef73a695dd1bb27800.1592600985.git.rdna@fb.com
|
#
3c32cc1b |
|
13-May-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Enable bpf_iter targets registering ctx argument types Commit b121b341e598 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support") adds a field btf_id_or_null_non0_off to bpf_prog->aux structure to indicate that the first ctx argument is PTR_TO_BTF_ID reg_type and all others are PTR_TO_BTF_ID_OR_NULL. This approach does not really scale if we have other different reg types in the future, e.g., a pointer to a buffer. This patch enables bpf_iter targets registering ctx argument reg types which may be different from the default one. For example, for pointers to structures, the default reg_type is PTR_TO_BTF_ID for tracing program. The target can register a particular pointer type as PTR_TO_BTF_ID_OR_NULL which can be used by the verifier to enforce accesses. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180221.2949882-1-yhs@fb.com
|
#
9c5f8a10 |
|
09-May-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Support variable length array in tracing programs In /proc/net/ipv6_route, we have struct fib6_info { struct fib6_table *fib6_table; ... struct fib6_nh fib6_nh[0]; } struct fib6_nh { struct fib_nh_common nh_common; struct rt6_info **rt6i_pcpu; struct rt6_exception_bucket *rt6i_exception_bucket; }; struct fib_nh_common { ... u8 nhc_gw_family; ... } The access: struct fib6_nh *fib6_nh = &rt->fib6_nh; ... fib6_nh->nh_common.nhc_gw_family ... This patch ensures such an access is handled properly. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175916.2476853-1-yhs@fb.com
|
#
b121b341 |
|
09-May-2020 |
Yonghong Song <yhs@fb.com> |
bpf: Add PTR_TO_BTF_ID_OR_NULL support Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support. For tracing/iter program, the bpf program context definition, e.g., for previous bpf_map target, looks like struct bpf_iter__bpf_map { struct bpf_iter_meta *meta; struct bpf_map *map; }; The kernel guarantees that meta is not NULL, but map pointer maybe NULL. The NULL map indicates that all objects have been traversed, so bpf program can take proper action, e.g., do final aggregation and/or send final report to user space. Add btf_id_or_null_non0_off to prog->aux structure, to indicate that if the context access offset is not 0, set to PTR_TO_BTF_ID_OR_NULL instead of PTR_TO_BTF_ID. This bit is set for tracing/iter program. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175912.2476576-1-yhs@fb.com
|
#
f2e10bff |
|
28-Apr-2020 |
Andrii Nakryiko <andriin@fb.com> |
bpf: Add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command. Also enhance show_fdinfo to potentially include bpf_link type-specific information (similarly to obj_info). Also introduce enum bpf_link_type stored in bpf_link itself and expose it in UAPI. bpf_link_tracing also now will store and return bpf_attach_type. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200429001614.1544-5-andriin@fb.com
|
#
f50b49a0 |
|
30-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf: btf: Fix arg verification in btf_ctx_access() The bounds checking for the arguments accessed in the BPF program breaks when the expected_attach_type is not BPF_TRACE_FEXIT, BPF_LSM_MAC or BPF_MODIFY_RETURN resulting in no check being done for the default case (the programs which do not receive the return value of the attached function in its arguments) when the index of the argument being accessed is equal to the number of arguments (nr_args). This was a result of a misplaced "else if" block introduced by the Commit 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Fixes: 6ba43b761c41 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330144246.338-1-kpsingh@chromium.org
|
#
9e4e01df |
|
28-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf: lsm: Implement attach, detach and execution JITed BPF programs are dynamically attached to the LSM hooks using BPF trampolines. The trampoline prologue generates code to handle conversion of the signature of the hook to the appropriate BPF context. The allocated trampoline programs are attached to the nop functions initialized as LSM hooks. BPF_PROG_TYPE_LSM programs must have a GPL compatible license and and need CAP_SYS_ADMIN (required for loading eBPF programs). Upon attachment: * A BPF fexit trampoline is used for LSM hooks with a void return type. * A BPF fmod_ret trampoline is used for LSM hooks which return an int. The attached programs can override the return value of the bpf LSM hook to indicate a MAC Policy decision. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Florent Revest <revest@google.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/bpf/20200329004356.27286-5-kpsingh@chromium.org
|
#
5c6f2588 |
|
20-Mar-2020 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
bpf: Explicitly memset some bpf info structures declared on the stack Trying to initialize a structure with "= {};" will not always clean out all padding locations in a structure. So be explicit and call memset to initialize everything for a number of bpf information structures that are then copied from userspace, sometimes from smaller memory locations than the size of the structure. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200320162258.GA794295@kroah.com
|
#
90ceddcb |
|
18-Mar-2020 |
Fangrui Song <maskray@google.com> |
bpf: Support llvm-objcopy for vmlinux BTF Simplify gen_btf logic to make it work with llvm-objcopy. The existing 'file format' and 'architecture' parsing logic is brittle and does not work with llvm-objcopy/llvm-objdump. 'file format' output of llvm-objdump>=11 will match GNU objdump, but 'architecture' (bfdarch) may not. .BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag because it is part of vmlinux image used for introspection. C code can reference the section via linker script defined __start_BTF and __stop_BTF. This fixes a small problem that previous .BTF had the SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data). Additionally, `objcopy -I binary` synthesized symbols _binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not used elsewhere) are replaced with more commonplace __start_BTF and __stop_BTF. Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns "empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?" We use a dd command to change the e_type field in the ELF header from ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o. Accepting ET_EXEC as an input file is an extremely rare GNU ld feature that lld does not intend to support, because this is error-prone. The output section description .BTF in include/asm-generic/vmlinux.lds.h avoids potential subtle orphan section placement issues and suppresses --orphan-handling=warn warnings. Fixes: df786c9b9476 ("bpf: Force .BTF section start to zero when dumping from vmlinux") Fixes: cb0cc635c7a9 ("powerpc: Include .BTF section") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Stanislav Fomichev <sdf@google.com> Tested-by: Andrii Nakryiko <andriin@fb.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://github.com/ClangBuiltLinux/linux/issues/871 Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com
|
#
da6c7fae |
|
10-Mar-2020 |
Yoshiki Komachi <komachi.yoshiki@gmail.com> |
bpf/btf: Fix BTF verification of enum members in struct/union btf_enum_check_member() was currently sure to recognize the size of "enum" type members in struct/union as the size of "int" even if its size was packed. This patch fixes BTF enum verification to use the correct size of member in BPF programs. Fixes: 179cde8cef7e ("bpf: btf: Check members of struct/union") Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1583825550-18606-2-git-send-email-komachi.yoshiki@gmail.com
|
#
6ba43b76 |
|
04-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf: Attachment verification for BPF_MODIFY_RETURN - Allow BPF_MODIFY_RETURN attachment only to functions that are: * Whitelisted for error injection by checking within_error_injection_list. Similar discussions happened for the bpf_override_return helper. * security hooks, this is expected to be cleaned up with the LSM changes after the KRSI patches introduce the LSM_HOOK macro: https://lore.kernel.org/bpf/20200220175250.10795-1-kpsingh@chromium.org/ - The attachment is currently limited to functions that return an int. This can be extended later other types (e.g. PTR). Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-5-kpsingh@chromium.org
|
#
ae240823 |
|
04-Mar-2020 |
KP Singh <kpsingh@google.com> |
bpf: Introduce BPF_MODIFY_RETURN When multiple programs are attached, each program receives the return value from the previous program on the stack and the last program provides the return value to the attached function. The fmod_ret bpf programs are run after the fentry programs and before the fexit programs. The original function is only called if all the fmod_ret programs return 0 to avoid any unintended side-effects. The success value, i.e. 0 is not currently configurable but can be made so where user-space can specify it at load time. For example: int func_to_be_attached(int a, int b) { <--- do_fentry do_fmod_ret: <update ret by calling fmod_ret> if (ret != 0) goto do_fexit; original_function: <side_effects_happen_here> } <--- do_fexit The fmod_ret program attached to this function can be defined as: SEC("fmod_ret/func_to_be_attached") int BPF_PROG(func_name, int a, int b, int ret) { // This will skip the original function logic. return 1; } The first fmod_ret program is passed 0 in its return argument. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200304191853.1529-4-kpsingh@chromium.org
|
#
2bf0eb9b |
|
09-Feb-2020 |
Hongbo Yao <yaohongbo@huawei.com> |
bpf: Make btf_check_func_type_match() static Fix the following sparse warning: kernel/bpf/btf.c:4131:5: warning: symbol 'btf_check_func_type_match' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Hongbo Yao <yaohongbo@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200210011441.147102-1-yaohongbo@huawei.com
|
#
257af63d |
|
31-Jan-2020 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Fix modifier skipping logic Fix the way modifiers are skipped while walking pointers. Otherwise second level dereferences of 'const struct foo *' will be rejected by the verifier. Fixes: 9e15db66136a ("bpf: Implement accurate raw_tp context access via BTF") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200201000314.261392-1-ast@kernel.org
|
#
d3e42bb0 |
|
27-Jan-2020 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Reuse log from btf_prase_vmlinux() in btf_struct_ops_init() Instead of using a locally defined "struct bpf_verifier_log log = {}", btf_struct_ops_init() should reuse the "log" from its calling function "btf_parse_vmlinux()". It should also resolve the frame-size too large compiler warning in some ARCH. Fixes: 27ae7997a661 ("bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200127175145.1154438-1-kafai@fb.com
|
#
84ad7a7a |
|
23-Jan-2020 |
Jiri Olsa <jolsa@kernel.org> |
bpf: Allow BTF ctx access for string pointers When accessing the context we allow access to arguments with scalar type and pointer to struct. But we deny access for pointer to scalar type, which is the case for many functions. Alexei suggested to take conservative approach and allow currently only string pointer access, which is the case for most functions now: Adding check if the pointer is to string type and allow access to it. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200123161508.915203-2-jolsa@kernel.org
|
#
be8704ff |
|
20-Jan-2020 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Introduce dynamic program extensions Introduce dynamic program extensions. The users can load additional BPF functions and replace global functions in previously loaded BPF programs while these programs are executing. Global functions are verified individually by the verifier based on their types only. Hence the global function in the new program which types match older function can safely replace that corresponding function. This new function/program is called 'an extension' of old program. At load time the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function to be replaced. The BPF program type is derived from the target program into extension program. Technically bpf_verifier_ops is copied from target program. The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops. The extension program can call the same bpf helper functions as target program. Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program types. The verifier allows only one level of replacement. Meaning that the extension program cannot recursively extend an extension. That also means that the maximum stack size is increasing from 512 to 1024 bytes and maximum function nesting level from 8 to 16. The programs don't always consume that much. The stack usage is determined by the number of on-stack variables used by the program. The verifier could have enforced 512 limit for combined original plus extension program, but it makes for difficult user experience. The main use case for extensions is to provide generic mechanism to plug external programs into policy program or function call chaining. BPF trampoline is used to track both fentry/fexit and program extensions because both are using the same nop slot at the beginning of every BPF function. Attaching fentry/fexit to a function that was replaced is not allowed. The opposite is true as well. Replacing a function that currently being analyzed with fentry/fexit is not allowed. The executable page allocated by BPF trampoline is not used by program extensions. This inefficiency will be optimized in future patches. Function by function verification of global function supports scalars and pointer to context only. Hence program extensions are supported for such class of global functions only. In the future the verifier will be extended with support to pointers to structures, arrays with sizes, etc. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200121005348.2769920-2-ast@kernel.org
|
#
51c39bb1 |
|
09-Jan-2020 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Introduce function-by-function verification New llvm and old llvm with libbpf help produce BTF that distinguish global and static functions. Unlike arguments of static function the arguments of global functions cannot be removed or optimized away by llvm. The compiler has to use exactly the arguments specified in a function prototype. The argument type information allows the verifier validate each global function independently. For now only supported argument types are pointer to context and scalars. In the future pointers to structures, sizes, pointer to packet data can be supported as well. Consider the following example: static int f1(int ...) { ... } int f3(int b); int f2(int a) { f1(a) + f3(a); } int f3(int b) { ... } int main(...) { f1(...) + f2(...) + f3(...); } The verifier will start its safety checks from the first global function f2(). It will recursively descend into f1() because it's static. Then it will check that arguments match for the f3() invocation inside f2(). It will not descend into f3(). It will finish f2() that has to be successfully verified for all possible values of 'a'. Then it will proceed with f3(). That function also has to be safe for all possible values of 'b'. Then it will start subprog 0 (which is main() function). It will recursively descend into f1() and will skip full check of f2() and f3(), since they are global. The order of processing global functions doesn't affect safety, since all global functions must be proven safe based on their arguments only. Such function by function verification can drastically improve speed of the verification and reduce complexity. Note that the stack limit of 512 still applies to the call chain regardless whether functions were static or global. The nested level of 8 also still applies. The same recursion prevention checks are in place as well. The type information and static/global kind is preserved after the verification hence in the above example global function f2() and f3() can be replaced later by equivalent functions with the same types that are loaded and verified later without affecting safety of this main() program. Such replacement (re-linking) of global functions is a subject of future patches. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
|
#
85d33df3 |
|
08-Jan-2020 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS The patch introduces BPF_MAP_TYPE_STRUCT_OPS. The map value is a kernel struct with its func ptr implemented in bpf prog. This new map is the interface to register/unregister/introspect a bpf implemented kernel struct. The kernel struct is actually embedded inside another new struct (or called the "value" struct in the code). For example, "struct tcp_congestion_ops" is embbeded in: struct bpf_struct_ops_tcp_congestion_ops { refcount_t refcnt; enum bpf_struct_ops_state state; struct tcp_congestion_ops data; /* <-- kernel subsystem struct here */ } The map value is "struct bpf_struct_ops_tcp_congestion_ops". The "bpftool map dump" will then be able to show the state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g. number of tcp_sock in the tcp_congestion_ops case). This "value" struct is created automatically by a macro. Having a separate "value" struct will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding "void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some initialization works before registering the struct_ops to the kernel subsystem). The libbpf will take care of finding and populating the "struct bpf_struct_ops_XYZ" from "struct XYZ". Register a struct_ops to a kernel subsystem: 1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s) 2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the running kernel. Instead of reusing the attr->btf_value_type_id, btf_vmlinux_value_type_id s added such that attr->btf_fd can still be used as the "user" btf which could store other useful sysadmin/debug info that may be introduced in the furture, e.g. creation-date/compiler-details/map-creator...etc. 3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described in the running kernel btf. Populate the value of this object. The function ptr should be populated with the prog fds. 4. Call BPF_MAP_UPDATE with the object created in (3) as the map value. The key is always "0". During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's args as an array of u64 is generated. BPF_MAP_UPDATE also allows the specific struct_ops to do some final checks in "st_ops->init_member()" (e.g. ensure all mandatory func ptrs are implemented). If everything looks good, it will register this kernel struct to the kernel subsystem. The map will not allow further update from this point. Unregister a struct_ops from the kernel subsystem: BPF_MAP_DELETE with key "0". Introspect a struct_ops: BPF_MAP_LOOKUP_ELEM with key "0". The map value returned will have the prog _id_ populated as the func ptr. The map value state (enum bpf_struct_ops_state) will transit from: INIT (map created) => INUSE (map updated, i.e. reg) => TOBEFREE (map value deleted, i.e. unreg) The kernel subsystem needs to call bpf_struct_ops_get() and bpf_struct_ops_put() to manage the "refcnt" in the "struct bpf_struct_ops_XYZ". This patch uses a separate refcnt for the purose of tracking the subsystem usage. Another approach is to reuse the map->refcnt and then "show" (i.e. during map_lookup) the subsystem's usage by doing map->refcnt - map->usercnt to filter out the map-fd/pinned-map usage. However, that will also tie down the future semantics of map->refcnt and map->usercnt. The very first subsystem's refcnt (during reg()) holds one count to map->refcnt. When the very last subsystem's refcnt is gone, it will also release the map->refcnt. All bpf_prog will be freed when the map->refcnt reaches 0 (i.e. during map_free()). Here is how the bpftool map command will look like: [root@arch-fb-vm1 bpf]# bpftool map show 6: struct_ops name dctcp flags 0x0 key 4B value 256B max_entries 1 memlock 4096B btf_id 6 [root@arch-fb-vm1 bpf]# bpftool map dump id 6 [{ "value": { "refcnt": { "refs": { "counter": 1 } }, "state": 1, "data": { "list": { "next": 0, "prev": 0 }, "key": 0, "flags": 2, "init": 24, "release": 0, "ssthresh": 25, "cong_avoid": 30, "set_state": 27, "cwnd_event": 28, "in_ack_event": 26, "undo_cwnd": 29, "pkts_acked": 0, "min_tso_segs": 0, "sndbuf_expand": 0, "cong_control": 0, "get_info": 0, "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0 ], "owner": 0 } } } ] Misc Notes: * bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup. It does an inplace update on "*value" instead returning a pointer to syscall.c. Otherwise, it needs a separate copy of "zero" value for the BPF_STRUCT_OPS_STATE_INIT to avoid races. * The bpf_struct_ops_map_delete_elem() is also called without preempt_disable() from map_delete_elem(). It is because the "->unreg()" may requires sleepable context, e.g. the "tcp_unregister_congestion_control()". * "const" is added to some of the existing "struct btf_func_model *" function arg to avoid a compiler warning caused by this patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
|
#
27ae7997 |
|
08-Jan-2020 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS This patch allows the kernel's struct ops (i.e. func ptr) to be implemented in BPF. The first use case in this series is the "struct tcp_congestion_ops" which will be introduced in a latter patch. This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS. The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular func ptr of a kernel struct. The attr->attach_btf_id is the btf id of a kernel struct. The attr->expected_attach_type is the member "index" of that kernel struct. The first member of a struct starts with member index 0. That will avoid ambiguity when a kernel struct has multiple func ptrs with the same func signature. For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written to implement the "init" func ptr of the "struct tcp_congestion_ops". The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops" of the _running_ kernel. The attr->expected_attach_type is 3. The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved by arch_prepare_bpf_trampoline that will be done in the next patch when introducing BPF_MAP_TYPE_STRUCT_OPS. "struct bpf_struct_ops" is introduced as a common interface for the kernel struct that supports BPF_PROG_TYPE_STRUCT_OPS prog. The supporting kernel struct will need to implement an instance of the "struct bpf_struct_ops". The supporting kernel struct also needs to implement a bpf_verifier_ops. During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right bpf_verifier_ops by searching the attr->attach_btf_id. A new "btf_struct_access" is also added to the bpf_verifier_ops such that the supporting kernel struct can optionally provide its own specific check on accessing the func arg (e.g. provide limited write access). After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called to initialize some values (e.g. the btf id of the supporting kernel struct) and it can only be done once the btf_vmlinux is available. The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog if the return type of the prog->aux->attach_func_proto is "void". Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com
|
#
976aba00 |
|
08-Jan-2020 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Support bitfield read access in btf_struct_access This patch allows bitfield access as a scalar. It checks "off + size > t->size" to avoid accessing bitfield end up accessing beyond the struct. This check is done outside of the loop since it is applicable to all access. It also takes this chance to break early on the "off < moff" case. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003501.3855427-1-kafai@fb.com
|
#
218b3f65 |
|
08-Jan-2020 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Add enum support to btf_ctx_access() It allows bpf prog (e.g. tracing) to attach to a kernel function that takes enum argument. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003459.3855366-1-kafai@fb.com
|
#
275517ff |
|
08-Jan-2020 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Avoid storing modifier to info->btf_id info->btf_id expects the btf_id of a struct, so it should store the final result after skipping modifiers (if any). It also takes this chanace to add a missing newline in one of the bpf_log() messages. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003456.3855176-1-kafai@fb.com
|
#
4c80c7bc |
|
10-Dec-2019 |
Arnd Bergmann <arnd@arndb.de> |
bpf: Fix build in minimal configurations, again Building with -Werror showed another failure: kernel/bpf/btf.c: In function 'btf_get_prog_ctx_type.isra.31': kernel/bpf/btf.c:3508:63: error: array subscript 0 is above array bounds of 'u8[0]' {aka 'unsigned char[0]'} [-Werror=array-bounds] ctx_type = btf_type_member(conv_struct) + bpf_ctx_convert_map[prog_type] * 2; I don't actually understand why the array is empty, but a similar fix has addressed a related problem, so I suppose we can do the same thing here. Fixes: ce27709b8162 ("bpf: Fix build in minimal configurations") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191210203553.2941035-1-arnd@arndb.de
|
#
ce27709b |
|
27-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Fix build in minimal configurations Some kconfigs can have BPF enabled without a single valid program type. In such configurations the build will fail with: ./kernel/bpf/btf.c:3466:1: error: empty enum is invalid Fix it by adding unused value to the enum. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/bpf/20191128043508.2346723-1-ast@kernel.org
|
#
d0f01043 |
|
26-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Fix static checker warning kernel/bpf/btf.c:4023 btf_distill_func_proto() error: potentially dereferencing uninitialized 't'. kernel/bpf/btf.c 4012 nargs = btf_type_vlen(func); 4013 if (nargs >= MAX_BPF_FUNC_ARGS) { 4014 bpf_log(log, 4015 "The function %s has %d arguments. Too many.\n", 4016 tname, nargs); 4017 return -EINVAL; 4018 } 4019 ret = __get_type_size(btf, func->type, &t); ^^ t isn't initialized for the first -EINVAL return This is unlikely path, since BTF should have been validated at this point. Fix it by returning 'void' BTF. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191126230106.237179-1-ast@kernel.org
|
#
5b92a28a |
|
14-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Support attaching tracing BPF program to other BPF programs Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type including their subprograms. This feature allows snooping on input and output packets in XDP, TC programs including their return values. In order to do that the verifier needs to track types not only of vmlinux, but types of other BPF programs as well. The verifier also needs to translate uapi/linux/bpf.h types used by networking programs into kernel internal BTF types used by FENTRY/FEXIT BPF programs. In some cases LLVM optimizations can remove arguments from BPF subprograms without adjusting BTF info that LLVM backend knows. When BTF info disagrees with actual types that the verifiers sees the BPF trampoline has to fallback to conservative and treat all arguments as u64. The FENTRY/FEXIT program can still attach to such subprograms, but it won't be able to recognize pointer types like 'struct sk_buff *' and it won't be able to pass them to bpf_skb_output() for dumping packets to user space. The FENTRY/FEXIT program would need to use bpf_probe_read_kernel() instead. The BPF_PROG_LOAD command is extended with attach_prog_fd field. When it's set to zero the attach_btf_id is one vmlinux BTF type ids. When attach_prog_fd points to previously loaded BPF program the attach_btf_id is BTF type id of main function or one of its subprograms. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-18-ast@kernel.org
|
#
8c1b6e69 |
|
14-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Compare BTF types of functions arguments with actual types Make the verifier check that BTF types of function arguments match actual types passed into top-level BPF program and into BPF-to-BPF calls. If types match such BPF programs and sub-programs will have full support of BPF trampoline. If types mismatch the trampoline has to be conservative. It has to save/restore five program arguments and assume 64-bit scalars. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-17-ast@kernel.org
|
#
91cc1a99 |
|
14-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Annotate context types Annotate BPF program context types with program-side type and kernel-side type. This type information is used by the verifier. btf_get_prog_ctx_type() is used in the later patches to verify that BTF type of ctx in BPF program matches to kernel expected ctx type. For example, the XDP program type is: BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp, struct xdp_md, struct xdp_buff) That means that XDP program should be written as: int xdp_prog(struct xdp_md *ctx) { ... } Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-16-ast@kernel.org
|
#
9cc31b3a |
|
14-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Fix race in btf_resolve_helper_id() btf_resolve_helper_id() caching logic is a bit racy, since under root the verifier can verify several programs in parallel. Fix it with READ/WRITE_ONCE. Fix the type as well, since error is also recorded. Fixes: a7658e1a4164 ("bpf: Check types of arguments passed into helpers") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-15-ast@kernel.org
|
#
fec56f58 |
|
14-Nov-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Introduce BPF trampoline Introduce BPF trampoline concept to allow kernel code to call into BPF programs with practically zero overhead. The trampoline generation logic is architecture dependent. It's converting native calling convention into BPF calling convention. BPF ISA is 64-bit (even on 32-bit architectures). The registers R1 to R5 are used to pass arguments into BPF functions. The main BPF program accepts only single argument "ctx" in R1. Whereas CPU native calling convention is different. x86-64 is passing first 6 arguments in registers and the rest on the stack. x86-32 is passing first 3 arguments in registers. sparc64 is passing first 6 in registers. And so on. The trampolines between BPF and kernel already exist. BPF_CALL_x macros in include/linux/filter.h statically compile trampolines from BPF into kernel helpers. They convert up to five u64 arguments into kernel C pointers and integers. On 64-bit architectures this BPF_to_kernel trampolines are nops. On 32-bit architecture they're meaningful. The opposite job kernel_to_BPF trampolines is done by CAST_TO_U64 macros and __bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They convert kernel function arguments into array of u64s that BPF program consumes via R1=ctx pointer. This patch set is doing the same job as __bpf_trace_##call() static trampolines, but dynamically for any kernel function. There are ~22k global kernel functions that are attachable via nop at function entry. The function arguments and types are described in BTF. The job of btf_distill_func_proto() function is to extract useful information from BTF into "function model" that architecture dependent trampoline generators will use to generate assembly code to cast kernel function arguments into array of u64s. For example the kernel function eth_type_trans has two pointers. They will be casted to u64 and stored into stack of generated trampoline. The pointer to that stack space will be passed into BPF program in R1. On x86-64 such generated trampoline will consume 16 bytes of stack and two stores of %rdi and %rsi into stack. The verifier will make sure that only two u64 are accessed read-only by BPF program. The verifier will also recognize the precise type of the pointers being accessed and will not allow typecasting of the pointer to a different type within BPF program. The tracing use case in the datacenter demonstrated that certain key kernel functions have (like tcp_retransmit_skb) have 2 or more kprobes that are always active. Other functions have both kprobe and kretprobe. So it is essential to keep both kernel code and BPF programs executing at maximum speed. Hence generated BPF trampoline is re-generated every time new program is attached or detached to maintain maximum performance. To avoid the high cost of retpoline the attached BPF programs are called directly. __bpf_prog_enter/exit() are used to support per-program execution stats. In the future this logic will be optimized further by adding support for bpf_stats_enabled_key inside generated assembly code. Introduction of preemptible and sleepable BPF programs will completely remove the need to call to __bpf_prog_enter/exit(). Detach of a BPF program from the trampoline should not fail. To avoid memory allocation in detach path the half of the page is used as a reserve and flipped after each attach/detach. 2k bytes is enough to call 40+ BPF programs directly which is enough for BPF tracing use cases. This limit can be increased in the future. BPF_TRACE_FENTRY programs have access to raw kernel function arguments while BPF_TRACE_FEXIT programs have access to kernel return value as well. Often kprobe BPF program remembers function arguments in a map while kretprobe fetches arguments from a map and analyzes them together with return value. BPF_TRACE_FEXIT accelerates this typical use case. Recursion prevention for kprobe BPF programs is done via per-cpu bpf_prog_active counter. In practice that turned out to be a mistake. It caused programs to randomly skip execution. The tracing tools missed results they were looking for. Hence BPF trampoline doesn't provide builtin recursion prevention. It's a job of BPF program itself and will be addressed in the follow up patches. BPF trampoline is intended to be used beyond tracing and fentry/fexit use cases in the future. For example to remove retpoline cost from XDP programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org
|
#
7e3617a7 |
|
07-Nov-2019 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Add array support to btf_struct_access This patch adds array support to btf_struct_access(). It supports array of int, array of struct and multidimensional array. It also allows using u8[] as a scratch space. For example, it allows access the "char cb[48]" with size larger than the array's element "char". Another potential use case is "u64 icsk_ca_priv[]" in the tcp congestion control. btf_resolve_size() is added to resolve the size of any type. It will follow the modifier if there is any. Please see the function comment for details. This patch also adds the "off < moff" check at the beginning of the for loop. It is to reject cases when "off" is pointing to a "hole" in a struct. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191107180903.4097702-1-kafai@fb.com
|
#
38207291 |
|
24-Oct-2019 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Prepare btf_ctx_access for non raw_tp use case This patch makes a few changes to btf_ctx_access() to prepare it for non raw_tp use case where the attach_btf_id is not necessary a BTF_KIND_TYPEDEF. It moves the "btf_trace_" prefix check and typedef-follow logic to a new function "check_attach_btf_id()" which is called only once during bpf_check(). btf_ctx_access() only operates on a BTF_KIND_FUNC_PROTO type now. That should also be more efficient since it is done only one instead of every-time check_ctx_access() is called. "check_attach_btf_id()" needs to find the func_proto type from the attach_btf_id. It needs to store the result into the newly added prog->aux->attach_func_proto. func_proto btf type has no name, so a proper name should be stored into "attach_func_name" also. v2: - Move the "btf_trace_" check to an earlier verifier phase (Alexei) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191025001811.1718491-1-kafai@fb.com
|
#
a7658e1a |
|
15-Oct-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Check types of arguments passed into helpers Introduce new helper that reuses existing skb perf_event output implementation, but can be called from raw_tracepoint programs that receive 'struct sk_buff *' as tracepoint argument or can walk other kernel data structures to skb pointer. In order to do that teach verifier to resolve true C types of bpf helpers into in-kernel BTF ids. The type of kernel pointer passed by raw tracepoint into bpf program will be tracked by the verifier all the way until it's passed into helper function. For example: kfree_skb() kernel function calls trace_kfree_skb(skb, loc); bpf programs receives that skb pointer and may eventually pass it into bpf_skb_output() bpf helper which in-kernel is implemented via bpf_skb_event_output() kernel function. Its first argument in the kernel is 'struct sk_buff *'. The verifier makes sure that types match all the way. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-11-ast@kernel.org
|
#
9e15db66 |
|
15-Oct-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Implement accurate raw_tp context access via BTF libbpf analyzes bpf C program, searches in-kernel BTF for given type name and stores it into expected_attach_type. The kernel verifier expects this btf_id to point to something like: typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc); which represents signature of raw_tracepoint "kfree_skb". Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb' and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint. In first case it passes btf_id of 'struct sk_buff *' back to the verifier core and 'void *' in second case. Then the verifier tracks PTR_TO_BTF_ID as any other pointer type. Like PTR_TO_SOCKET points to 'struct bpf_sock', PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on. PTR_TO_BTF_ID points to in-kernel structs. If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF then PTR_TO_BTF_ID#1234 points to one of in kernel skbs. When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32) the btf_struct_access() checks which field of 'struct sk_buff' is at offset 32. Checks that size of access matches type definition of the field and continues to track the dereferenced type. If that field was a pointer to 'struct net_device' the r2's type will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device' in vmlinux's BTF. Such verifier analysis prevents "cheating" in BPF C program. The program cannot cast arbitrary pointer to 'struct sk_buff *' and access it. C compiler would allow type cast, of course, but the verifier will notice type mismatch based on BPF assembly and in-kernel BTF. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-7-ast@kernel.org
|
#
8580ac94 |
|
15-Oct-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Process in-kernel BTF If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux' for further use by the verifier. In-kernel BTF is trusted just like kallsyms and other build artifacts embedded into vmlinux. Yet run this BTF image through BTF verifier to make sure that it is valid and it wasn't mangled during the build. If in-kernel BTF is incorrect it means either gcc or pahole or kernel are buggy. In such case disallow loading BPF programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-4-ast@kernel.org
|
#
e3439af4 |
|
25-Sep-2019 |
Colin Ian King <colin.king@canonical.com> |
bpf: Clean up indentation issue in BTF kflag processing There is a statement that is indented one level too deeply, remove the extraneous tab. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20190925093835.19515-1-colin.king@canonical.com
|
#
9eea9849 |
|
17-Sep-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: fix BTF verification of enums vmlinux BTF has enums that are 8 byte and 1 byte in size. 2 byte enum is a valid construct as well. Fix BTF enum verification to accept those sizes. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
1b9ed84e |
|
20-Aug-2019 |
Quentin Monnet <quentin@isovalent.com> |
bpf: add new BPF_BTF_GET_NEXT_ID syscall command Add a new command for the bpf() system call: BPF_BTF_GET_NEXT_ID is used to cycle through all BTF objects loaded on the system. The motivation is to be able to inspect (list) all BTF objects presents on the system. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
3481e64b |
|
20-Aug-2019 |
Quentin Monnet <quentin@isovalent.com> |
bpf: add BTF ids in procfs for file descriptors to BTF objects Implement the show_fdinfo hook for BTF FDs file operations, and make it print the id of the BTF object. This allows for a quick retrieval of the BTF id from its FD; or it can help understanding what type of object (BTF) the file descriptor points to. v2: - Do not expose data_size, only btf_id, in FD info. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
1acc5d5c |
|
12-Jul-2019 |
Andrii Nakryiko <andriin@fb.com> |
bpf: fix BTF verifier size resolution logic BTF verifier has a size resolution bug which in some circumstances leads to invalid size resolution for, e.g., TYPEDEF modifier. This happens if we have [1] PTR -> [2] TYPEDEF -> [3] ARRAY, in which case due to being in pointer context ARRAY size won't be resolved (because for pointer it doesn't matter, so it's a sink in pointer context), but it will be permanently remembered as zero for TYPEDEF and TYPEDEF will be marked as RESOLVED. Eventually ARRAY size will be resolved correctly, but TYPEDEF resolved_size won't be updated anymore. This, subsequently, will lead to erroneous map creation failure, if that TYPEDEF is specified as either key or value, as key_size/value_size won't correspond to resolved size of TYPEDEF (kernel will believe it's zero). Note, that if BTF was ordered as [1] ARRAY <- [2] TYPEDEF <- [3] PTR, this won't be a problem, as by the time we get to TYPEDEF, ARRAY's size is already calculated and stored. This bug manifests itself in rejecting BTF-defined maps that use array typedef as a value type: typedef int array_t[16]; struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(value, array_t); /* i.e., array_t *value; */ } test_map SEC(".maps"); The fix consists on not relying on modifier's resolved_size and instead using modifier's resolved_id (type ID for "concrete" type to which modifier eventually resolves) and doing size determination for that resolved type. This allow to preserve existing "early DFS termination" logic for PTR or STRUCT_OR_ARRAY contexts, but still do correct size determination for modifier types. Fixes: eb3f595dab40 ("bpf: btf: Validate type reference") Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
e4f07120 |
|
19-Jun-2019 |
Stanislav Fomichev <sdf@google.com> |
bpf: fix NULL deref in btf_type_is_resolve_source_only Commit 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") added invocations of btf_type_is_resolve_source_only before btf_type_nosize_or_null which checks for the NULL pointer. Swap the order of btf_type_nosize_or_null and btf_type_is_resolve_source_only to make sure the do the NULL pointer check first. Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
2824ecb7 |
|
09-Apr-2019 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: allow for key-less BTF in array map Given we'll be reusing BPF array maps for global data/bss/rodata sections, we need a way to associate BTF DataSec type as its map value type. In usual cases we have this ugly BPF_ANNOTATE_KV_PAIR() macro hack e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR") to get initial map to type association going. While more use cases for it are discouraged, this also won't work for global data since the use of array map is a BPF loader detail and therefore unknown at compilation time. For array maps with just a single entry we make an exception in terms of BTF in that key type is declared optional if value type is of DataSec type. The latter LLVM is guaranteed to emit and it also aligns with how we regard global data maps as just a plain buffer area reusing existing map facilities for allowing things like introspection with existing tools. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
1dc92851 |
|
09-Apr-2019 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf: kernel side support for BTF Var and DataSec This work adds kernel-side verification, logging and seq_show dumping of BTF Var and DataSec kinds which are emitted with latest LLVM. The following constraints apply: BTF Var must have: - Its kind_flag is 0 - Its vlen is 0 - Must point to a valid type - Type must not resolve to a forward type - Size of underlying type must be > 0 - Must have a valid name - Can only be a source type, not sink or intermediate one - Name may include dots (e.g. in case of static variables inside functions) - Cannot be a member of a struct/union - Linkage so far can either only be static or global/allocated BTF DataSec must have: - Its kind_flag is 0 - Its vlen cannot be 0 - Its size cannot be 0 - Must have a valid name - Can only be a source type, not sink or intermediate one - Name may include dots (e.g. to represent .bss, .data, .rodata etc) - Cannot be a member of a struct/union - Inner btf_var_secinfo array with {type,offset,size} triple must be sorted by offset in ascending order - Type must always point to BTF Var - BTF resolved size of Var must be <= size provided by triple - DataSec size must be >= sum of triple sizes (thus holes are allowed) btf_var_resolve(), btf_ptr_resolve() and btf_modifier_resolve() are on a high level quite similar but each come with slight, subtle differences. They could potentially be a bit refactored in future which hasn't been done here to ease review. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
d83525ca |
|
31-Jan-2019 |
Alexei Starovoitov <ast@kernel.org> |
bpf: introduce bpf_spin_lock Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let bpf program serialize access to other variables. Example: struct hash_elem { int cnt; struct bpf_spin_lock lock; }; struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key); if (val) { bpf_spin_lock(&val->lock); val->cnt++; bpf_spin_unlock(&val->lock); } Restrictions and safety checks: - bpf_spin_lock is only allowed inside HASH and ARRAY maps. - BTF description of the map is mandatory for safety analysis. - bpf program can take one bpf_spin_lock at a time, since two or more can cause dead locks. - only one 'struct bpf_spin_lock' is allowed per map element. It drastically simplifies implementation yet allows bpf program to use any number of bpf_spin_locks. - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed. - bpf program must bpf_spin_unlock() before return. - bpf program can access 'struct bpf_spin_lock' only via bpf_spin_lock()/bpf_spin_unlock() helpers. - load/store into 'struct bpf_spin_lock lock;' field is not allowed. - to use bpf_spin_lock() helper the BTF description of map value must be a struct and have 'struct bpf_spin_lock anyname;' field at the top level. Nested lock inside another struct is not allowed. - syscall map_lookup doesn't copy bpf_spin_lock field to user space. - syscall map_update and program map_update do not update bpf_spin_lock field. - bpf_spin_lock cannot be on the stack or inside networking packet. bpf_spin_lock can only be inside HASH or ARRAY map value. - bpf_spin_lock is available to root only and to all program types. - bpf_spin_lock is not allowed in inner maps of map-in-map. - ld_abs is not allowed inside spin_lock-ed region. - tracing progs and socket filter progs cannot use bpf_spin_lock due to insufficient preemption checks Implementation details: - cgroup-bpf class of programs can nest with xdp/tc programs. Hence bpf_spin_lock is equivalent to spin_lock_irqsave. Other solutions to avoid nested bpf_spin_lock are possible. Like making sure that all networking progs run with softirq disabled. spin_lock_irqsave is the simplest and doesn't add overhead to the programs that don't use it. - arch_spinlock_t is used when its implemented as queued_spin_lock - archs can force their own arch_spinlock_t - on architectures where queued_spin_lock is not available and sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used. - presence of bpf_spin_lock inside map value could have been indicated via extra flag during map_create, but specifying it via BTF is cleaner. It provides introspection for map key/value and reduces user mistakes. Next steps: - allow bpf_spin_lock in other map types (like cgroup local storage) - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper to request kernel to grab bpf_spin_lock before rewriting the value. That will serialize access to map elements. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
81f5c6f5 |
|
29-Jan-2019 |
Yonghong Song <yhs@fb.com> |
bpf: btf: allow typedef func_proto Current implementation does not allow typedef func_proto. But it is actually allowed. -bash-4.4$ cat t.c typedef int (f) (int); f *g; -bash-4.4$ clang -O2 -g -c -target bpf t.c -Xclang -target-feature -Xclang +dwarfris -bash-4.4$ pahole -JV t.o File t.o: [1] PTR (anon) type_id=2 [2] TYPEDEF f type_id=3 [3] FUNC_PROTO (anon) return=4 args=(4 (anon)) [4] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED -bash-4.4$ This patch related btf verifier to allow such (typedef func_proto) patterns. Fixes: 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
583c5318 |
|
16-Jan-2019 |
Mathieu Malaterre <malat@debian.org> |
bpf: Make function btf_name_offset_valid static Initially in commit 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") the function 'btf_name_offset_valid' was introduced as static function it was later on changed to a non-static one, and then finally in commit 23127b33ec80 ("bpf: Create a new btf_name_by_offset() for non type name use case") the function prototype was removed. Revert back to original implementation and make the function static. Remove warning triggered with W=1: kernel/bpf/btf.c:470:6: warning: no previous prototype for 'btf_name_offset_valid' [-Wmissing-prototypes] Fixes: 23127b33ec80 ("bpf: Create a new btf_name_by_offset() for non type name use case") Signed-off-by: Mathieu Malaterre <malat@debian.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
b1e8818c |
|
15-Jan-2019 |
Yonghong Song <yhs@fb.com> |
bpf: btf: support 128 bit integer type Currently, btf only supports up to 64-bit integer. On the other hand, 128bit support for gcc and clang has existed for a long time. For example, both gcc 4.8 and llvm 3.7 supports types "__int128" and "unsigned __int128" for virtually all 64bit architectures including bpf. The requirement for __int128 support comes from two areas: . bpf program may use __int128. For example, some bcc tools (https://github.com/iovisor/bcc/tree/master/tools), mostly tcp v6 related, tcpstates.py, tcpaccept.py, etc., are using __int128 to represent the ipv6 addresses. . linux itself is using __int128 types. Hence supporting __int128 type in BTF is required for vmlinux BTF, which will be used by "compile once and run everywhere" and other projects. For 128bit integer, instead of base-10, hex numbers are pretty printed out as large decimal number is hard to decipher, e.g., for ipv6 addresses. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
17e3ac81 |
|
10-Jan-2019 |
Yonghong Song <yhs@fb.com> |
bpf: fix bpffs bitfield pretty print Commit 9d5f9f701b18 ("bpf: btf: fix struct/union/fwd types with kind_flag") introduced kind_flag and used bitfield_size in the btf_member to directly pretty print member values. The commit contained a bug where the incorrect parameters could be passed to function btf_bitfield_seq_show(). The bits_offset parameter in the function expects a value less than 8. Instead, the member offset in the structure is passed. The below is btf_bitfield_seq_show() func signature: void btf_bitfield_seq_show(void *data, u8 bits_offset, u8 nr_bits, struct seq_file *m) both bits_offset and nr_bits are u8 type. If the bitfield member offset is greater than 256, incorrect value will be printed. This patch fixed the issue by calculating correct proper data offset and bits_offset similar to non kind_flag case. Fixes: 9d5f9f701b18 ("bpf: btf: fix struct/union/fwd types with kind_flag") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
76c43ae8 |
|
18-Dec-2018 |
Yonghong Song <yhs@fb.com> |
bpf: log struct/union attribute for forward type Current btf internal verbose logger logs fwd type as [2] FWD A type_id=0 where A is the type name. Commit 9d5f9f701b18 ("bpf: btf: fix struct/union/fwd types with kind_flag") introduced kind_flag which can be used to distinguish whether a forward type is a struct or union. Also, "type_id=0" does not carry any meaningful information for fwd type as btf_type.type = 0 is simply enforced during btf verification and is not used anywhere else. This commit changed the log to [2] FWD A struct if kind_flag = 0, or [2] FWD A union if kind_flag = 1. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
ffa0c1cf |
|
15-Dec-2018 |
Yonghong Song <yhs@fb.com> |
bpf: enable cgroup local storage map pretty print with kind_flag Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup local storage maps") added bpffs pretty print for cgroup local storage maps. The commit worked for struct without kind_flag set. This patch refactored and made pretty print also work with kind_flag set for the struct. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
9d5f9f70 |
|
15-Dec-2018 |
Yonghong Song <yhs@fb.com> |
bpf: btf: fix struct/union/fwd types with kind_flag This patch fixed two issues with BTF. One is related to struct/union bitfield encoding and the other is related to forward type. Issue #1 and solution: ====================== Current btf encoding of bitfield follows what pahole generates. For each bitfield, pahole will duplicate the type chain and put the bitfield size at the final int or enum type. Since the BTF enum type cannot encode bit size, pahole workarounds the issue by generating an int type whenever the enum bit size is not 32. For example, -bash-4.4$ cat t.c typedef int ___int; enum A { A1, A2, A3 }; struct t { int a[5]; ___int b:4; volatile enum A c:4; } g; -bash-4.4$ gcc -c -O2 -g t.c The current kernel supports the following BTF encoding: $ pahole -JV t.o [1] TYPEDEF ___int type_id=2 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [3] ENUM A size=4 vlen=3 A1 val=0 A2 val=1 A3 val=2 [4] STRUCT t size=24 vlen=3 a type_id=5 bits_offset=0 b type_id=9 bits_offset=160 c type_id=11 bits_offset=164 [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5 [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none) [7] VOLATILE (anon) type_id=3 [8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none) [9] TYPEDEF ___int type_id=8 [10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED [11] VOLATILE (anon) type_id=10 Two issues are in the above: . by changing enum type to int, we lost the original type information and this will not be ideal later when we try to convert BTF to a header file. . the type duplication for bitfields will cause BTF bloat. Duplicated types cannot be deduplicated later if the bitfield size is different. To fix this issue, this patch implemented a compatible change for BTF struct type encoding: . the bit 31 of struct_type->info, previously reserved, now is used to indicate whether bitfield_size is encoded in btf_member or not. . if bit 31 of struct_type->info is set, btf_member->offset will encode like: bit 0 - 23: bit offset bit 24 - 31: bitfield size if bit 31 is not set, the old behavior is preserved: bit 0 - 31: bit offset So if the struct contains a bit field, the maximum bit offset will be reduced to (2^24 - 1) instead of MAX_UINT. The maximum bitfield size will be 256 which is enough for today as maximum bitfield in compiler can be 128 where int128 type is supported. This kernel patch intends to support the new BTF encoding: $ pahole -JV t.o [1] TYPEDEF ___int type_id=2 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [3] ENUM A size=4 vlen=3 A1 val=0 A2 val=1 A3 val=2 [4] STRUCT t kind_flag=1 size=24 vlen=3 a type_id=5 bitfield_size=0 bits_offset=0 b type_id=1 bitfield_size=4 bits_offset=160 c type_id=7 bitfield_size=4 bits_offset=164 [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5 [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none) [7] VOLATILE (anon) type_id=3 Issue #2 and solution: ====================== Current forward type in BTF does not specify whether the original type is struct or union. This will not work for type pretty print and BTF-to-header-file conversion as struct/union must be specified. $ cat tt.c struct t; union u; int foo(struct t *t, union u *u) { return 0; } $ gcc -c -g -O2 tt.c $ pahole -JV tt.o [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [2] FWD t type_id=0 [3] PTR (anon) type_id=2 [4] FWD u type_id=0 [5] PTR (anon) type_id=4 To fix this issue, similar to issue #1, type->info bit 31 is used. If the bit is set, it is union type. Otherwise, it is a struct type. $ pahole -JV tt.o [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [2] FWD t kind_flag=0 type_id=0 [3] PTR (anon) kind_flag=0 type_id=2 [4] FWD u kind_flag=1 type_id=0 [5] PTR (anon) kind_flag=0 type_id=4 Pahole/LLVM change: =================== The new kind_flag functionality has been implemented in pahole and llvm: https://github.com/yonghong-song/pahole/tree/bitfield https://github.com/yonghong-song/llvm/tree/bitfield Note that pahole hasn't implemented func/func_proto kind and .BTF.ext. So to print function signature with bpftool, the llvm compiler should be used. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
f97be3ab |
|
15-Dec-2018 |
Yonghong Song <yhs@fb.com> |
bpf: btf: refactor btf_int_bits_seq_show() Refactor function btf_int_bits_seq_show() by creating function btf_bitfield_seq_show() which has no dependence on btf and btf_type. The function btf_bitfield_seq_show() will be in later patch to directly dump bitfield member values. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
23127b33 |
|
13-Dec-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Create a new btf_name_by_offset() for non type name use case The current btf_name_by_offset() is returning "(anon)" type name for the offset == 0 case and "(invalid-name-offset)" for the out-of-bound offset case. It fits well for the internal BTF verbose log purpose which is focusing on type. For example, offset == 0 => "(anon)" => anonymous type/name. Returning non-NULL for the bad offset case is needed during the BTF verification process because the BTF verifier may complain about another field first before discovering the name_off is invalid. However, it may not be ideal for the newer use case which does not necessary mean type name. For example, when logging line_info in the BPF verifier in the next patch, it is better to log an empty src line instead of logging "(anon)". The existing bpf_name_by_offset() is renamed to __bpf_name_by_offset() and static to btf.c. A new bpf_name_by_offset() is added for generic context usage. It returns "\0" for name_off == 0 (note that btf->strings[0] is "\0") and NULL for invalid offset. It allows the caller to decide what is the best output in its context. The new btf_name_by_offset() is overlapped with btf_name_offset_valid(). Hence, btf_name_offset_valid() is removed from btf.h to keep the btf.h API minimal. The existing btf_name_offset_valid() usage in btf.c could also be replaced later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
9a1126b6 |
|
10-Dec-2018 |
Roman Gushchin <guroan@gmail.com> |
bpf: add bpffs pretty print for cgroup local storage maps Implement bpffs pretty printing for cgroup local storage maps (both shared and per-cpu). Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c): Shared: $ cat /sys/fs/bpf/map_2 # WARNING!! The output is for debug purpose only # WARNING!! The output format will change {4294968594,1}: {9999,1039896} Per-cpu: $ cat /sys/fs/bpf/map_1 # WARNING!! The output is for debug purpose only # WARNING!! The output format will change {4294968594,1}: { cpu0: {0,0,0,0,0} cpu1: {0,0,0,0,0} cpu2: {1,104,0,0,0} cpu3: {0,0,0,0,0} } Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
c454a46b |
|
07-Dec-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: Add bpf_line_info support This patch adds bpf_line_info support. It accepts an array of bpf_line_info objects during BPF_PROG_LOAD. The "line_info", "line_info_cnt" and "line_info_rec_size" are added to the "union bpf_attr". The "line_info_rec_size" makes bpf_line_info extensible in the future. The new "check_btf_line()" ensures the userspace line_info is valid for the kernel to use. When the verifier is translating/patching the bpf_prog (through "bpf_patch_insn_single()"), the line_infos' insn_off is also adjusted by the newly added "bpf_adj_linfo()". If the bpf_prog is jited, this patch also provides the jited addrs (in aux->jited_linfo) for the corresponding line_info.insn_off. "bpf_prog_fill_jited_linfo()" is added to fill the aux->jited_linfo. It is currently called by the x86 jit. Other jits can also use "bpf_prog_fill_jited_linfo()" and it will be done in the followup patches. In the future, if it deemed necessary, a particular jit could also provide its own "bpf_prog_fill_jited_linfo()" implementation. A few "*line_info*" fields are added to the bpf_prog_info such that the user can get the xlated line_info back (i.e. the line_info with its insn_off reflecting the translated prog). The jited_line_info is available if the prog is jited. It is an array of __u64. If the prog is not jited, jited_line_info_cnt is 0. The verifier's verbose log with line_info will be done in a follow up patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
eb04bbb6 |
|
27-Nov-2018 |
Yonghong Song <yhs@fb.com> |
bpf: btf: check name validity for various types This patch added name checking for the following types: . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE, BTF_KIND_CONST, BTF_KIND_RESTRICT: the name must be null . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name is either null or a valid identifier . BTF_KIND_ENUM: the enum type name is either null or a valid identifier; the enumerator name must be a valid identifier. . BTF_KIND_FWD: the name must be a valid identifier . BTF_KIND_TYPEDEF: the name must be a valid identifier For those places a valid name is required, the name must be a valid C identifier. This can be relaxed later if we found use cases for a different (non-C) frontend. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
cdbb096a |
|
27-Nov-2018 |
Yonghong Song <yhs@fb.com> |
bpf: btf: implement btf_name_valid_identifier() Function btf_name_valid_identifier() have been implemented in bpf-next commit 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO"). Backport this function so later patch can use it. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
311fe1a8 |
|
25-Nov-2018 |
Colin Ian King <colin.king@canonical.com> |
bpf: btf: fix spelling mistake "Memmber" -> "Member" There is a spelling mistake in a btf_verifier_log_member message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
838e9690 |
|
19-Nov-2018 |
Yonghong Song <yhs@fb.com> |
bpf: Introduce bpf_func_info This patch added interface to load a program with the following additional information: . prog_btf_fd . func_info, func_info_rec_size and func_info_cnt where func_info will provide function range and type_id corresponding to each function. The func_info_rec_size is introduced in the UAPI to specify struct bpf_func_info size passed from user space. This intends to make bpf_func_info structure growable in the future. If the kernel gets a different bpf_func_info size from userspace, it will try to handle user request with part of bpf_func_info it can understand. In this patch, kernel can understand struct bpf_func_info { __u32 insn_offset; __u32 type_id; }; If user passed a bpf func_info record size of 16 bytes, the kernel can still handle part of records with the above definition. If verifier agrees with function range provided by the user, the bpf_prog ksym for each function will use the func name provided in the type_id, which is supposed to provide better encoding as it is not limited by 16 bytes program name limitation and this is better for bpf program which contains multiple subprograms. The bpf_prog_info interface is also extended to return btf_id, func_info, func_info_rec_size and func_info_cnt to userspace, so userspace can print out the function prototype for each xlated function. The insn_offset in the returned func_info corresponds to the insn offset for xlated functions. With other jit related fields in bpf_prog_info, userspace can also print out function prototypes for each jited function. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
2667a262 |
|
19-Nov-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO to support the function debug info. BTF_KIND_FUNC_PROTO must not have a name (i.e. !t->name_off) and it is followed by >= 0 'struct bpf_param' objects to describe the function arguments. The BTF_KIND_FUNC must have a valid name and it must refer back to a BTF_KIND_FUNC_PROTO. The above is the conclusion after the discussion between Edward Cree, Alexei, Daniel, Yonghong and Martin. By combining BTF_KIND_FUNC and BTF_LIND_FUNC_PROTO, a complete function signature can be obtained. It will be used in the later patches to learn the function signature of a running bpf program. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b47a0bd2 |
|
19-Nov-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Break up btf_type_is_void() This patch breaks up btf_type_is_void() into btf_type_is_void() and btf_type_is_fwd(). It also adds btf_type_nosize() to better describe it is testing a type has nosize info. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
4a6998af |
|
24-Oct-2018 |
Martin Lau <kafai@fb.com> |
bpf, btf: fix a missing check bug in btf_parse Wenwen Wang reported: In btf_parse(), the header of the user-space btf data 'btf_data' is firstly parsed and verified through btf_parse_hdr(). In btf_parse_hdr(), the header is copied from user-space 'btf_data' to kernel-space 'btf->hdr' and then verified. If no error happens during the verification process, the whole data of 'btf_data', including the header, is then copied to 'data' in btf_parse(). It is obvious that the header is copied twice here. More importantly, no check is enforced after the second copy to make sure the headers obtained in these two copies are same. Given that 'btf_data' resides in the user space, a malicious user can race to modify the header between these two copies. By doing so, the user can inject inconsistent data, which can cause undefined behavior of the kernel and introduce potential security risk. This issue is similar to the one fixed in commit 8af03d1ae2e1 ("bpf: btf: Fix a missing check bug"). To fix it, this patch copies the user 'btf_data' *before* parsing / verifying the BTF header. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Co-developed-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
8af03d1a |
|
07-Oct-2018 |
Wenwen Wang <wang6495@umn.edu> |
bpf: btf: Fix a missing check bug In btf_parse_hdr(), the length of the btf data header is firstly copied from the user space to 'hdr_len' and checked to see whether it is larger than 'btf_data_size'. If yes, an error code EINVAL is returned. Otherwise, the whole header is copied again from the user space to 'btf->hdr'. However, after the second copy, there is no check between 'btf->hdr->hdr_len' and 'hdr_len' to confirm that the two copies get the same value. Given that the btf data is in the user space, a malicious user can race to change the data between the two copies. By doing so, the user can provide malicious data to the kernel and cause undefined behavior. This patch adds a necessary check after the second copy, to make sure 'btf->hdr->hdr_len' has the same value as 'hdr_len'. Otherwise, an error code EINVAL will be returned. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
4b1c5d91 |
|
12-Sep-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Fix end boundary calculation for type section The end boundary math for type section is incorrect in btf_check_all_metas(). It just happens that hdr->type_off is always 0 for now because there are only two sections (type and string) and string section must be at the end (ensured in btf_parse_str_sec). However, type_off may not be 0 if a new section would be added later. This patch fixes it. Fixes: f80442a4cd18 ("bpf: btf: Change how section is supported in btf_header") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
6283fa38 |
|
20-Jul-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Ensure the member->offset is in the right order This patch ensures the member->offset of a struct is in the correct order (i.e the later member's offset cannot go backward). The current "pahole -J" BTF encoder does not generate something like this. However, checking this can ensure future encoder will not violate this. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
36fc3c8c |
|
19-Jul-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h This patch shrinks the BTF_INT_BITS() mask. The current btf_int_check_meta() ensures the nr_bits of an integer cannot exceed 64. Hence, it is mostly an uapi cleanup. The actual btf usage (i.e. seq_show()) is also modified to use u8 instead of u16. The verification (e.g. btf_int_check_meta()) path stays as is to deal with invalid BTF situation. Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
b65f370d |
|
10-Jul-2018 |
Okash Khawaja <osk@fb.com> |
bpf: btf: Fix bitfield extraction for big endian When extracting bitfield from a number, btf_int_bits_seq_show() builds a mask and accesses least significant byte of the number in a way specific to little-endian. This patch fixes that by checking endianness of the machine and then shifting left and right the unneeded bits. Thanks to Martin Lau for the help in navigating potential pitfalls when dealing with endianess and for the final solution. Fixes: b00b8daec828 ("bpf: btf: Add pretty print capability for data with BTF type info") Signed-off-by: Okash Khawaja <osk@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
778e1cdd |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kvzalloc() -> kvcalloc() The kvzalloc() function has a 2-factor argument form, kvcalloc(). This patch replaces cases of: kvzalloc(a * b, gfp) with: kvcalloc(a * b, gfp) as well as handling cases of: kvzalloc(a * b * c, gfp) with: kvzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kvcalloc(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kvzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kvzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kvzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kvzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kvzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kvzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kvzalloc( - sizeof(char) * COUNT + COUNT , ...) | kvzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kvzalloc + kvcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kvzalloc + kvcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kvzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kvzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kvzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kvzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kvzalloc(C1 * C2 * C3, ...) | kvzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kvzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kvzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kvzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kvzalloc(sizeof(THING) * C2, ...) | kvzalloc(sizeof(TYPE) * C2, ...) | kvzalloc(C1 * C2 * C3, ...) | kvzalloc(C1 * C2, ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kvzalloc + kvcalloc ( - (E1) * E2 + E1, E2 , ...) | - kvzalloc + kvcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kvzalloc + kvcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
8175383f |
|
02-Jun-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Ensure t->type == 0 for BTF_KIND_FWD The t->type in BTF_KIND_FWD is not used. It must be 0. This patch ensures that and also adds a test case in test_btf.c Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
b9308ae6 |
|
02-Jun-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Check array t->size This patch ensures array's t->size is 0. The array size is decided by its individual elem's size and the number of elements. Hence, t->size is not used and it must be 0. A test case is added to test_btf.c Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
53c8036c |
|
25-May-2018 |
Arnd Bergmann <arnd@arndb.de> |
bpf: btf: avoid -Wreturn-type warning gcc warns about a noreturn function possibly returning in some configurations: kernel/bpf/btf.c: In function 'env_type_is_resolve_sink': kernel/bpf/btf.c:729:1: error: control reaches end of non-void function [-Werror=return-type] Using BUG() instead of BUG_ON() avoids that warning and otherwise does the exact same thing. Fixes: eb3f595dab40 ("bpf: btf: Validate type reference") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
a2889a4c |
|
23-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Avoid variable length array Sparse warning: kernel/bpf/btf.c:1985:34: warning: Variable length array is used. This patch directly uses ARRAY_SIZE(). Fixes: f80442a4cd18 ("bpf: btf: Change how section is supported in btf_header") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
aea2f7b8 |
|
22-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Remove unused bits from uapi/linux/btf.h This patch does the followings: 1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k. We can raise it later. 2. Remove the BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID. They are currently encoded at the highest bit of a u32. It is because the current use case does not require supporting parent type (i.e type_id referring to a type in another BTF file). It also does not support referring to a string in ELF. The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK which are defined in btf.c instead of uapi/linux/btf.h. 3. Limit the BTF_INFO_KIND from 5 bits to 4 bits which is enough. There is unused bits headroom if we ever needed it later. 4. The root bit in BTF_INFO is also removed because it is not used in the current use case. 5. Remove BTF_INT_VARARGS since func type is not supported now. The BTF_INT_ENCODING is limited to 4 bits instead of 8 bits. The above can be added back later because the verifier ensures the unused bits are zeros. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
4ef5f574 |
|
22-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Check array->index_type Instead of ingoring the array->index_type field. Enforce that it must be a BTF_KIND_INT in size 1/2/4/8 bytes. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
f80442a4 |
|
22-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Change how section is supported in btf_header There are currently unused section descriptions in the btf_header. Those sections are here to support future BTF use cases. For example, the func section (func_off) is to support function signature (e.g. the BPF prog function signature). Instead of spelling out all potential sections up-front in the btf_header. This patch makes changes to btf_header such that extending it (e.g. adding a section) is possible later. The unused ones can be removed for now and they can be added back later. This patch: 1. adds a hdr_len to the btf_header. It will allow adding sections (and other info like parent_label and parent_name) later. The check is similar to the existing bpf_attr. If a user passes in a longer hdr_len, the kernel ensures the extra tailing bytes are 0. 2. allows the section order in the BTF object to be different from its sec_off order in btf_header. 3. each sec_off is followed by a sec_len. It must not have gap or overlapping among sections. The string section is ensured to be at the end due to the 4 bytes alignment requirement of the type section. The above changes will allow enough flexibility to add new sections (and other info) to the btf_header later. This patch also removes an unnecessary !err check at the end of btf_parse(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
62dab84c |
|
04-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Add struct bpf_btf_info During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's info.info is directly filled with the BTF binary data. It is not extensible. In this case, we want to add BTF ID. This patch adds "struct bpf_btf_info" which has the BTF ID as one of its member. The BTF binary data itself is exposed through the "btf" and "btf_size" members. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
78958fca |
|
04-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Introduce BTF ID This patch gives an ID to each loaded BTF. The ID is allocated by the idr like the existing prog-id and map-id. The bpf_put(map->btf) is moved to __bpf_map_put() so that the userspace can stop seeing the BTF ID ASAP when the last BTF refcnt is gone. It also makes BTF accessible from userspace through the 1. new BPF_BTF_GET_FD_BY_ID command. It is limited to CAP_SYS_ADMIN which is inline with the BPF_BTF_LOAD cmd and the existing BPF_[MAP|PROG]_GET_FD_BY_ID cmd. 2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info" Once the BTF ID handler is accessible from userspace, freeing a BTF object has to go through a rcu period. The BPF_BTF_GET_FD_BY_ID cmd can then be done under a rcu_read_lock() instead of taking spin_lock. [Note: A similar rcu usage can be done to the existing bpf_prog_get_fd_by_id() in a follow up patch] When processing the BPF_BTF_GET_FD_BY_ID cmd, refcount_inc_not_zero() is needed because the BTF object could be already in the rcu dead row . btf_get() is removed since its usage is currently limited to btf.c alone. refcount_inc() is used directly instead. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
82e96972 |
|
04-May-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y If CONFIG_REFCOUNT_FULL=y, refcount_inc() WARN when refcount is 0. When creating a new btf, the initial btf->refcnt is 0 and triggered the following: [ 34.855452] refcount_t: increment on 0; use-after-free. [ 34.856252] WARNING: CPU: 6 PID: 1857 at lib/refcount.c:153 refcount_inc+0x26/0x30 .... [ 34.868809] Call Trace: [ 34.869168] btf_new_fd+0x1af6/0x24d0 [ 34.869645] ? btf_type_seq_show+0x200/0x200 [ 34.870212] ? lock_acquire+0x3b0/0x3b0 [ 34.870726] ? security_capable+0x54/0x90 [ 34.871247] __x64_sys_bpf+0x1b2/0x310 [ 34.871761] ? __ia32_sys_bpf+0x310/0x310 [ 34.872285] ? bad_area_access_error+0x310/0x310 [ 34.872894] do_syscall_64+0x95/0x3f0 This patch uses refcount_set() instead. Reported-by: Yonghong Song <yhs@fb.com> Tested-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
fbcf93eb |
|
21-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Clean up btf.h in uapi This patch cleans up btf.h in uapi: 1) Rename "name" to "name_off" to better reflect it is an offset to the string section instead of a char array. 2) Remove unused value BTF_FLAGS_COMPR and BTF_MAGIC_SWAP Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
60197cfb |
|
18-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd. The original BTF data, which was used to create the BTF fd during the earlier BPF_BTF_LOAD call, will be returned. The userspace is expected to allocate buffer to info.info and the buffer size is set to info.info_len before calling BPF_OBJ_GET_INFO_BY_FD. The original BTF data is copied to the userspace buffer (info.info). Only upto the user's specified info.info_len will be copied. The original BTF data size is set to info.info_len. The userspace needs to check if it is bigger than its allocated buffer size. If it is, the userspace should realloc with the kernel-returned info.info_len and call the BPF_OBJ_GET_INFO_BY_FD again. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
f56a653c |
|
18-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Add BPF_BTF_LOAD command This patch adds a BPF_BTF_LOAD command which 1) loads and verifies the BTF (implemented in earlier patches) 2) returns a BTF fd to userspace. In the next patch, the BTF fd can be specified during BPF_MAP_CREATE. It currently limits to CAP_SYS_ADMIN. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
b00b8dae |
|
18-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Add pretty print capability for data with BTF type info This patch adds pretty print capability for data with BTF type info. The current usage is to allow pretty print for a BPF map. The next few patches will allow a read() on a pinned map with BTF type info for its key and value. This patch uses the seq_printf() infra. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
179cde8c |
|
18-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Check members of struct/union This patch checks a few things of struct's members: 1) It has a valid size (e.g. a "const void" is invalid) 2) A member's size (+ its member's offset) does not exceed the containing struct's size. 3) The member's offset satisfies the alignment requirement The above can only be done after the needs_resolve member's type is resolved. Hence, the above is done together in btf_struct_resolve(). Each possible member's type (e.g. int, enum, modifier...) implements the check_member() ops which will be called from btf_struct_resolve(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
eb3f595d |
|
18-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Validate type reference After collecting all btf_type in the first pass in an earlier patch, the second pass (in this patch) can validate the reference types (e.g. the referring type does exist and it does not refer to itself). While checking the reference type, it also gathers other information (e.g. the size of an array). This info will be useful in checking the struct's members in a later patch. They will also be useful in doing pretty print later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
#
69b693f0 |
|
18-Apr-2018 |
Martin KaFai Lau <kafai@fb.com> |
bpf: btf: Introduce BPF Type Format (BTF) This patch introduces BPF type Format (BTF). BTF (BPF Type Format) is the meta data format which describes the data types of BPF program/map. Hence, it basically focus on the C programming language which the modern BPF is primary using. The first use case is to provide a generic pretty print capability for a BPF map. BTF has its root from CTF (Compact C-Type format). To simplify the handling of BTF data, BTF removes the differences between small and big type/struct-member. Hence, BTF consistently uses u32 instead of supporting both "one u16" and "two u32 (+padding)" in describing type and struct-member. It also raises the number of types (and functions) limit from 0x7fff to 0x7fffffff. Due to the above changes, the format is not compatible to CTF. Hence, BTF starts with a new BTF_MAGIC and version number. This patch does the first verification pass to the BTF. The first pass checks: 1. meta-data size (e.g. It does not go beyond the total btf's size) 2. name_offset is valid 3. Each BTF_KIND (e.g. int, enum, struct....) does its own check of its meta-data. Some other checks, like checking a struct's member is referring to a valid type, can only be done in the second pass. The second verification pass will be implemented in the next patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|