
History log of /linux-master/arch/arm/net/bpf_jit_32.c
Revision	Date	Author	Comments
# 71086041	07-Sep-2023	Puranjay Mohan <puranjay12@gmail.com>	arm32, bpf: add support for 64 bit division instruction ARM32 doesn't have instructions to do 64-bit/64-bit divisions. So, to implement the instructions BPF_ALU64 | BPF_DIV, BPF_ALU64 | BPF_MOD, BPF_ALU64 | BPF_SDIV and BPF_ALU64 | BPF_SMOD, we make function calls to div64_u64() and div64_u64_rem() for unsigned division/mod and calls to div64_s64() for signed division/mod. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-7-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
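The dispatch described in the entry above can be sketched in user space as follows. This is only an illustration under stated assumptions: the three helpers are stand-ins that mirror the kernel signatures of div64_u64(), div64_u64_rem() and div64_s64(), while bpf_alu64_divmod() and its flag parameters are hypothetical names, not taken from the JIT. The sketch assumes a non-zero divisor and ignores the INT64_MIN / -1 corner case.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel helpers named in the commit message. */
static uint64_t div64_u64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}

static uint64_t div64_u64_rem(uint64_t dividend, uint64_t divisor,
			      uint64_t *remainder)
{
	*remainder = dividend % divisor;
	return dividend / divisor;
}

static int64_t div64_s64(int64_t dividend, int64_t divisor)
{
	return dividend / divisor;
}

/*
 * Hypothetical dispatcher: selects the helper the way the commit message
 * describes. Assumes src != 0 and two's-complement conversions.
 */
static uint64_t bpf_alu64_divmod(uint64_t dst, uint64_t src,
				 int is_signed, int is_mod)
{
	if (!is_signed) {
		uint64_t rem;

		if (!is_mod)
			return div64_u64(dst, src);
		(void)div64_u64_rem(dst, src, &rem);
		return rem;
	}

	/* Signed case: only a quotient helper is used. */
	int64_t quot = div64_s64((int64_t)dst, (int64_t)src);

	if (!is_mod)
		return (uint64_t)quot;
	/* remainder = dst - quot * src, computed with unsigned wrap-around */
	return dst - (uint64_t)quot * src;
}
```

Deriving the signed remainder from the quotient is one way to get by with div64_s64() alone; whether the JIT does exactly this is not stated in the log entry.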
# 5097faa5	07-Sep-2023	Puranjay Mohan <puranjay12@gmail.com>	arm32, bpf: add support for 32-bit signed division The cpuv4 added a new BPF_SDIV instruction that does signed division. The encoding is similar to BPF_DIV but BPF_SDIV sets offset=1. ARM32 already supports 32-bit BPF_DIV which can be easily extended to support BPF_SDIV as ARM32 has the SDIV instruction. When the CPU is not ARM-v7, we implement SDIV/SMOD with a function call, similar to the implementation of DIV/MOD. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-6-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# 1cfb7eae	07-Sep-2023	Puranjay Mohan <puranjay12@gmail.com>	arm32, bpf: add support for unconditional bswap instruction The cpuv4 added a new unconditional bswap instruction with the following behaviour: BPF_ALU64 | BPF_TO_LE | BPF_END with imm = 16/32/64 means dst = bswap16(dst), dst = bswap32(dst) or dst = bswap64(dst) respectively. As we already support converting to big-endian from little-endian, we can use the same code for unconditional bswap: just treat the unconditional scenario the same as big-endian conversion. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-5-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# fc832653	07-Sep-2023	Puranjay Mohan <puranjay12@gmail.com>	arm32, bpf: add support for sign-extension mov instruction The cpuv4 added a new BPF_MOVSX instruction that sign extends the src before moving it to the destination. BPF_ALU | BPF_MOVSX sign extends 8-bit and 16-bit operands into 32-bit operands, and zeroes the remaining upper 32 bits. BPF_ALU64 | BPF_MOVSX sign extends 8-bit, 16-bit, and 32-bit operands into 64-bit operands. The offset field of the instruction is used to tell the number of bits to use for sign-extension. BPF_MOV and BPF_MOVSX have the same code but the former sets offset to 0 and the latter sets the offset to 8, 16 or 32. The behaviour of this instruction is dst = (s8,s16,s32)src. On ARM32 the implementation uses LSH and ARSH to extend the 8/16 bits to a 32-bit register and then it is sign extended to the upper 32-bit register using ARSH. For 32-bit we just move it to the destination register and use ARSH to extend it to the upper 32-bit register. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-4-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
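The LSH/ARSH approach from the entry above can be modelled in plain C as a rough sketch. The names reg64, asr32() and movsx64() are hypothetical and exist only for this illustration; the register pair is modelled as two 32-bit halves, and bits is assumed to be 8, 16 or 32.

```c
#include <stdint.h>

/* A 64-bit BPF register modelled as the two 32-bit halves the JIT uses. */
struct reg64 {
	uint32_t lo;
	uint32_t hi;
};

/* Arithmetic shift right for n in 0..31, written without relying on
 * implementation-defined signed shifts. */
static uint32_t asr32(uint32_t x, unsigned n)
{
	uint32_t fill = (x & 0x80000000u) ? ~(0xffffffffu >> n) : 0;

	return (x >> n) | fill;
}

/* Hypothetical model of BPF_ALU64 | BPF_MOVSX; bits must be 8, 16 or 32. */
static struct reg64 movsx64(uint32_t src_lo, unsigned bits)
{
	struct reg64 dst;

	if (bits < 32) {
		unsigned shift = 32 - bits;

		/* LSH moves the operand's sign bit to bit 31, ARSH brings it back. */
		dst.lo = asr32(src_lo << shift, shift);
	} else {
		/* 32-bit operand: a plain move of the low word. */
		dst.lo = src_lo;
	}
	/* ARSH by 31 replicates the sign bit across the upper word. */
	dst.hi = asr32(dst.lo, 31);
	return dst;
}
```

For example, movsx64(0x80, 8) gives lo = 0xffffff80 and hi = 0xffffffff, while movsx64(0x7f, 8) gives lo = 0x7f and hi = 0.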
# f9e6981b	07-Sep-2023	Puranjay Mohan <puranjay12@gmail.com>	arm32, bpf: add support for sign-extension load instruction The cpuv4 added the support of an instruction that is similar to load but also sign-extends the result after the load. BPF_MEMSX | <size> | BPF_LDX means dst = *(signed size *) (src + offset), where <size> can be one of BPF_B, BPF_H, BPF_W. ARM32 has instructions to load a byte or a half word with sign extension into a 32-bit register. As the JIT uses two 32-bit registers to simulate a 64-bit BPF register, an extra instruction is emitted to sign-extend the result up to the second register. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-3-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# 471f3d4e	07-Sep-2023	Puranjay Mohan <puranjay12@gmail.com>	arm32, bpf: add support for 32-bit offset jmp instruction The cpuv4 adds unconditional jump with 32-bit offset where the immediate field of the instruction is to be used to calculate the jump offset. BPF_JA | BPF_K | BPF_JMP32 => gotol +imm => PC += imm. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230907230550.1417590-2-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# fc386ba7	10-Jun-2022	YueHaibing <yuehaibing@huawei.com>	bpf, arm: Remove unused function emit_a32_alu_r() Since commit b18bea2a45b1 ("ARM: net: bpf: improve 64-bit ALU implementation") this function is no longer used, so remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220611040904.8976-1-yuehaibing@huawei.com
# d8dc09a4	18-Mar-2022	Julia Lawall <Julia.Lawall@inria.fr>	bpf, arm: Fix various typos in comments Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220318103729.157574-9-Julia.Lawall@inria.fr
# 06edc59c	19-Nov-2021	Christoph Hellwig <hch@lst.de>	bpf, docs: Prune all references to "internal BPF" The eBPF name has completely taken over from "internal BPF" in general usage for the actual eBPF representation, or BPF for any general in-kernel use. Prune all remaining references to "internal BPF". Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211119163215.971383-4-hch@lst.de
# ebf7f6f0	04-Nov-2021	Tiezhu Yang <yangtiezhu@loongson.cn>	bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33 In the current code, the actual max tail call count is 33 which is greater than MAX_TAIL_CALL_CNT (defined as 32). The actual limit is not consistent with the meaning of MAX_TAIL_CALL_CNT and thus confusing at first glance. We can see the historical evolution from commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs") and commit f9dabe016b63 ("bpf: Undo off-by-one in interpreter tail call count limit"). In order to avoid changing existing behavior, the actual limit is 33 now, this is reasonable. After commit 874be05f525e ("bpf, tests: Add tail call test suite"), we can see there exists failed testcase. On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set: # echo 0 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf # dmesg | grep -w FAIL Tail call error path, max count reached jited:0 ret 34 != 33 FAIL On some archs: # echo 1 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf # dmesg | grep -w FAIL Tail call error path, max count reached jited:1 ret 34 != 33 FAIL Although the above failed testcase has been fixed in commit 18935a72eb25 ("bpf/tests: Fix error in tail call limit tests"), it would still be good to change the value of MAX_TAIL_CALL_CNT from 32 to 33 to make the code more readable. The 32-bit x86 JIT was using a limit of 32, just fix the wrong comments and limit to 33 tail calls as the constant MAX_TAIL_CALL_CNT updated. For the mips64 JIT, use "ori" instead of "addiu" as suggested by Johan Almbladh. For the riscv JIT, use RV_REG_TCC directly to save one register move as suggested by Björn Töpel. For the other implementations, no function changes, it does not change the current limit 33, the new value of MAX_TAIL_CALL_CNT can reflect the actual max tail call count, the related tail call testcases in test_bpf module and selftests can work well for the interpreter and the JIT. Here are the test results on x86_64: # uname -m x86_64 # echo 0 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf test_suite=test_tail_calls # dmesg | tail -1 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed] # rmmod test_bpf # echo 1 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf test_suite=test_tail_calls # dmesg | tail -1 test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed] # rmmod test_bpf # ./test_progs -t tailcalls #142 tailcalls:OK Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/1636075800-3264-1-git-send-email-yangtiezhu@loongson.cn
# 90982e13	06-Oct-2021	Daniel Borkmann <daniel@iogearbox.net>	bpf, arm: Remove dummy bpf_jit_compile stub The BPF core defines a __weak bpf_jit_compile() dummy function already which should only be overridden by JITs if they actually implement a legacy cBPF JIT. Given arm implements an eBPF JIT, this stub is not needed. Now that MIPS cBPF JIT is finally gone, the only JIT left that is still implementing bpf_jit_compile() is the sparc32 one. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
# 79e3445b	28-Sep-2021	Johan Almbladh <johan.almbladh@anyfinetworks.com>	bpf, arm: Fix register clobbering in div/mod implementation On ARM CPUs that lack div/mod instructions, ALU32 BPF_DIV and BPF_MOD are implemented using a call to a helper function. Before, the emitted code for those function calls failed to preserve caller-saved ARM registers. Since some of those registers happen to be mapped to BPF registers, it resulted in eBPF register values being overwritten. This patch emits code to push and pop the remaining caller-saved ARM registers r2-r3 into the stack during the div/mod function call. ARM registers r0-r1 are used as arguments and return value, and those were already saved and restored correctly. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# f5e81d11	13-Jul-2021	Daniel Borkmann <daniel@iogearbox.net>	bpf: Introduce BPF nospec instruction for mitigating Spectre v4 In case of JITs, each of the JIT backends compiles the BPF nospec instruction /either/ to a machine instruction which emits a speculation barrier /or/ to /no/ machine instruction in case the underlying architecture is not affected by Speculative Store Bypass or has different mitigations in place already. This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence' instruction for mitigation. In case of arm64, we rely on the firmware mitigation as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled, it works for all of the kernel code with no need to provide any additional instructions here (hence only comment in arm64 JIT). Other archs can follow as needed. The BPF nospec instruction is specifically targeting Spectre v4 since i) we don't use a serialization barrier for the Spectre v1 case, and ii) mitigation instructions for v1 and v4 might be different on some archs. The BPF nospec is required for a future commit, where the BPF verifier does annotate intermediate BPF programs with speculation barriers. Co-developed-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Signed-off-by: Benedict Schlueter <benedict.schlueter@rub.de> Acked-by: Alexei Starovoitov <ast@kernel.org>
# 91c960b0	14-Jan-2021	Brendan Jackman <jackmanb@google.com>	bpf: Rename BPF_XADD and prepare to encode other atomics in .imm A subsequent patch will add additional atomic operations. These new operations will use the same opcode field as the existing XADD, with the immediate discriminating different operations. In preparation, rename the instruction mode BPF_ATOMIC and start calling the zero immediate BPF_ADD. This is possible (doesn't break existing valid BPF progs) because the immediate field is currently reserved MBZ and BPF_ADD is zero. All uses are removed from the tree but the BPF_XADD definition is kept around to avoid breaking builds for people including kernel headers. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
# c648c9c7	30-Apr-2020	Luke Nelson <lukenels@cs.washington.edu>	bpf, arm: Optimize ALU ARSH K using asr immediate instruction This patch adds an optimization that uses the asr immediate instruction for BPF_ALU BPF_ARSH BPF_K, rather than loading the immediate to a temporary register. This is similar to existing code for handling BPF_ALU BPF_{LSH,RSH} BPF_K. This optimization saves two instructions and is more consistent with LSH and RSH. Example of the code generated for BPF_ALU32_IMM(BPF_ARSH, BPF_REG_0, 5) before the optimization: 2c: mov r8, #5 30: mov r9, #0 34: asr r0, r0, r8 and after optimization: 2c: asr r0, r0, #5 Tested on QEMU using lib/test_bpf and test_verifier. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200501020210.32294-3-luke.r.nels@gmail.com
# cf48db69	30-Apr-2020	Luke Nelson <lukenels@cs.washington.edu>	bpf, arm: Optimize ALU64 ARSH X using orrpl conditional instruction This patch optimizes the code generated by emit_a32_arsh_r64, which handles the BPF_ALU64 BPF_ARSH BPF_X instruction. The original code uses a conditional B followed by an unconditional ORR. The optimization saves one instruction by removing the B instruction and using a conditional ORR (with an inverted condition). Example of the code generated for BPF_ALU64_REG(BPF_ARSH, BPF_REG_0, BPF_REG_1), before optimization: 34: rsb ip, r2, #32 38: subs r9, r2, #32 3c: lsr lr, r0, r2 40: orr lr, lr, r1, lsl ip 44: bmi 0x4c 48: orr lr, lr, r1, asr r9 4c: asr ip, r1, r2 50: mov r0, lr 54: mov r1, ip and after optimization: 34: rsb ip, r2, #32 38: subs r9, r2, #32 3c: lsr lr, r0, r2 40: orr lr, lr, r1, lsl ip 44: orrpl lr, lr, r1, asr r9 48: asr ip, r1, r2 4c: mov r0, lr 50: mov r1, ip Tested on QEMU using lib/test_bpf and test_verifier. Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200501020210.32294-2-luke.r.nels@gmail.com
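The optimized sequence quoted in the entry above can be traced with a small C emulation. This is a sketch, not the JIT's code: arm_lsl(), arm_lsr(), arm_asr() and arsh64() are hypothetical helpers that model ARM register-specified shifts (only the bottom byte of the shift register is used, and amounts of 32 or more saturate). The shift count is assumed to be in the range 0..63.

```c
#include <stdint.h>

/* ARM register-specified shifts use the bottom byte of the shift register. */
static uint32_t arm_lsl(uint32_t v, uint32_t rs)
{
	unsigned n = rs & 0xff;

	return n < 32 ? v << n : 0;
}

static uint32_t arm_lsr(uint32_t v, uint32_t rs)
{
	unsigned n = rs & 0xff;

	return n < 32 ? v >> n : 0;
}

static uint32_t arm_asr(uint32_t v, uint32_t rs)
{
	unsigned n = rs & 0xff;
	uint32_t sign = (v & 0x80000000u) ? 0xffffffffu : 0;

	if (n >= 32)
		return sign;
	if (n == 0)
		return v;
	return (v >> n) | (sign << (32 - n));
}

/* Emulates the optimized listing; r0/r1 hold lo/hi, r2 holds the count. */
static uint64_t arsh64(uint64_t val, uint32_t r2)
{
	uint32_t r0 = (uint32_t)val, r1 = (uint32_t)(val >> 32);
	uint32_t ip = 32 - r2;			/* rsb   ip, r2, #32 */
	uint32_t r9 = r2 - 32;			/* subs  r9, r2, #32 */
	uint32_t lr = arm_lsr(r0, r2);		/* lsr   lr, r0, r2 */

	lr |= arm_lsl(r1, ip);			/* orr   lr, lr, r1, lsl ip */
	if (!(r9 & 0x80000000u))		/* pl: result of subs not negative */
		lr |= arm_asr(r1, r9);		/* orrpl lr, lr, r1, asr r9 */
	ip = arm_asr(r1, r2);			/* asr   ip, r1, r2 */
	return ((uint64_t)ip << 32) | lr;	/* mov r0, lr; mov r1, ip */
}
```

The conditional ORR only contributes when the count is 32 or more; for smaller counts the LSL term already supplies the bits shifted down from the high word.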
# 4178417c	09-Apr-2020	Luke Nelson <lukenels@cs.washington.edu>	arm, bpf: Fix offset overflow for BPF_MEM BPF_DW This patch fixes an incorrect check in how immediate memory offsets are computed for BPF_DW on arm. For BPF_LDX/ST/STX + BPF_DW, the 32-bit arm JIT breaks down an 8-byte access into two separate 4-byte accesses using off+0 and off+4. If off fits in imm12, the JIT emits a ldr/str instruction with the immediate and avoids the use of a temporary register. While the current check off <= 0xfff ensures that the first immediate off+0 doesn't overflow imm12, it's not sufficient for the second immediate off+4, which may cause the second access of BPF_DW to read/write the wrong address. This patch fixes the problem by changing the check to off <= 0xfff - 4 for BPF_DW, ensuring off+4 will never overflow. A side effect of simplifying the check is that it now allows using negative immediate offsets in ldr/str. This means that small negative offsets can also avoid the use of a temporary register. This patch introduces no new failures in test_verifier or test_bpf.c. Fixes: c5eae692571d6 ("ARM: net: bpf: improve 64-bit store implementation") Fixes: ec19e02b343db ("ARM: net: bpf: fix LDX instructions") Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200409221752.28448-1-luke.r.nels@gmail.com
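The range check from the entry above can be illustrated with a short C sketch. Both function names are hypothetical, and the "old" check is modelled as 0 <= off <= 0xfff, which is an assumption based on the description rather than a copy of the earlier code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of the old check: only the first word's offset is tested. */
static bool dw_off_fits_old(int16_t off)
{
	return off >= 0 && off <= 0xfff;
}

/*
 * Sketch of the fixed check for BPF_DW: the second access uses off + 4,
 * so the bound is tightened by 4. Making the range symmetric also admits
 * small negative offsets.
 */
static bool dw_off_fits_fixed(int16_t off)
{
	const int16_t off_max = 0xfff - 4;

	return -off_max <= off && off <= off_max;
}
```

With off = 0xffd the old model accepts the offset even though off + 4 = 0x1001 no longer fits in 12 bits, while the fixed check rejects it and the JIT would fall back to a temporary register.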
# bb9562cf	08-Apr-2020	Luke Nelson <lukenels@cs.washington.edu>	arm, bpf: Fix bugs with ALU64 {RSH, ARSH} BPF_K shift by 0 The current arm BPF JIT does not correctly compile RSH or ARSH when the immediate shift amount is 0. This causes the "rsh64 by 0 imm" and "arsh64 by 0 imm" BPF selftests to hang the kernel by reaching an instruction the verifier determines to be unreachable. The root cause is in how immediate right shifts are encoded on arm. For LSR and ASR (logical and arithmetic right shift), a bit-pattern of 00000 in the immediate encodes a shift amount of 32. When the BPF immediate is 0, the generated code shifts by 32 instead of the expected behavior (a no-op). This patch fixes the bugs by adding an additional check if the BPF immediate is 0. After the change, the above mentioned BPF selftests pass. Fixes: 39c13c204bb11 ("arm: eBPF JIT compiler") Co-developed-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200408181229.10909-1-luke.r.nels@gmail.com
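The encoding quirk described in the entry above, and the effect of the added zero check, can be sketched in C. The helper names arm_right_shift_amount() and rsh64_imm() are hypothetical; the second function models a 64-bit logical right shift by an immediate on a pair of 32-bit halves and assumes imm is in the range 0..63.

```c
#include <stdint.h>

/* For LSR/ASR with an immediate, a 5-bit field of 00000 encodes 32. */
static unsigned arm_right_shift_amount(unsigned imm5)
{
	imm5 &= 0x1f;
	return imm5 == 0 ? 32 : imm5;
}

/* 64-bit logical right shift by an immediate on a lo/hi register pair. */
static uint64_t rsh64_imm(uint64_t val, unsigned imm)
{
	uint32_t lo = (uint32_t)val, hi = (uint32_t)(val >> 32);

	if (imm == 0)
		return val;	/* the extra check: shift by 0 is a no-op */

	if (imm < 32) {
		lo = (lo >> imm) | (hi << (32 - imm));
		hi >>= imm;
	} else if (imm == 32) {
		lo = hi;
		hi = 0;
	} else {
		lo = hi >> (imm - 32);
		hi = 0;
	}
	return ((uint64_t)hi << 32) | lo;
}
```

Without the early return, feeding imm = 0 straight into an immediate LSR would be decoded by the CPU as a shift by 32, which is what arm_right_shift_amount(0) reports.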
# c4533128	09-Dec-2019	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: Improve prologue code sequence Improve the prologue code sequence to be able to take advantage of 64-bit stores, changing the code from: push {r4, r5, r6, r7, r8, r9, fp, lr} mov fp, sp sub ip, sp, #80 ; 0x50 sub sp, sp, #600 ; 0x258 str ip, [fp, #-100] ; 0xffffff9c mov r6, #0 str r6, [fp, #-96] ; 0xffffffa0 mov r4, #0 mov r3, r4 mov r2, r0 str r4, [fp, #-104] ; 0xffffff98 str r4, [fp, #-108] ; 0xffffff94 to the tighter: push {r4, r5, r6, r7, r8, r9, fp, lr} mov fp, sp mov r3, #0 sub r2, sp, #80 ; 0x50 sub sp, sp, #600 ; 0x258 strd r2, [fp, #-100] ; 0xffffff9c mov r2, #0 strd r2, [fp, #-108] ; 0xffffff94 mov r2, r0 resulting in a saving of three instructions. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/E1ieH2g-0004ih-Rb@rmk-PC.armlinux.org.uk
# b886d83c	01-Jun-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 315 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# 163541e6	24-May-2019	Jiong Wang <jiong.wang@netronome.com>	arm: bpf: eliminate zero extension code-gen Cc: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# b85062ac	25-Jan-2019	Jiong Wang <jiong.wang@netronome.com>	arm: bpf: implement jitting of JMP32 This patch implements code-gen for new JMP32 instructions on arm. For JSET, "ands" (AND with flags updated) is used, so corresponding encoding helper is added. Cc: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# b18bea2a	12-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: improve 64-bit ALU implementation Improve the 64-bit ALU implementation from: movw r8, #65532 movt r8, #65535 movw r9, #65535 movt r9, #65535 ldr r7, [fp, #-44] adds r7, r7, r8 str r7, [fp, #-44] ldr r7, [fp, #-40] adc r7, r7, r9 str r7, [fp, #-40] to: movw r8, #65532 movt r8, #65535 movw r9, #65535 movt r9, #65535 ldrd r6, [fp, #-44] adds r6, r6, r8 adc r7, r7, r9 strd r6, [fp, #-44] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# c5eae692	12-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: improve 64-bit store implementation Improve the 64-bit store implementation from: ldr r6, [fp, #-8] str r8, [r6] ldr r6, [fp, #-8] mov r7, #4 add r7, r6, r7 str r9, [r7] to: ldr r6, [fp, #-8] str r8, [r6] str r9, [r6, #4] We leave the store as two separate STR instructions rather than using STRD as the store may not be aligned, and STR can handle misalignment. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 077513b8	12-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: improve 64-bit sign-extended immediate load Improve the 64-bit sign-extended immediate from: mov r6, #1 str r6, [fp, #-52] ; 0xffffffcc mov r6, #0 str r6, [fp, #-48] ; 0xffffffd0 to: mov r6, #1 mov r7, #0 strd r6, [fp, #-52] ; 0xffffffcc Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# f9ff5018	12-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: improve 64-bit load immediate implementation Rather than writing each 32-bit half of the 64-bit immediate value separately when the register is on the stack: movw r6, #45056 ; 0xb000 movt r6, #60979 ; 0xee33 str r6, [fp, #-44] ; 0xffffffd4 mov r6, #0 str r6, [fp, #-40] ; 0xffffffd8 arrange to use the double-word store when available instead: movw r6, #45056 ; 0xb000 movt r6, #60979 ; 0xee33 mov r7, #0 strd r6, [fp, #-44] ; 0xffffffd4 Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 8c9602d3	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: use double-word load/stores where available Use double-word loads and stores where these instructions are supported by the CPU architecture. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# bef8968d	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: always use odd/even register pair Always use an odd/even register pair for our 64-bit registers, so that we're able to use the double-word load/store instructions in the future. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# b5045229	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: avoid reloading 'array' Rearranging the order of the initial tail call code a little allows us to avoid reloading the 'array' pointer. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# aaffd2f5	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: avoid reloading 'index' Avoid reloading 'index' after we have validated it - it remains in tmp2[1] up to the point that we begin the code to index the pointer array, so with a little rearrangement of the registers, we can use the already loaded value. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 2b6958ef	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: use ldr instructions with shifted rm register Rather than pre-shifting the rm register for the ldr in the tail call, shift it in the load instruction. This eliminates one unnecessary instruction. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 828e2b90	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: use immediate forms of instructions where possible Rather than moving constants to a register and then using them in a subsequent instruction, use them directly in the desired instruction cutting out the "middle" register. This removes two instructions from the tail call code path. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 1ca3b17b	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: imm12 constant conversion Provide a version of the imm8m() function that the compiler can optimise when used with a constant expression. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 96cced4e	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: access eBPF scratch space using ARM FP register Access the eBPF scratch space using the frame pointer rather than our stack pointer, as the offsets from the ARM frame pointer are constant across all eBPF programs. Since we no longer reference the scratch space registers from the stack pointer, this simplifies emit_push_r64() as it no longer needs to know how many words are pushed onto the stack. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# a6eccac5	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: 64-bit accessor functions for BPF registers Provide a couple of 64-bit register accessors, and use them where appropriate. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 7a987025	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: provide accessor functions for BPF registers Many of the code paths need to have knowledge about whether a register is stacked or in a CPU register. Move this decision making to a pair of helper functions instead of having it scattered throughout the code. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 47b9c3bf	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: remove is_on_stack() and sstk/dstk The decision about whether a BPF register is on the stack or in a CPU register is detected at the top BPF insn processing level, and then percolated throughout the remainder of the code. Since we now use negative register values to represent stacked registers, we can detect where a BPF register is stored without resorting to carrying this additional metadata through all code paths. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# 1c35ba12	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: use negative numbers for stacked registers Use negative numbers for eBPF registers that live on the stack. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# a8ef95a0	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: provide load/store ops with negative immediates Provide a set of load/store opcode generators that work with negative immediates as well as positive ones. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
# d449ceb1	11-Jul-2018	Russell King <rmk+kernel@armlinux.org.uk>	ARM: net: bpf: enumerate the JIT scratch stack layout Enumerate the contents of the JIT scratch stack layout used for storing some of the JIT's 64-bit registers, tail call counter and AX register. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>