History log of /linux-master/drivers/net/ethernet/netronome/nfp/bpf/jit.c
# af35f95a 22-Jul-2022 Slark Xiao <slark_xiao@163.com>

nfp: bpf: Fix typo 'the the' in comment

Replace 'the the' with 'the' in the comment.

Signed-off-by: Slark Xiao <slark_xiao@163.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91c960b0 14-Jan-2021 Brendan Jackman <jackmanb@google.com>

bpf: Rename BPF_XADD and prepare to encode other atomics in .imm

A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.

In preparation, rename the instruction mode to BPF_ATOMIC and start
calling the zero immediate BPF_ADD.

This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved as must-be-zero (MBZ) and BPF_ADD
is zero.

All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
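
For illustration, a minimal C sketch of the resulting decode step (the
emit_atomic_add() helper is hypothetical; the macros are from the uapi
BPF headers):

	/* BPF_ATOMIC is now an instruction mode; .imm picks the operation. */
	if (BPF_CLASS(insn->code) == BPF_STX &&
	    BPF_MODE(insn->code) == BPF_ATOMIC) {
		switch (insn->imm) {
		case BPF_ADD:	/* the old BPF_XADD: *(dst + off) += src */
			return emit_atomic_add(insn);
		default:	/* further atomics arrive in later patches */
			return -EOPNOTSUPP;
		}
	}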


# c593642c 09-Dec-2019 Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>

treewide: Use sizeof_field() macro

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
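
For reference, a small usage sketch of the replacement macro
(illustrative only; struct bpf_insn is from the uapi BPF header):

	#include <linux/build_bug.h>
	#include <linux/stddef.h>	/* sizeof_field() lives here */

	/* sizeof_field(TYPE, MEMBER) == sizeof(((TYPE *)0)->MEMBER),
	 * i.e. the size of a struct member without needing an instance.
	 */
	BUILD_BUG_ON(sizeof_field(struct bpf_insn, imm) != sizeof(u32));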


# 155283c3 06-Oct-2019 Colin Ian King <colin.king@canonical.com>

nfp: bpf: make array exp_mask static, makes object smaller

Don't populate the array exp_mask on the stack but instead make it
static. Makes the object code smaller by 224 bytes.

Before:
   text    data     bss     dec     hex filename
  77832    2290       0   80122   138fa ethernet/netronome/nfp/bpf/jit.o

After:
   text    data     bss     dec     hex filename
  77544    2354       0   79898   1381a ethernet/netronome/nfp/bpf/jit.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# 86c28b2d 23-Aug-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: fix latency bug when updating stack index register

NFP uses Local Memory to model the stack. An LM_addr can be used as the
base of a region of 16 32-bit words of Local Memory. If the stack
offset is beyond the current region, the local index needs to be
updated. The update needs at least three cycles to take effect, so the
sequence normally looks like:

local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
nop

If the local index switch happens on a narrow load, the instruction
that prepares a value to zero the high 32 bits of the destination
register can be counted as one of those cycles, so the sequence could
be something like:

local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
immed[gprB_5, 0]

However, the zero extension optimization can eliminate that zeroing of
the high 32 bits, in which case the IMMED insn above won't be available
and the first sequence needs to be generated.

Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 0b4de1ff 24-May-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: eliminate zero extension code-gen

This patch eliminates zero extension code-gen for instructions,
covering both alu and load/store. The only exception is ctx loads: the
offload target doesn't go through the host ctx conversion logic, so we
do a customized load and ignore the zext flag set by the verifier.

Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
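
As a hedged sketch of the mechanism (my reading of the patch; the flag
and helper names may differ in detail):

	/* Clear the high 32 bits of a destination only when the verifier
	 * marked the insn as needing zero extension; otherwise skip the
	 * extra IMMED write entirely.
	 */
	static void wrp_zext(struct nfp_prog *nfp_prog,
			     struct nfp_insn_meta *meta, u8 dst)
	{
		if (meta->flags & FLAG_INSN_DO_ZEXT)
			wrp_immed(nfp_prog, reg_both(dst + 1), 0);
	}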


# 69e168eb 07-May-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: fix static check error through tightening shift amount adjustment

The NFP shift instruction has a quirk: if the shift direction is left,
then a shift amount of 1 to 31 is specified as 32 minus the amount to
shift.

This is not needed for indirect shifts, where the shift amount is 0. If
we do the subtraction anyway, a shift amount of 0 is turned into 32.
That eventually encodes the same as 0, because only the low 5 bits are
encoded, but a shift amount of 32 fails the FIELD_PREP check done later
on the shift mask (0x1f), since 32 is out of the mask's range. Such an
error has been observed when compiling nfp/bpf/jit.c using gcc 8.3 + O3.

This issue dates back to when indirect shift support was added, after
which the shift amount passed to __emit_shf could be 0; the shift
amount adjustment inside __emit_shf should have been tightened at that
time.

Fixes: 991f5b3651f6 ("nfp: bpf: support logic indirect shifts (BPF_[L|R]SH | BPF_X)")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
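
A hedged C sketch of the tightened adjustment (field and enum names as
I read them from the driver's nfp_asm.h):

	/* Only adjust genuine left shifts (amount 1..31); leave 0 alone
	 * so the constant always fits the 5-bit OP_SHF_SHIFT mask and
	 * the compile-time FIELD_PREP() range check cannot fire.
	 */
	if (sc == SHF_SC_L_SHF && shift)
		shift = 32 - shift;
	insn |= FIELD_PREP(OP_SHF_SHIFT, shift);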


# f036ebd9 22-Feb-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: fix ALU32 high bits clearance bug

The NFP BPF JIT compiler does a couple of small optimizations when
jitting ALU imm instructions; some of these optimizations can save
code-gen entirely, for example:

A & -1 = A
A | 0 = A
A ^ 0 = A

However, for ALU32, the high 32 bits of the 64-bit register must still
be cleared according to ISA semantics.

Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
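
The required semantics, as a small C model (not driver code):

	/* BPF ALU32: compute on the low 32 bits, then zero-extend. */
	static u64 alu32_and_imm(u64 dst, s32 imm)
	{
		return (u32)dst & (u32)imm;	/* A & -1 still clears bits 63:32 */
	}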


# 71c19024 22-Feb-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K

The intended optimization should be A ^ 0 = A, not A ^ -1 = A.

Fixes: cd7df56ed3e6 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# ac7a1717 01-Feb-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: complete ALU32 logic shift supports

Support for the following ALU32 logic shifts is missing:

BPF_ALU | BPF_LSH | BPF_X
BPF_ALU | BPF_RSH | BPF_X
BPF_ALU | BPF_RSH | BPF_K

BPF_RSH | BPF_K can be implemented using the NFP direct shift
instruction. The BPF_X shifts need NFP indirect shift sequences.

A separate code-gen hook is assigned to each instruction to keep the
implementation clear.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# db0a4b3b 01-Feb-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: correct the behavior for shifts by zero

Shifts by zero do nothing, and should be treated as nops.

Even though the compiler is not supposed to generate such instructions
and hand-written assembly is unlikely to contain them, they are legal
instructions with defined behavior.

This patch corrects the existing shift code-gen to make sure it does
nothing when the shift amount is zero, except for ALU32, where the high
bits still need to be cleared.

For shift amounts bigger than the type size, the NFP JIT back-end
already errors out for immediate shifts, and only the low 5 bits are
taken into account for indirect shifts, which matches x86.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 46144839 25-Jan-2019 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: implement jitting of JMP32

This patch implements code-gen for new JMP32 instructions on NFP.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# e2fc6114 22-Jan-2019 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: save original program length

Instead of passing env->prog->len around and trying to adjust
for optimized-out instructions, just save the initial number
of instructions in struct nfp_prog.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 91a87a58 22-Jan-2019 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: split up the skip flag

We fail program loading if a jump lands on a skipped instruction.
This is for historical reasons: we used to skip only instructions
optimized out based on prior context, so the optimization would be
buggy if we jumped directly to such an instruction (because the
context would be skipped by the jump).

There are cases where instructions can be skipped without any
context, for example there is no point in generating code for:

r0 |= 0

We will also soon support dropping dead code, so make the skip
logic differentiate between "optimized with preceding context"
vs other skip types.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# e90287f3 22-Jan-2019 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: don't use instruction number for jump target

The instruction number is meaningless at the code gen phase. The
target of the instruction is overwritten by nfp_fixup_branches(). The
convention is to put the raw offset in the target address as a
placeholder. See the cmp_* functions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 4987eacc 19-Dec-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: optimize codegen for JSET with a constant

The top word of the constant can only have bits set if sign
extension set it to all-1, therefore we don't really have to
mask the top half of the register. We can just OR it into
the result as is.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
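
The reasoning, as a C model (not the emitted code):

	static bool jset_taken(u32 reg_hi, u32 reg_lo, s32 imm)
	{
		u64 imm64 = (s64)imm;		/* sign-extend to 64 bits */
		u32 test = reg_lo & (u32)imm64;
		u32 test_hi = 0;

		/* The top word of imm64 is 0 or, via sign extension,
		 * 0xffffffff; in the latter case the AND is an identity,
		 * so the register's top half can be OR-ed in unmasked.
		 */
		if (imm64 >> 32)
			test_hi = reg_hi;

		return (test | test_hi) != 0;
	}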


# 6e774845 19-Dec-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: remove the trivial JSET optimization

The verifier will now understand the JSET instruction, so don't
mark the dead branch in the JIT as noop. We won't generate any
code, anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 84708c13 05-Dec-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*

BPF_X support needs indirect shift mode, please see code comments for
details.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 96de2506 11-Oct-2018 Jakub Kicinski <kuba@kernel.org>

nfp: replace long license headers with SPDX

Replace the repeated license text with SPDX identifiers.
While at it, bump the copyright dates for files we touched
this year.

Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ff0ccde 06-Oct-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: support pointers to other stack frames for BPF-to-BPF calls

Mark instructions that use pointers to areas in the stack outside of the
current stack frame, and process them accordingly in mem_op_stack().
This way, we also support BPF-to-BPF calls where the caller passes a
pointer to data in its own stack frame to the callee (typically, the
address of one of its stack-allocated local variables, passed as an
argument).

Thanks to Jakub and Jiong for figuring out how to deal with this case,
I just had to turn their email discussion into this patch.

Suggested-by: Jiong Wang <jiong.wang@netronome.com>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 44549623 06-Oct-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: optimise save/restore for R6~R9 based on register usage

When pre-processing the instructions, it is trivial to detect what
subprograms are using R6, R7, R8 or R9 as destination registers. If a
subprogram uses none of those, then we do not need to jump to the
subroutines dedicated to saving and restoring callee-saved registers in
its prologue and epilogue.

This patch introduces detection of callee-saved registers in subprograms
and prevents the JIT from adding calls to those subroutines whenever we
can: we save some instructions in the translated program, and some time
at runtime on BPF-to-BPF calls and returns.

If no subprogram needs to save those registers, we can avoid appending
the subroutines at the end of the program.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 2178f3f0 06-Oct-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: fix return address from register-saving subroutine to callee

On performing a BPF-to-BPF call, we first jump to a subroutine that
pushes callee-saved registers (R6~R9) to the stack, and from there we
go to the start of the callee. In order to do so, the caller must pass
to the subroutine the address of the NFP instruction to jump to at the
end of that subroutine. This cannot be reliably implemented when
translating the caller, as we do not always know the start offset of
the callee yet.

This patch implements the required fixup step for passing the start
offset of the callee via the register used by the subroutine to hold
its return address.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# bdf4c66f 06-Oct-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: update fixup function for BPF-to-BPF calls support

Relocations for the targets of BPF-to-BPF calls are required at the
end of translation. Update the nfp_fixup_branches() function in that
regard.

When checking that the last instruction of each block is a branch, we
must account for the length of the instructions required to pop the
return address from the stack.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 389f263b 06-Oct-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver

This is the main patch for the logics of BPF-to-BPF calls in the nfp
driver.

The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.

For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. In order to limit the overhead in terms of instruction count,
this is done through dedicated subroutines. Jumping to the callee
actually consists of jumping to the subroutine, which "returns" to the
callee: this will require some fixup for passing the address in a later
patch. Similarly, returning consists of jumping to the subroutine,
which pops registers and then returns directly to the caller (no fixup
is needed here).

Return to the caller is performed with the RTN instruction newly added
to the JIT.

For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.

Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# e3b49dc6 06-Oct-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT

Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will
require additional processing before translation starts, in order to
record and mark jump destinations.

We also mark the instructions where each subprogram begins. This will be
used in a following commit to determine where to add prologues for
subprograms.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 1a7e62e6 06-Oct-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth

In preparation for support for BPF to BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 0c261593 03-Aug-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: xdp_adjust_tail support

Add support for adjust_tail. No FW changes are needed, but add a FW
capability just in case there is any issue with previously released FW,
or we have to change the ABI in the future.

The helper is trivial and shouldn't be used too often, so just inline
the body of the function. We add the delta to a locally maintained
packet length register and check for overflow, since adding a negative
value must overflow if the result is positive. Note that if a delta of
0 were allowed in the kernel, this trick would stop working and we
would need one more instruction to compare lengths before and after the
change.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
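
The overflow trick, in C terms (a model with hypothetical names, not
the emitted sequence):

	static bool tail_adjust_ok(u32 pkt_len, s32 delta)	/* delta < 0 */
	{
		u32 res = pkt_len + (u32)delta;

		/* The unsigned add wraps (carry out) exactly when the
		 * shrunken length is still non-negative; no carry means
		 * the packet became too short. A delta of 0 would give
		 * no carry and be wrongly rejected, hence the remark
		 * above.
		 */
		return res < pkt_len;
	}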


# ab01f4ac 25-Jul-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: remember maps by ID

Record perf maps by map ID, not raw kernel pointer. This helps
with debug messages, because printing pointers to logs is frowned
upon, and makes debug easier for the users, as map ID is something
they should be more familiar with. Note that perf maps are offload
neutral, therefore IDs won't be orphaned.

While at it use a rate limited print helper for the error message.

Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 9fb410a8 06-Jul-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: migrate to advanced reciprocal divide in reciprocal_div.h

As we are doing JIT, we want to use the advanced version of the
reciprocal divide (reciprocal_value_adv), trading extra host-side
computation for better performance on the device.

This can reduce the required ALU instructions from 4 to 2 or 1.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 2a952b03 06-Jul-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support u32 divide using reciprocal_div.h

NFP doesn't have an integer divide instruction; this patch uses a
reciprocal algorithm (the basic one, reciprocal_div) to emulate it.

Each u32 divide needs 11 instructions to finish the operation:

7 (for multiplication) + 4 (various ALUs) = 11

Given that NFP only supports multiplication no bigger than u32, we
require the divisor and dividend to be no bigger than that as well.

Also, eBPF doesn't support signed divide and enforces this at the C
language level by failing compilation. However, the LLVM assembler
doesn't enforce it, so it is possible for a negative constant to leak
in as a BPF_K operand through assembly code; we reject such cases as
well.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
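
For reference, this mirrors the basic helpers in
include/linux/reciprocal_div.h (usable when the divisor is known at
translation time):

	#include <linux/reciprocal_div.h>

	/* Pre-computed once, at JIT time ... */
	struct reciprocal_value rv = reciprocal_value(divisor);

	/* ... so that at runtime a u32 divide becomes one 32x32->64
	 * multiply plus a few shifts/adds - the "7 + 4" instructions
	 * counted above.
	 */
	u32 quotient = reciprocal_divide(dividend, rv);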


# d3d23fdb 06-Jul-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support u16 and u32 multiplications

NFP supports u16 and u32 multiplication. Multiplication is done 8 bits
per step, therefore we need 2 steps for u16 and 4 steps for u32.

We also need one start instruction to initialize the sequence and one
or two instructions to fetch the result, depending on whether you need
the high half of the u32 multiplication.

For ALU64, if either operand is beyond u32's value range, we reject it.
One thing to note: if the source operand is BPF_K, then we need to
check the "imm" field directly, and we reject it if it is negative,
because for ALU64 "imm" (with s32 type) is expected to be sign extended
to s64, which NFP mul doesn't support. For ALU32 it is fine for "imm"
to be negative, because the result is 32 bits and there is no
difference in the low half of the result between signed and unsigned
mul, so we get the correct result.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
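
Why a negative "imm" is safe for ALU32 but not for ALU64, in C terms
(a model, not driver code):

	/* ALU32: the low 32 bits of a product are the same whether the
	 * operands are read as signed or unsigned (arithmetic mod 2^32),
	 * so the truncated result is correct even for a negative imm.
	 */
	static u32 mul32(u32 a, s32 imm)
	{
		return a * (u32)imm;
	}

	/* ALU64: imm must act as (s64)imm, e.g. -5 becomes
	 * 0xfffffffffffffffb - a genuine 64-bit operand that the
	 * u32-only NFP multiplier cannot represent, hence the rejection.
	 */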


# 662c5472 06-Jul-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: rename umin/umax to umin_src/umax_src

The two fields are a copy of the umin and umax info of
bpf_insn->src_reg generated by the verifier.

Rename them to make their meaning clear.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# cc0dff6d 26-Jun-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: allow source ptr type be map ptr in memcpy optimization

Map read has been supported on NFP; this patch enables the
optimization for memcpy from map to packet.

This patch also fixes one latent bug which would cause copying from an
unexpected address once memcpy for map pointers was enabled. The fixed
code path was not exercised before.

Reported-by: Mary Pham <mary.pham@netronome.com>
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# c217abcc 18-May-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support arithmetic indirect right shift (BPF_ARSH | BPF_X)

The code logic is similar to arithmetic right shift by constant; NFP
gets the indirect shift amount through the source A operand of
PREV_ALU.

It is possible to fall back to a logic right shift if the MSB is known
to be zero from range info, but there is no benefit in doing so, given
that a logic indirect right shift uses an instruction sequence of the
same length and cycle count.

Suppose the MSB of regX is the bit we want to replicate to fill in all the
vacant positions, and regY contains the shift amount, then we could use
single instruction to set up both.

[alu, --, regY, OR, regX]

--
NOTE: the PREV_ALU result doesn't need to write to any destination
register.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# f43d0f17 18-May-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support arithmetic right shift by constant (BPF_ARSH | BPF_K)

The code logic is similar to logic right shift, except that we also
need to set the PREV_ALU result properly; its MSB is the bit that will
be replicated to fill in all the vacant positions.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 991f5b36 18-May-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support logic indirect shifts (BPF_[L|R]SH | BPF_X)

For indirect shifts, the shift amount is not specified as a constant;
NFP needs to get it through the low 5 bits of the source A operand in
PREV_ALU, so extra instructions are needed compared with shifts by
constant.

Because NFP is 32-bit, we use a register pair for 64-bit shifts and
therefore need different instruction sequences depending on whether the
shift amount is less than 32 or not.

An NFP branch-on-bit-test instruction emitter is added by this patch
and used for an efficient runtime check on the shift amount: the amount
is treated as less than 32 if bit 5 is clear, and as greater than or
equal to 32 otherwise. A shift amount greater than or equal to 64
results in undefined behavior.

This patch also uses range info to avoid generating the runtime check
when we are certain the shift amount is less than 32, or certain it is
not.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
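
A C model of the two runtime paths selected by the bit-5 test (shown
for the 64-bit logic right shift; amounts of 64 or more are undefined,
as noted above):

	static void shr64(u32 *hi, u32 *lo, u32 amt)
	{
		if (amt & BIT(5)) {		/* bit 5 set: amount is 32..63 */
			*lo = *hi >> (amt & 31);
			*hi = 0;
		} else if (amt) {		/* bit 5 clear: amount is 1..31 */
			*lo = (*lo >> amt) | (*hi << (32 - amt));
			*hi >>= amt;
		}
	}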


# d985888f 08-May-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: support setting the RX queue index

BPF has access to all internal FW datapath structures, including the
structure containing RX queue selection. With a little coordination
with the datapath we can let the offloaded BPF select the RX queue.
We just need a way to tell the datapath that queue selection has
already been done and it shouldn't overwrite it. Define a bit to tell
the datapath that BPF already selected a queue (QSEL_SET); if the
selected queue is not enabled (>= number of enabled queues), the
datapath will perform normal RSS.

BPF queue selection on the NIC can be used to replace standard
datapath RSS with fully programmable BPF/XDP RSS.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# b4264c96 03-May-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: rewrite map pointers with NFP TIDs

The kernel will now replace map fds with actual pointers before
calling the offload prepare. We can identify those pointers and
replace them with NFP table IDs instead of loading the table ID in the
code generated for the CALL instruction.

This allows us to support the same CALL being used with different maps.

Since we don't want to change the FW ABI, we still need to move the
TID from R1 to a portion of R0 before the jump.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 9816dd35 03-May-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: perf event output helpers support

Add support for the perf_event_output family of helpers.

The implementation on the NFP will not match the host code exactly.
The state of the host map and rings is unknown to the device, hence
the device can't return errors when rings are not installed. The
device simply packs the data into a firmware notification message and
sends it over to the host, returning success to the program.

There is no notion of a host CPU on the device when packets are being
processed. The device will only offload programs which set
BPF_F_CURRENT_CPU. Still, if the map index doesn't match the CPU, no
error will be returned (see above).

Dropped/lost firmware notification messages will not cause a "lost
events" event on the perf ring; they are only visible via device
error counters.

Firmware notification messages may also get reordered with respect
to the packets which caused their generation.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 7bdc97be 24-Apr-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: optimize comparisons to negative constants

A comparison instruction requires a subtraction. If the constant is
negative, we are more likely to fit it into an NFP instruction
directly if we change the sign and use addition.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 61dd8f00 24-Apr-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: tabularize generations of compare operations

There are quite a few compare instructions now; use a table to
translate BPF instruction code to NFP instruction parameters instead
of parameterizing helpers. This saves LOC and makes future extensions
easier.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 6c59500c 24-Apr-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: optimize add/sub of a negative constant

The NFP instruction set can fit small immediates into the instruction.
Negative integers, however, never fit because they have the highest
bit set. If we swap the ALU op between ADD and SUB and negate the
constant, we have a better chance of fitting small negative integers
into the instruction itself and saving one or two cycles.

immed[gprB_21, 0xfffffffc]
alu[gprA_4, gprA_4, +, gprB_21], gpr_wrboth
immed[gprB_21, 0xffffffff]
alu[gprA_5, gprA_5, +carry, gprB_21], gpr_wrboth

now becomes:

alu[gprA_4, gprA_4, -, 4], gpr_wrboth
alu[gprA_5, gprA_5, -carry, 0], gpr_wrboth

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 9c9e5323 24-Apr-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: remove double space

Whitespace cleanup - remove double space.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# df4a37d8 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add support for bpf_get_prandom_u32()

NFP has a prng register, which we can read to obtain a u32 worth
of pseudo random data. Generate code for it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 41aed09c 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add support for atomic add of unknown values

Allow atomic add to be used even when the value is not guaranteed
to fit into a 16 bit immediate. This requires the value to be pulled
as data, and therefore use of a transfer register and a context swap.

Track the information about possible lengths of the value; if it's
guaranteed to be larger than 16 bits, don't generate the code for the
optimized case at all.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# b556ddd9 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: expose command delay slots

Allow callers to control the delay slots of commands, instead of
giving them just a wait/nowait choice.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# dcb0c27f 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add basic support for atomic adds

Implement atomic add operation for 32 and 64 bit values. Depend
on the verifier to ensure alignment. Values have to be kept in
big endian and swapped upon read/write. For now only support
atomic add of a constant.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# bfee64de 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add map deletes from the datapath

Support calling the map_delete_elem() FW helper from the datapath
programs. For the JIT, checks and code are basically equivalent to map
lookups. As with other map helpers, the key must be on the stack.
Different pointer types are left for future extension.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 44d65a47 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add map updates from the datapath

Support calling map_update_elem() from the datapath programs by
calling into the FW-provided helper. The value pointer is passed in
LM pointer #2. Keeping track of old state for arg3 is not necessary:
since LM pointer #2 will always be loaded in this case, the trivial
optimization for a value at the bottom of the stack can't be done
here.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 2f46e0c1 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add helper for validating stack pointers

Our implementation has restrictions on stack pointers for function
calls. Move the common checks into a helper for reuse. The state
has to be encapsulated into a structure to support parameters
other than BPF_REG_2.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# fc448497 28-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: rename map_lookup_stack() to map_call_stack_common()

We will reuse most of map call code gen for other map calls.
Rename the lookup gen function and use meta->func_id instead
of hard-coding lookup.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 87b10ecd 28-Mar-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: detect packet reads could be cached, enable the optimisation

This patch is the front end of this optimisation, it detects and marks
those packet reads that could be cached. Then the optimisation "backend"
will be activated automatically.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 91ff69e8 28-Mar-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support unaligned read offset

This patch adds support for unaligned read offsets, i.e. when the read
offset from the start of the packet cache area is not aligned to
REG_WIDTH. In this case, the read area might span at most three
transfer-in registers.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# be759237 28-Mar-2018 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: read from packet data cache for PTR_TO_PACKET

This patch assumes there is a packet data cache and tries to read
packet data from the cache instead of from memory.

It only implements the optimisation "backend"; it doesn't build the
packet data cache, so the optimisation is not yet enabled.

Only aligned packet data reads are enabled, i.e. when the read offset
from the start of the cache is REG_WIDTH aligned.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# e8a4796e 23-Mar-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: fix check of program max insn count

The NFP program allocation length is in bytes and the NFP program
length is in instructions; fix the comparison of the two.

Fixes: 9314c442d7dd ("nfp: bpf: move translation prepare to offload.c")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 74801e50 16-Jan-2018 Quentin Monnet <quentin@isovalent.com>

nfp: bpf: reject program on instructions unknown to the JIT compiler

If an eBPF instruction is unknown to the driver JIT compiler, we can
reject the program at verification time.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 3dd43c33 11-Jan-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add support for reading map memory

Map memory needs to use 40-bit addressing. Add handling of such
accesses. Since 40-bit addresses are formed using both 32-bit
operands, we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, as we did in 32-bit
mode.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
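
A model of the address arithmetic (illustrative, not the emitted code):

	static void form_addr40(u64 base40, s16 off, u32 *lo, u32 *hi)
	{
		/* Both operands carry address bits in 40-bit mode, so
		 * fold the offset in up front instead of encoding it in
		 * the memory instruction as 32-bit mode did.
		 */
		u64 addr = base40 + off;

		*lo = addr;		/* bits 31:0  */
		*hi = addr >> 32;	/* bits 39:32 */
	}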


# 77a3d311 11-Jan-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add verification and codegen for map lookups

Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.

New relocation types have to be added - for helpers and return
addresses.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# c087aa8b 09-Jan-2018 Nic Viljoen <nick.viljoen@netronome.com>

nfp: bpf: add signed jump insns

This patch adds signed jump instructions (jsgt, jsge, jslt, jsle) to
the nfp jit, as well as the required raw assembler branch masks in
nfp_asm.h.

Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# e84797fe 09-Jan-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: use a large constant in unresolved branches

To make absolute relocated branches (branches which will be completely
rewritten with br_set_offset()) distinguishable from normal jumps in
user space dumps, add a large offset to them.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 44a12ecc 09-Jan-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: don't depend on high order allocations for program image

The translator pre-allocates a buffer of maximal program size.
Due to HW/FW limitations the program buffer can't currently be
longer than 128Kb, so we used to kmalloc() it, and then map for
DMA directly.

Now that the late branch resolution is copying the program image
anyway, we can just kvmalloc() the buffer. While at it, after
translation reallocate the buffer to save space.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 2314fe9e 09-Jan-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: relocate jump targets just before the load

Don't translate the program assuming it will be loaded at a given
address. This will be required for sharing programs between ports
of the same NIC, tail calls and subprograms. It will also make the
jump targets easier to understand when dumping the program to user
space.

Translate the program as if it was going to be loaded at address
zero. When load happens add the load offset in and set addresses
of special branches.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
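
A hedged sketch of the load-time step, using the branch helpers added
alongside this series (br_add_offset() etc.; the loop shape and the
load_offset variable are illustrative):

	unsigned int i;

	/* The program was translated as if loaded at address zero; at
	 * load time, add the real base address to every branch target.
	 */
	for (i = 0; i < nfp_prog->prog_len; i++) {
		if (!nfp_is_br(nfp_prog->prog[i]))
			continue;
		br_add_offset(&nfp_prog->prog[i], load_offset);
	}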


# 488feeaf 09-Jan-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add helpers for modifying branch addresses

In preparation for better handling of relocations, move the existing
helper for setting branch offsets to nfp_asm.c and add two more.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 1549921d 09-Jan-2018 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: move jump resolution to jit.c

Jump target resolution should be in jit.c not offload.c.
No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 8231f844 14-Dec-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: optimize the adjust_head calls in trivial cases

If the program is simple and has only one adjust head call
with constant parameters, we can check that the call will
always succeed at translation time. We need to track the
location of the call and make sure parameters are always
the same. We also have to check the parameters against
datapath constraints and ETH_HLEN.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 0d49eaf4 14-Dec-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add basic support for adjust head call

Support bpf_xdp_adjust_head(). We need to check whether the
packet offset after adjustment is within datapath's limits.
We also check if the frame is at least ETH_HLEN long (similar
to the kernel implementation).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 2cb230bd 14-Dec-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: prepare for call support

Add skeleton of verifier checks and translation handler
for call instructions. Make sure jump target resolution
will not treat them as jumps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 6bc7103c 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: detect load/store sequences lowered from memory copy

This patch adds the optimization frontend: a new eBPF IR scan pass,
"nfp_bpf_opt_ldst_gather".

The pass traverses the IR to recognize the load/store pair sequences
that come from the lowering of memory copy builtins.

The gathered memory copy information is kept in the meta info
structure of the first load instruction in the sequence and is
consumed by the optimization backend added in the previous patches.

NOTE: a sequence with cross memory access doesn't qualify for this
optimization, i.e. when one load in the sequence loads from a place
that has been written by a previous store. This is because when we
turn the sequence into a single CPP operation, we read all contents at
once into NFP transfer registers, then write them out as a whole,
which is not identical to what the original load/store sequence was
doing.

Detecting cross memory access for two arbitrary pointers is difficult.
Fortunately, under XDP/eBPF's restricted runtime environment, the copy
normally happens among map, packet data and stack, which do not
overlap with each other.

For the cases supported by NFP, cross memory access will only happen
on PTR_TO_PACKET. Fortunately, packet pointers carry ID information
with which we can do an accurate memory alias check.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 8c900538 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: implement memory bulk copy for length bigger than 32-bytes

When the gathered copy length is bigger than 32 bytes and within 128
bytes (the maximum length a single CPP Pull/Push request can finish),
the read/write strategy is changed to:

  * Read.
    - use direct reference mode when length is within 32 bytes.
    - use indirect mode when length is bigger than 32 bytes.

  * Write.
    - length <= 8 bytes:
      use write8 (direct_ref).
    - length <= 32 bytes and 4-byte aligned:
      use write32 (direct_ref).
    - length <= 32 bytes but not 4-byte aligned:
      use write8 (indirect_ref).
    - length > 32 bytes and 4-byte aligned:
      use write32 (indirect_ref).
    - length > 32 bytes, not 4-byte aligned, and <= 40 bytes:
      use write32 (direct_ref) to finish the first 32 bytes;
      use write8 (direct_ref) to finish the remaining hanging part.
    - length > 32 bytes and not 4-byte aligned:
      use write32 (indirect_ref) to finish the 4-byte aligned parts;
      use write8 (direct_ref) to finish the remaining hanging part.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 9879a381 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: implement memory bulk copy for length within 32-bytes

For NFP, we want to re-group a sequence of load/store pairs lowered
from memcpy/memmove into a single memory bulk operation which can then
be accelerated using the NFP CPP bus.

This patch extends the existing load/store auxiliary information by
adding two new fields:

struct bpf_insn *paired_st;
s16 ldst_gather_len;

Both fields are carried by the load instruction at the head of the
sequence. "paired_st" is the corresponding store instruction at the
head and "ldst_gather_len" is the gathered length.

If "ldst_gather_len" is negative, the sequence is doing the memory
load/store in descending order; otherwise it is in ascending order. We
need this information to detect overlapped memory accesses.

This patch then optimizes memory bulk copy when the copy length is
within 32 bytes.

The read/write strategy used is:

  * Read.
    Use read32 (direct_ref), always.

  * Write.
    - length <= 8 bytes:
      write8 (direct_ref).
    - length <= 32 bytes and 4-byte aligned:
      write32 (direct_ref).
    - length <= 32 bytes but not 4-byte aligned:
      write8 (indirect_ref).

NOTE: the optimization should not change program semantics. The destination
register of the last load instruction should contain the same value before
and after this optimization.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 5468a8b9 30-Nov-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: encode indirect commands

Add support for emitting commands with field overwrites.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 3239e7bb 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: correct the encoding for No-Dest immed

When immed is used with No-Dest, the emitter should use reg.dst
instead of reg.areg for the destination; using the latter actually
encodes register zero.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 29fe46ef 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: don't do ld/shifts combination if shifts are jump destination

If any of the shift insns in the ld/shift sequence is a jump
destination, don't do the combination.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 1266f5d6 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: don't do ld/mask combination if mask is jump destination

If the mask insn in the ld/mask pair is a jump destination, don't do
the combination.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 5b674140 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: record jump destination to simplify jump fixup

eBPF insns are internally organized as a doubly-linked list inside the
NFP offload JIT. Random access to an insn needs to be done by either
forward or backward traversal along the list.

One place we need to do such traversal is in nfp_fixup_branches, where
one traversal is needed for each jump insn to find the destination.
Such traversals can be avoided if jump destinations are collected
through a single traversal in a pre-scan pass, and such information is
also useful in other places where jump destination info is needed.

This patch adds such jump destination collection in nfp_prog_prepare.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 854dc87d 30-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support backward jump

This patch adds support for backward jump on NFP.

- restrictions on backward jump in various functions have been removed.
- nfp_fixup_branches now supports backward jump.

There is one thing to note: currently an input eBPF JMP insn may
generate several NFP insns, for example:

NFP imm move insn A \
NFP compare insn B --> 3 NFP insn jited from eBPF JMP insn M
NFP branch insn C /
---
NFP insn X --> 1 NFP insn jited from eBPF insn N
---
...

therefore, we sanity check that the last jited insn from an eBPF JMP
is an NFP branch instruction.

Once backward jumps are allowed, it is possible for an eBPF JMP insn
to be at the end of the program. This, however, causes trouble for the
sanity check: the check requires the end index of the NFP insns jited
from one eBPF insn, while before this patch only the start index was
recorded, so the end index could only be obtained as:

start_index_of_the_next_eBPF_insn - 1

or for the above example:

start_index_of_eBPF_insn_N (which is the index of NFP insn X) - 1

nfp_fixup_branches was using nfp_for_each_insn_walk2 to expose the
*next* insn to each iteration of the traversal so the last index could
be calculated from it. Now, it would need some extra code to handle
the last insn. Meanwhile, the use of walk2 is actually unnecessary: we
can simply use a generic single-instruction walk, calculating the next
insn with list_next_entry.

So, this patch migrates the jump fixup traversal method to
*list_for_each_entry*, which simplifies the code logic a little.

The other thing to note is that a new state variable, "last_bpf_off",
is introduced to track the index of the last jited NFP insn. This is
necessary because NFP generates special-purpose epilogue sequences, so
the index of the last jited NFP insn is *not* always
nfp_prog->prog_len - 1.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# c6c580d7 03-Nov-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: move to new BPF program offload infrastructure

Following steps are taken in the driver to offload an XDP program:

XDP_SETUP_PROG:
* prepare:
- allocate program state;
- run verifier (bpf_analyzer());
- run translation;
* load:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
* clean up:
- free program image.

With new infrastructure the flow will look like this:

BPF_OFFLOAD_VERIFIER_PREP:
- allocate program state;
BPF_OFFLOAD_TRANSLATE:
- run translation;
XDP_SETUP_PROG:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
BPF_OFFLOAD_DESTROY:
- free program image.

Take advantage of the new infrastructure. Allocation of driver
metadata has to be moved from jit.c to offload.c since it's now
done at a different stage. Since there is no separate driver
private data for verification step, move temporary nfp_meta
pointer into nfp_prog. We will now use user space context
offsets.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9314c442 03-Nov-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: move translation prepare to offload.c

struct nfp_prog is currently only used internally by the translator.
This means there is a lot of parameter passing going on, between
the translator and different stages of offload. Simplify things
by allocating nfp_prog in offload.c already.

We will now use kmalloc() to allocate the program area and only
DMA map it for the time of loading (instead of allocating DMA
coherent memory upfront).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1c88eae 03-Nov-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: move program prepare and free into offload.c

Most of offload/translation prepare logic will be moved to
offload.c. To help git generate more reasonable diffs,
move the nfp_prog_prepare() and nfp_prog_free() functions
there as a first step.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 94508438 03-Nov-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: remove the register renumbering leftovers

The register renumbering was removed and will not be coming back
in its old, naive form, given that it would be fundamentally
incompatible with calling functions. Remove the leftovers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 012bb8a8 03-Nov-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: drop support for cls_bpf with legacy actions

Only support BPF_PROG_TYPE_SCHED_CLS programs in direct
action mode. This simplifies preparing the offload since
there will now be only one mode of operation for that type
of program. We need to know the attachment mode type of
cls_bpf programs, because exit codes are interpreted
differently for legacy vs DA mode.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 254ef4d7 01-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: support [BPF_ALU | BPF_ALU64] | BPF_NEG

This patch supports BPF_NEG under both BPF_ALU64 and BPF_ALU. LLVM has
recently started to generate it.

NOTE: BPF_NEG takes a single operand, a register, which serves as both
input and output.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5d42ced1 01-Nov-2017 Jiong Wang <jiong.wang@netronome.com>

nfp: bpf: rename ALU_OP_NEG to ALU_OP_NOT

The current ALU_OP_NEG is op encoding 0x4 for the NFP ALU instruction. It
actually performs the "~B" operation, which is bitwise NOT.

The name ALU_OP_NEG is misleading, as NEG is -B, which is not the
same as ~B.
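
As a quick two's complement reminder of why the two differ:

unsigned int b = 0x12345678;
/* ~b (bitwise NOT)     == 0xedcba987 */
/* -b (arithmetic NEG)  == 0xedcba988, i.e. -b == ~b + 1 */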

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9f16c8ab 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: optimize mov64 a little

Loading 64bit constants requires up to 4 load immediates, since
we can only load 16 bits at a time. If the 32bit halves of
the 64bit constant are the same, however, we can save a cycle
by doing a register move instead of two loads of 16 bits.

Note that we don't optimize the normal ALU64 load because even
though it's a 64 bit load, the upper half of the register comes
from sign extension, so we can load it in one cycle anyway.
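
A sketch of the halves-equal special case; the emit helpers are
hypothetical, imm64 is the 64-bit constant being loaded:

u32 lo = (u32)imm64, hi = (u32)(imm64 >> 32);

if (hi == lo) {
	emit_imm32(prog, dst_lo, lo);		/* up to 2 x 16-bit loads */
	emit_reg_mov(prog, dst_hi, dst_lo);	/* 1 cycle, saves a load */
} else {
	emit_imm32(prog, dst_lo, lo);
	emit_imm32(prog, dst_hi, hi);		/* up to 4 loads in total */
}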

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b14157ee 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: support stack accesses via non-constant pointers

If the stack pointer has a different value on different paths
but its alignment to words (4B) remains the same, we can
set a new LMEM access pointer to the calculated value and
access whichever word it's pointing to.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2df03a50 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: support accessing the stack beyond 64 bytes

To access beyond the 64th byte of the stack we need to set a new
stack pointer register (LMEM is accessed indirectly through
those pointers). Add a function for encoding the local CSR access
instruction. Use stack pointer number 3.

Note that stack pointer registers allow us to index into only 32
bytes of LMEM (with shift operations, i.e. when operands are
restricted). This means that if an access crosses a 32 byte boundary
we must not use offsetting; we have to set the pointer to the
exact address and move it with post-increments.

We depend on the datapath placing the stack base address in
GPR A22 for our use.
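
The crossing test itself is simple; a sketch:

/* True if [off, off + len) spans two 32 byte LMEM windows, in which
 * case offsetting cannot be used and the pointer must be set exactly.
 */
static bool lm_crosses_32b_window(unsigned int off, unsigned int len)
{
	return off / 32 != (off + len - 1) / 32;
}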

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d3488480 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: allow stack accesses via modified stack registers

As long as the verifier tells us the stack offset exactly we
can render the LMEM reads quite easily. Simply make sure that
the offset is constant for a given instruction and add it to
the instruction's offset.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9a90c83c 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: optimize the RMW for stack accesses

When we are performing unaligned stack accesses in the 32-64B window
we have to do a read-modify-write cycle. E.g. for reading 8 bytes
from address 17:

0: tmp = stack[16]
1: gprLo = tmp >> 8
2: tmp = stack[20]
3: gprLo |= tmp << 24
4: tmp = stack[20]
5: gprHi = tmp >> 8
6: tmp = stack[24]
7: gprHi |= tmp << 24

The load on line 4 is unnecessary, because tmp already contains data
from stack[20].

For writes we can optimize both the loads and writebacks away.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a82b23fb 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add stack read support

Add simple stack read support, similar to write in every aspect,
but with data flowing the other way. Note that unlike writes, which
can be done in smaller-than-word quantities, if registers are loaded
with less than a word of stack contents, the values have to be
zero extended.
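
The zero extension of a sub-word read, sketched in plain C (word is
the aligned 4B LMEM word containing the data, size is 1, 2 or 4):

u32 shift = 8 * (off & 3);
u64 mask = size < 4 ? (1U << (8 * size)) - 1 : 0xffffffffU;

bpf_reg = (word >> shift) & mask;	/* upper 32 bits become zero */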

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee9133a8 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add stack write support

Stack is implemented by the LMEM register file. Unaligned accesses
to LMEM are not allowed. Accesses also have to be 4B wide.

To support stack we need to make sure offsets of pointers are known
at translation time (for now) and perform correct load/mask/shift
operations.

Since we can access the first 64B of LMEM without much effort,
support only stacks no bigger than 64B. Following commits will
extend the possible sizes beyond that.
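
The load/mask/shift merge for a sub-word write, sketched in plain C
(word is the aligned 4B LMEM word, size is 1, 2 or 4):

u32 shift = 8 * (off & 3);
u32 mask = (size < 4 ? (1U << (8 * size)) - 1 : 0xffffffffU) << shift;

word = (word & ~mask) | ((val << shift) & mask);	/* read-modify-write */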

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff42bb9f 23-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add helper for emitting nops

The need to emit a few nops will become more common soon
as we add stack and map support. Add a helper. This allows
the code to be shorter, and may also be handy for marking the
nops with a "reason" to ease applying optimizations.
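
A minimal form of such a helper, assuming an emit_nop() primitive
(names are illustrative):

static void wrp_nops(struct nfp_prog *nfp_prog, unsigned int count)
{
	while (count--)
		emit_nop(nfp_prog);
}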

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bfddbc8a 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: support direct packet access in TC

Add support for direct packet access in TC. Note that because
writing the packet will cause the verifier to generate a csum
fixup prologue, we won't be able to offload packet writes from
TC just yet; only the reads will work.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e663fe38 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: direct packet access - write

This patch adds the ability to write packet contents using
pre-validated packet pointers (direct packet access).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ca71441 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add support for direct packet access - read

In direct packet access, bounds checks are already done, so we
can simply dereference the packet pointer.

Verifier/parser logic needs to record the pointer type. Note that
although the verifier does protect us from CTX vs other pointer
changes, we will also want to differentiate between PACKET vs
MAP_VALUE or STACK, so we can add the check already.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0a793977 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: separate I/O from checks for legacy data load

Move the data load into a separate function, separating it from
the packet length checks of legacy I/O. This makes the code more
readable and easier to reuse.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 943c57b9 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: fix context accesses

The sizes of fields in struct xdp_md/xdp_buff, and of some fields in
sk_buff, depend on the target architecture. Take that into account
and use struct xdp_buff, not struct xdp_md.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3119d1fd 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: implement byte swap instruction

Implement byte swaps with rotations, shifts and byte loads.
Remember to clear upper parts of the 64 bit registers.
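
For reference, a 32-bit byte swap built from rotations and masks in
plain C; the NFP encoding differs, but the idea is the same:

static u32 rotl32(u32 x, unsigned int n)	/* valid for n in 1..31 */
{
	return (x << n) | (x >> (32 - n));
}

static u32 bswap32(u32 x)
{
	/* Rotate the even and odd byte lanes into place separately. */
	return rotl32(x & 0x00ff00ff, 24) | rotl32(x & 0xff00ff00, 8);
}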

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c000dfb5 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add mov helper

The register move operation is encoded as an ALU no-op. This
means that one has to specify a number of unused/none parameters
to emit_alu(). Add a helper.
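
The helper can then be a thin wrapper; a sketch along these lines
(exact parameter types and names may differ):

static void wrp_mov(struct nfp_prog *nfp_prog, swreg dst, swreg src)
{
	emit_alu(nfp_prog, dst, reg_none(), ALU_OP_NONE, src);
}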

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26fa818d 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: fix compare instructions

Now that we have BPF assembler support in LLVM 6 we can easily
test all compare instructions (LLVM 4 didn't generate most of them
from C). Fix the compares to immediates and refactor the order
of compares to registers to make sure they both follow the same pattern.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82837370 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: add missing return in jne_imm optimization

We optimize comparisons to immediate 0 as if (reg.lo | reg.hi).
The early return statement was missing, however, which means we
would generate two comparisons - the optimized one followed by a
normal 2x 32 bit compare.
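
The shape of the fix, sketched with hypothetical emit helpers:

/* A 64-bit register is zero iff (reg.lo | reg.hi) == 0. */
if (imm == 0) {
	emit_or_test(prog, dst_lo, dst_hi);
	emit_branch_on_nonzero(prog, insn->off);
	return 0;	/* this early return was the missing piece */
}
/* otherwise fall through to the generic 2 x 32-bit compare */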

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc8c80a8 12-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: reorder arguments to emit_ld_field_any()

The ld_field instruction has the following format in NFP assembler:

ld_field[dst, 1000, src, <<24]

Reorder the parameters to emit_ld_field_any() to make it closer to
the familiar assembler order.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2de1be1d 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: pass dst register to ld_field instruction

ld_field instruction is a bit special because the encoding uses
two source registers and one of them becomes the output. We do
need to pass the dst register to our encoding helpers though,
otherwise the "write both banks" flag will not be observed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e85d388 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: byte swap the instructions

Device expects the instructions in little endian. Make sure we
byte swap on big endian hosts.
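
A sketch of the conversion using the kernel's endianness helpers
(instructions, prog_len and prog_mem are stand-ins):

__le64 *image = (__le64 *)prog_mem;
int i;

for (i = 0; i < prog_len; i++)
	image[i] = cpu_to_le64(instructions[i]);	/* no-op on LE hosts */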

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c03e03f 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: pad code with valid nops

We need to append up to 8 nops after the last instruction to make
sure the CPU will not fetch garbage instructions with invalid
ECC if the code store was not initialized.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd068ddc 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: calculate code store ECC

In the initial PoC firmware I simply disabled ECC on the instruction
store. Do the ECC calculation for generated instructions in the driver.
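
Code store ECC is typically a handful of parity bits, each computed
over a fixed subset of the instruction bits. A generic sketch follows;
the mask values, the number of parity bits and their position in the
word are all placeholders, not the device's actual scheme:

static const u64 ecc_masks[6] = { 0 /* device-specific bit subsets */ };

static u64 insn_with_ecc(u64 insn)
{
	u64 ecc = 0;
	int i;

	for (i = 0; i < 6; i++)		/* one even-parity bit per mask */
		ecc |= (u64)(hweight64(insn & ecc_masks[i]) & 1) << i;

	return insn | (ecc << 56);	/* ECC assumed to live in top bits */
}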

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 18e53b6c 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: move to datapath ABI version 2

Datapath ABI version 2 stores the packet information in LMEM
instead of NNRs. We also have strict restrictions on which
GPRs we can use. Only GPRs 0-23 are reserved for BPF.

Adjust the static register locations and "ABI" registers.
Note that the packet length is packed with other info, so we have
to extract it into one of the scratch registers; OTOH, since LMEM
can be used in restricted operands, we don't have to extract the
packet pointer.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 995e101f 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: encode extended LM pointer operands

Most instructions have special fields which allow switching
between base and extended Local Memory pointers. Introduce
those to the register encoding; we will use the extra LM pointers
to access high addresses of the stack.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 509144e2 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: remove packet marking support

Temporarily drop support for skb->mark. We are primarily focusing
on XDP offload, and implementing skb->mark on the new datapath has
lower priority.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 226e0e94 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: remove register rename

Remove the register renumbering optimization. To implement calling
map and other helpers we need a stricter register layout. We can't
freely reassign register numbers.

This will have the effect of running in 4 context/thread mode, which
should be OK since we are moving towards integrating the BPF closer
with FW app datapath anyway, and the target datapath itself runs in
4 context mode.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3cae1319 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: encode all 64bit shifts

Add encodings of all 64bit shift operations.
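
For reference, a 64-bit left shift composed from two 32-bit halves,
in plain C:

static void shl64(u32 *hi, u32 *lo, unsigned int n)	/* n in 0..63 */
{
	if (n == 0)
		return;
	if (n < 32) {
		*hi = (*hi << n) | (*lo >> (32 - n));
		*lo <<= n;
	} else {
		*hi = *lo << (n - 32);	/* low half shifts into the high */
		*lo = 0;
	}
}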

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2a15bb1a 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: move software reg helpers and cmd table out of translator

Move the software reg helpers and some static data to nfp_asm.c.
They are related to the previous patch, but the move is done in a
separate commit for ease of review.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3f868df 08-Oct-2017 Jakub Kicinski <kuba@kernel.org>

nfp: bpf: use the power of sparse to check we encode registers right

Define a new __bitwise type for software representation of registers.
This will allow us to catch incorrect parameter types using sparse.

The accessors we define also allow us to return the correct enum type
and therefore ensure all switches handle all register types.
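
The __bitwise pattern, sketched for a software register type (the
accessor layout is illustrative):

/* sparse treats a __bitwise type as incompatible with plain integers,
 * so passing a raw u32 where a swreg is expected becomes a warning.
 */
typedef u32 __bitwise swreg;

static inline swreg reg_a(u16 num)
{
	return (__force swreg)num;	/* bank A encoding assumed trivial */
}

static inline u32 swreg_raw(swreg reg)
{
	return (__force u32)reg;
}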

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5dd294d4 09-Aug-2017 Daniel Borkmann <daniel@iogearbox.net>

bpf, nfp: implement jiting of BPF_J{LT,LE}

This work implements jiting of BPF_J{LT,LE} instructions with
BPF_X/BPF_K variants for the nfp eBPF JIT. The two BPF_J{SLT,SLE}
instructions have not been added yet, given that BPF_J{SGT,SGE} are
not supported either.
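
The semantics being jitted, stated in interpreter-style C; dst and
src are the 64-bit register values (BPF_K uses the immediate in
place of src):

case BPF_JMP | BPF_JLT | BPF_X:	/* jump if dst < src, unsigned */
	if (dst < src)
		insn += insn->off;
	break;
case BPF_JMP | BPF_JLE | BPF_X:	/* jump if dst <= src, unsigned */
	if (dst <= src)
		insn += insn->off;
	break;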

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d9ae7f2b 31-May-2017 Jakub Kicinski <kuba@kernel.org>

nfp: move eBPF offload files to BPF app directory

Pure move of the eBPF offload files to the BPF app directory;
only the names and relative header locations change.
nfp_asm.h stays in the main dir and it doesn't really
have to include nfp_bpf.h.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>