History log of /linux-master/net/netfilter/nft_bitwise.c
Revision Date Author Comments
# a412dbf4 21-Jun-2023 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: limit allowed range via nla_policy

These NLA_U32 types get stored in u8 fields, reject invalid values
instead of silently casting to u8.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 14e8b293 25-May-2023 Jeremy Sowden <jeremy@azazel.net>

netfilter: nft_bitwise: fix register tracking

At the end of `nft_bitwise_reduce`, there is a loop which is intended to
update the bitwise expression associated with each tracked destination
register. However, currently, it just updates the first register
repeatedly. Fix it.

Fixes: 34cc9e52884a ("netfilter: nf_tables: cancel tracking for clobbered destination registers")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7d34aa3e 14-Oct-2022 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters

Add a 'reset' flag just like with nft_object_ops::dump. This will be
useful to reset "anonymous stateful objects", e.g. simple rule counters.

No functional change intended.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 341b6941 08-Aug-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: upfront validation of data via nft_data_init()

Instead of parsing the data and then validate that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.

This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If then .len field is not specified, then obtained length from the
netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.

Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6cabc ("netfilter: nf_tables:
stricter validation of element data").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 00bd4352 04-Apr-2022 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: improve error goto labels

Replace two labels (`err1` and `err2`) with more informative ones.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>


# c70b921f 04-Apr-2022 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: replace hard-coded size with `sizeof` expression

When calculating the length of an array, use the appropriate `sizeof`
expression for its type, rather than an integer literal.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 31818213 27-Mar-2022 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: fix reduce comparisons

The `nft_bitwise_reduce` and `nft_bitwise_fast_reduce` functions should
compare the bitwise operation in `expr` with the tracked operation
associated with the destination register of `expr`. However, instead of
being called on `expr` and `track->regs[priv->dreg].selector`,
`nft_expr_priv` is called on `expr` twice, so both reduce functions
return true even when the operations differ.

Fixes: be5650f8f47e ("netfilter: nft_bitwise: track register operations")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 03858af01 14-Mar-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_ct: track register operations

Check if the destination register already contains the data that this ct
expression performs. This allows to skip this redundant operation. If
the destination contains a different selector, update the register
tracking information.

Export nft_expr_reduce_bitwise as a symbol since nft_ct might be
compiled as a module.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 34cc9e52 14-Mar-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: cancel tracking for clobbered destination registers

Output of expressions might be larger than one single register, this might
clobber existing data. Reset tracking for all destination registers that
required to store the expression output.

This patch adds three new helper functions:

- nft_reg_track_update: cancel previous register tracking and update it.
- nft_reg_track_cancel: cancel any previous register tracking info.
- __nft_reg_track_cancel: cancel only one single register tracking info.

Partial register clobbering detection is also supported by checking the
.num_reg field which describes the number of register that are used.

This patch updates the following expressions:

- meta_bridge
- bitwise
- byteorder
- meta
- payload

to use these helper functions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# be5650f8 09-Jan-2022 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_bitwise: track register operations

Check if the destination register already contains the data that this
bitwise expression performs. This allows to skip this redundant
operation.

If the destination contains a different bitwise operation, cancel the
register tracking information. If the destination contains no bitwise
operation, update the register tracking information.

Update the payload and meta expression to check if this bitwise
operation has been already performed on the register. Hence, both the
payload/meta and the bitwise expressions are reduced.

There is also a special case: If source register != destination register
and source register is not updated by a previous bitwise operation, then
transfer selector from the source register to the destination register.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 345023b0 25-Jan-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add nft_parse_register_store() and use it

This new function combines the netlink register attribute parser
and the store validation function.

This update requires to replace:

enum nft_registers dreg:8;

in many of the expression private areas otherwise compiler complains
with:

error: cannot take address of bit-field ‘dreg’

when passing the register field as reference.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4f16d25c 25-Jan-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: add nft_parse_register_load() and use it

This new function combines the netlink register attribute parser
and the load validation function.

This update requires to replace:

enum nft_registers sreg:8;

in many of the expression private areas otherwise compiler complains
with:

error: cannot take address of bit-field ‘sreg’

when passing the register field as reference.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 10fdd6d8 01-Oct-2020 Phil Sutter <phil@nwl.cc>

netfilter: nf_tables: Implement fast bitwise expression

A typical use of bitwise expression is to mask out parts of an IP
address when matching on the network part only. Optimize for this common
use with a fast variant for NFT_BITWISE_BOOL-type expressions operating
on 32bit-sized values.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 049dee95 23-Feb-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: use more descriptive variable-names.

Name the mask and xor data variables, "mask" and "xor," instead of "d1"
and "d2."

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 567d746b 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: add support for shifts.

Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR
and XOR. Extend it to do shifts as well.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 779f725e 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: add NFTA_BITWISE_DATA attribute.

Add a new bitwise netlink attribute that will be used by shift
operations to store the size of the shift. It is not used by boolean
operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ed991d43 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: only offload boolean operations.

Only boolean operations supports offloading, so check the type of the
operation and return an error for other types.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4d57ca2b 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: add helper for dumping boolean operations.

Split the code specific to dumping bitwise boolean operations out into a
separate function. A similar function will be added later for shift
operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 71d6ded3 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: add helper for evaluating boolean operations.

Split the code specific to evaluating bitwise boolean operations out
into a separate function. Similar functions will be added later for
shift operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3f8d9eb0 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: add helper for initializing boolean operations.

Split the code specific to initializing bitwise boolean operations out
into a separate function. A similar function will be added later for
shift operations.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9d1f9799 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute.

Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a
value of a new enum, nft_bitwise_ops. It describes the type of
operation an expression contains. Currently, it only has one value:
NFT_BITWISE_BOOL. More values will be added later to implement shifts.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 577c734a 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: replace gotos with returns.

When dumping a bitwise expression, if any of the puts fails, we use goto
to jump to a label. However, no clean-up is required and the only
statement at the label is a return. Drop the goto's and return
immediately instead.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 265ec7b0 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: bitwise: remove NULL comparisons from attribute checks.

In later patches, we will be adding more checks. In order to be
consistent and prevent complaints from checkpatch.pl, replace the
existing comparisons with NULL with logical NOT operators.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fbf19ddf 15-Jan-2020 Jeremy Sowden <jeremy@azazel.net>

netfilter: nf_tables: white-space fixes.

Indentation fixes for the parameters of a few nft functions.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0d2c96af 06-Dec-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: validate NFT_DATA_VALUE after nft_data_init()

Userspace might bogusly sent NFT_DATA_VERDICT in several netlink
attributes that assume NFT_DATA_VALUE. Moreover, make sure that error
path invokes nft_data_release() to decrement the reference count on the
chain object.

Fixes: 96518518cc41 ("netfilter: add nftables")
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# de2a6052 28-Oct-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables_offload: check for register data length mismatches

Make sure register data length does not mismatch immediate data length,
otherwise hit EOPNOTSUPP.

Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 83c156d3 14-Aug-2019 Nathan Chancellor <nathan@kernel.org>

netfilter: nft_bitwise: Adjust parentheses to fix memcmp size argument

clang warns:

net/netfilter/nft_bitwise.c:138:50: error: size argument in 'memcmp'
call is a comparison [-Werror,-Wmemsize-comparison]
if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
~~~~~~~~~~~~~~~~~~^~
net/netfilter/nft_bitwise.c:138:6: note: did you mean to compare the
result of 'memcmp' instead?
if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
^
)
net/netfilter/nft_bitwise.c:138:32: note: explicitly cast the argument
to size_t to silence this warning
if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
^
(size_t)(
1 error generated.

Adjust the parentheses so that the result of the sizeof is used for the
size argument in memcmp, rather than the result of the comparison (which
would always be true because sizeof is a non-zero number).

Fixes: bd8699e9e292 ("netfilter: nft_bitwise: add offload support")
Link: https://github.com/ClangBuiltLinux/linux/issues/638
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# bd8699e9 30-Jul-2019 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_bitwise: add offload support

Extract mask from bitwise operation and store it into the corresponding
context register so the cmp instruction can set the mask accordingly.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d2912cb1 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 10870dd8 08-Jan-2019 Florian Westphal <fw@strlen.de>

netfilter: nf_tables: add direct calls for all builtin expressions

With CONFIG_RETPOLINE its faster to add an if (ptr == &foo_func)
check and and use direct calls for all the built-in expressions.

~15% improvement in pathological cases.

checkpatch doesn't like the X macro due to the embedded return statement,
but the macro has a very limited scope so I don't think its a problem.

I would like to avoid bugs of the form
If (e->ops->eval == (unsigned long)nft_foo_eval)
nft_bar_eval();

and open-coded if ()/else if()/else cascade, thus the macro.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 59105446 15-May-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: revisit chain/object refcounting from elements

Andreas reports that the following incremental update using our commit
protocol doesn't work.

# nft -f incremental-update.nft
delete element ip filter client_to_any { 10.180.86.22 : goto CIn_1 }
delete chain ip filter CIn_1
... Error: Could not process rule: Device or resource busy

The existing code is not well-integrated into the commit phase protocol,
since element deletions do not result in refcount decrement from the
preparation phase. This results in bogus EBUSY errors like the one
above.

Two new functions come with this patch:

* nft_set_elem_activate() function is used from the abort path, to
restore the set element refcounting on objects that occurred from
the preparation phase.

* nft_set_elem_deactivate() that is called from nft_del_setelem() to
decrement set element refcounting on objects from the preparation
phase in the commit protocol.

The nft_data_uninit() has been renamed to nft_data_release() since this
function does not uninitialize any data store in the data register,
instead just releases the references to objects. Moreover, a new
function nft_data_hold() has been introduced to be used from
nft_set_elem_activate().

Reported-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 71df14b0 15-May-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: missing sanitization in data from userspace

Do not assume userspace always sends us NFT_DATA_VALUE for bitwise and
cmp expressions. Although NFT_DATA_VERDICT does not make any sense, it
is still possible to handcraft a netlink message using this incorrect
data type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4e24877e 06-Nov-2016 Liping Zhang <zlpnobody@gmail.com>

netfilter: nf_tables: simplify the basic expressions' init routine

Some basic expressions are built into nf_tables.ko, such as nft_cmp,
nft_lookup, nft_range and so on. But these basic expressions' init
routine is a little ugly, too many goto errX labels, and we forget
to call nft_range_module_exit in the exit routine, although it is
harmless.

Acctually, the init and exit routines of these basic expressions
are same, i.e. do nft_register_expr in the init routine and do
nft_unregister_expr in the exit routine.

So it's better to arrange them into an array and deal with them
together.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 36b701fa 14-Sep-2016 Laura Garcia Liebana <nevola@gmail.com>

netfilter: nf_tables: validate maximum value of u32 netlink attributes

Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.

This patch revisits 4da449ae1df9 ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").

Fixes: 96518518cc41 ("netfilter: add nftables")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d0a11fc3 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: support variable sized data in nft_data_init()

Add a size argument to nft_data_init() and pass in the available space.
This will be used by the following patches to support variable sized
set element data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 49499c3e 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: switch registers to 32 bit addressing

Switch the nf_tables registers from 128 bit addressing to 32 bit
addressing to support so called concatenations, where multiple values
can be concatenated over multiple registers for O(1) exact matches of
multiple dimensions using sets.

The old register values are mapped to areas of 128 bits for compatibility.
When dumping register numbers, values are expressed using the old values
if they refer to the beginning of a 128 bit area for compatibility.

To support concatenations, register loads of less than a full 32 bit
value need to be padded. This mainly affects the payload and exthdr
expressions, which both unconditionally zero the last word before
copying the data.

Userspace fully passes the testsuite using both old and new register
addressing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b1c96ed3 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add register parsing/dumping helpers

Add helper functions to parse and dump register values in netlink attributes.
These helpers will later be changed to take care of translation between the
old 128 bit and the new 32 bit register numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fad136ea 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: convert expressions to u32 register pointers

Simple conversion to use u32 pointers to the beginning of the registers
to keep follow up patches smaller.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a55e22e9 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: get rid of NFT_REG_VERDICT usage

Replace the array of registers passed to expressions by a struct nft_regs,
containing the verdict as a seperate member, which aliases to the
NFT_REG_VERDICT register.

This is needed to seperate the verdict from the data registers completely,
so their size can be changed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d07db988 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: introduce nft_validate_register_load()

Change nft_validate_input_register() to not only validate the input
register number, but also the length of the load, and rename it to
nft_validate_register_load() to reflect that change.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 27e6d201 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: kill nft_validate_output_register()

All users of nft_validate_register_store() first invoke
nft_validate_output_register(). There is in fact no use for using it
on its own, so simplify the code by folding the functionality into
nft_validate_register_store() and kill it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1ec10212 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: rename nft_validate_data_load()

The existing name is ambiguous, data is loaded as well when we read from
a register. Rename to nft_validate_register_store() for clarity and
consistency with the upcoming patch to introduce its counterpart,
nft_validate_register_load().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 45d9bcda 10-Apr-2015 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: validate len in nft_validate_data_load()

For values spanning multiple registers, we need to validate that enough
space is available from the destination register onwards. Add a len
argument to nft_validate_data_load() and consolidate the existing length
validations in preparation of that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ef1f7df9 10-Oct-2013 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: expression ops overloading

Split the expression ops into two parts and support overloading of
the runtime expression ops based on the requested function through
a ->select_ops() callback.

This can be used to provide optimized implementations, for instance
for loading small aligned amounts of data from the packet or inlining
frequently used operations into the main evaluation loop.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 96518518 14-Oct-2013 Patrick McHardy <kaber@trash.net>

netfilter: add nftables

This patch adds nftables which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality, they are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianess.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
registers.
* reject (IPv4 only): to explicitly close connection, eg. TCP RST.

Using this instruction-set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
net/ipv4/netfilter/nf_tables_ipv4.c
net/ipv6/netfilter/nf_tables_ipv6.c
net/ipv4/netfilter/nf_tables_arp.c
net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
net/ipv4/netfilter/nf_table_route_ipv4.c
net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
include/net/netfilter/nf_tables.h
include/net/netfilter/nf_tables_core.h
* It also includes an skeleton expression:
net/netfilter/nft_expr_template.c
and the preliminary implementation of the meta target
net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new
pointer to store private data to the hook, that is used to store
the rule list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that has been done since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>