History log of /linux-master/include/uapi/linux/netfilter.h
Revision Date Author Comments
# a0a4de4d 22-Aug-2022 Florian Westphal <fw@strlen.de>

netfilter: remove NFPROTO_DECNET

Decnet has been removed. so no need to reserve space in arrays for it.

Signed-off-by: Florian Westphal <fw@strlen.de>


# 42df6e1d 08-Oct-2021 Lukas Wunner <lukas@wunner.de>

netfilter: Introduce egress hook

Support classifying packets with netfilter on egress to satisfy user
requirements such as:
* outbound security policies for containers (Laura)
* filtering and mangling intra-node Direct Server Return (DSR) traffic
on a load balancer (Laura)
* filtering locally generated traffic coming in through AF_PACKET,
such as local ARP traffic generated for clustering purposes or DHCP
(Laura; the AF_PACKET plumbing is contained in a follow-up commit)
* L2 filtering from ingress and egress for AVB (Audio Video Bridging)
and gPTP with nftables (Pablo)
* in the future: in-kernel NAT64/NAT46 (Pablo)

The egress hook introduced herein complements the ingress hook added by
commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key"). A patch for nftables to hook up
egress rules from user space has been submitted separately, so users may
immediately take advantage of the feature.

Alternatively or in addition to netfilter, packets can be classified
with traffic control (tc). On ingress, packets are classified first by
tc, then by netfilter. On egress, the order is reversed for symmetry.
Conceptually, tc and netfilter can be thought of as layers, with
netfilter layered above tc.

Traffic control is capable of redirecting packets to another interface
(man 8 tc-mirred). E.g., an ingress packet may be redirected from the
host namespace to a container via a veth connection:
tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)

In this case, netfilter egress classifying is not performed when leaving
the host namespace! That's because the packet is still on the tc layer.
If tc redirects the packet to a physical interface in the host namespace
such that it leaves the system, the packet is never subjected to
netfilter egress classifying. That is only logical since it hasn't
passed through netfilter ingress classifying either.

Packets can alternatively be redirected at the netfilter layer using
nft fwd. Such a packet *is* subjected to netfilter egress classifying
since it has reached the netfilter layer.

Internally, the skb->nf_skip_egress flag controls whether netfilter is
invoked on egress by __dev_queue_xmit(). Because __dev_queue_xmit() may
be called recursively by tunnel drivers such as vxlan, the flag is
reverted to false after sch_handle_egress(). This ensures that
netfilter is applied both on the overlay and underlying network.

Interaction between tc and netfilter is possible by setting and querying
skb->mark.

If netfilter egress classifying is not enabled on any interface, it is
patched out of the data path by way of a static_key and doesn't make a
performance difference that is discernible from noise:

Before: 1537 1538 1538 1537 1538 1537 Mb/sec
After: 1536 1534 1539 1539 1539 1540 Mb/sec
Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec
After + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec
Before + tc drop: 1620 1619 1619 1619 1620 1620 Mb/sec
After + tc drop: 1616 1624 1625 1624 1622 1619 Mb/sec

When netfilter egress classifying is enabled on at least one interface,
a minimal performance penalty is incurred for every egress packet, even
if the interface it's transmitted over doesn't have any netfilter egress
rules configured. That is caused by checking dev->nf_hooks_egress
against NULL.

Measurements were performed on a Core i7-3615QM. Commands to reproduce:
ip link add dev foo type dummy
ip link set dev foo up
modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1

Accept all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'

Drop all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,'

Apply this patch when measuring packet drops to avoid errors in dmesg:
https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Laura García Liébana <nevola@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d25e2e93 14-Oct-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: restore NF_INET_NUMHOOKS

This definition is used by the iptables legacy UAPI, restore it.

Fixes: d3519cb89f6d ("netfilter: nf_tables: add inet ingress support")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 60a3815d 07-Oct-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: add inet ingress support

This patch adds the NF_INET_INGRESS pseudohook for the NFPROTO_INET
family. This is a mapping this new hook to the existing NFPROTO_NETDEV
and NF_NETDEV_INGRESS hook. The hook does not guarantee that packets are
inet only, users must filter out non-ip traffic explicitly.

This infrastructure makes it easier to support this new hook in nf_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 357b6cc5 18-Mar-2020 Daniel Borkmann <daniel@iogearbox.net>

netfilter: revert introduction of egress hook

This reverts the following commits:

8537f78647c0 ("netfilter: Introduce egress hook")
5418d3881e1f ("netfilter: Generalize ingress hook")
b030f194aed2 ("netfilter: Rename ingress hook include file")

>From the discussion in [0], the author's main motivation to add a hook
in fast path is for an out of tree kernel module, which is a red flag
to begin with. Other mentioned potential use cases like NAT{64,46}
is on future extensions w/o concrete code in the tree yet. Revert as
suggested [1] given the weak justification to add more hooks to critical
fast-path.

[0] https://lore.kernel.org/netdev/cover.1583927267.git.lukas@wunner.de/
[1] https://lore.kernel.org/netdev/20200318.011152.72770718915606186.davem@davemloft.net/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Nacked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8537f786 10-Mar-2020 Lukas Wunner <lukas@wunner.de>

netfilter: Introduce egress hook

Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.

Allow the same on egress. Position the hook immediately before a packet
is handed to tc and then sent out on an interface, thereby mirroring the
ingress order. This order allows marking packets in the netfilter
egress hook and subsequently using the mark in tc. Another benefit of
this order is consistency with a lot of existing documentation which
says that egress tc is performed after netfilter hooks.

Egress hooks already exist for the most common protocols, such as
NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
they are executed earlier during packet processing. However for more
exotic protocols, there is currently no provision to apply netfilter on
egress. A common workaround is to enslave the interface to a bridge and
use ebtables, or to resort to tc. But when the ingress hook was
introduced, consensus was that users should be given the choice to use
netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
This hook is also useful for NAT46/NAT64, tunneling and filtering of
locally generated af_packet traffic such as dhclient.

There have also been occasional user requests for a netfilter egress
hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html

Performance measurements with pktgen surprisingly show a speedup rather
than a slowdown with this commit:

* Without this commit:
Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
2920481pps 1401Mb/sec (1401830880bps) errors: 0

* With this commit:
Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
2941410pps 1411Mb/sec (1411876800bps) errors: 0

* Without this commit + tc egress:
Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
2562631pps 1230Mb/sec (1230062880bps) errors: 0

* With this commit + tc egress:
Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
2659259pps 1276Mb/sec (1276444320bps) errors: 0

* With this commit + nft egress:
Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
2413320pps 1158Mb/sec (1158393600bps) errors: 0

Tested on a bare-metal Core i7-3615QM, each measurement was performed
three times to verify that the numbers are stable.

Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000

Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32

Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c3e93059 12-Nov-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: remove NFC_* cache bits

These are very very (for long time unused) caching infrastructure
definition, remove then. They have nothing to do with the NFC subsystem.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6f52b16c 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX license identifier to uapi header files with no license

Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.

By default are files without license information under the default
license of the kernel, which is GPLV2. Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernels COPYING file:

NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".

otherwise syscall usage would not be possible.

Update the files which contain no license information with an SPDX
license identifier. The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception. SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 47b1f6fd 22-Feb-2017 Dmitry V. Levin <ldv@altlinux.org>

uapi: stop including linux/sysctl.h in uapi/linux/netfilter.h

linux/netfilter.h is the last uapi header file that includes
linux/sysctl.h but it does not depend on definitions provided
by this essentially dead header file.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 06fd3a39 03-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: deprecate NF_STOP

NF_STOP is only used by br_netfilter these days, and it can be emulated
with a combination of NF_STOLEN plus explicit call to the ->okfn()
function as Florian suggests.

To retain binary compatibility with userspace nf_queue application, we
have to keep NF_STOP around, so libnetfilter_queue userspace userspace
applications still work if they use NF_STOP for some exotic reason.

Out of tree modules using NF_STOP would break, but we don't care about
those.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a263653e 17-Jun-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: don't pull include/linux/netfilter.h from netns headers

This pulls the full hook netfilter definitions from all those that include
net_namespace.h.

Instead let's just include the bare minimum required in the new
linux/netfilter_defs.h file, and use it from the netfilter netns header files.

I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit
this compilation error:

In file included from include/linux/netfilter_defs.h:4:0,
from include/net/netns/netfilter.h:4,
from include/net/net_namespace.h:22,
from include/linux/netdevice.h:43,
from net/netfilter/nfnetlink_queue_core.c:23:
include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type struct in_addr in;

And also explicit include linux/netfilter.h in several spots.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>


# e687ad60 13-May-2015 Pablo Neira <pablo@netfilter.org>

netfilter: add netfilter ingress hook after handle_ing() under unique static key

This patch adds the Netfilter ingress hook just after the existing tc ingress
hook, that seems to be the consensus solution for this.

Note that the Netfilter hook resides under the global static key that enables
ingress filtering. Nonetheless, Netfilter still also has its own static key for
minimal impact on the existing handle_ing().

* Without this patch:

Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags)
16086246pps 7721Mb/sec (7721398080bps) errors: 100000000

42.46% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core
25.92% kpktgend_0 [kernel.kallsyms] [k] kfree_skb
7.81% kpktgend_0 [pktgen] [k] pktgen_thread_worker
5.62% kpktgend_0 [kernel.kallsyms] [k] ip_rcv
2.70% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal
2.34% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk
1.44% kpktgend_0 [kernel.kallsyms] [k] __build_skb

* With this patch:

Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags)
16090536pps 7723Mb/sec (7723457280bps) errors: 100000000

41.23% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core
26.57% kpktgend_0 [kernel.kallsyms] [k] kfree_skb
7.72% kpktgend_0 [pktgen] [k] pktgen_thread_worker
5.55% kpktgend_0 [kernel.kallsyms] [k] ip_rcv
2.78% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal
2.06% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk
1.43% kpktgend_0 [kernel.kallsyms] [k] __build_skb

* Without this patch + tc ingress:

tc filter add dev eth4 parent ffff: protocol ip prio 1 \
u32 match ip dst 4.3.2.1/32

Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags)
10788648pps 5178Mb/sec (5178551040bps) errors: 100000000

40.99% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core
17.50% kpktgend_0 [kernel.kallsyms] [k] kfree_skb
11.77% kpktgend_0 [cls_u32] [k] u32_classify
5.62% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat
5.18% kpktgend_0 [pktgen] [k] pktgen_thread_worker
3.23% kpktgend_0 [kernel.kallsyms] [k] tc_classify
2.97% kpktgend_0 [kernel.kallsyms] [k] ip_rcv
1.83% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal
1.50% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk
0.99% kpktgend_0 [kernel.kallsyms] [k] __build_skb

* With this patch + tc ingress:

tc filter add dev eth4 parent ffff: protocol ip prio 1 \
u32 match ip dst 4.3.2.1/32

Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags)
10743194pps 5156Mb/sec (5156733120bps) errors: 100000000

42.01% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core
17.78% kpktgend_0 [kernel.kallsyms] [k] kfree_skb
11.70% kpktgend_0 [cls_u32] [k] u32_classify
5.46% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat
5.16% kpktgend_0 [pktgen] [k] pktgen_thread_worker
2.98% kpktgend_0 [kernel.kallsyms] [k] ip_rcv
2.84% kpktgend_0 [kernel.kallsyms] [k] tc_classify
1.96% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal
1.57% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk

Note that the results are very similar before and after.

I can see gcc gets the code under the ingress static key out of the hot path.
Then, on that cold branch, it generates the code to accomodate the netfilter
ingress static key. My explanation for this is that this reduces the pressure
on the instruction cache for non-users as the new code is out of the hot path,
and it comes with minimal impact for tc ingress users.

Using gcc version 4.8.4 on:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
[...]
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 8192K

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d49144c 02-Jan-2014 Patrick McHardy <kaber@trash.net>

netfilter: nf_tables: add "inet" table for IPv4/IPv6

This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 607ca46e 13-Oct-2012 David Howells <dhowells@redhat.com>

UAPI: (Scripted) Disintegrate include/linux

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>