History log of /linux-master/net/netfilter/nfnetlink_queue.c
Revision Date Author Comments
# 3f801968 14-Feb-2024 Florian Westphal <fw@strlen.de>

netfilter: move nf_reinject into nfnetlink_queue modules

No need to keep this in the core, move it to the nfnetlink_queue module.
nf_reroute is moved too, there were no other callers.

Signed-off-by: Florian Westphal <fw@strlen.de>


# f82777e8 06-Feb-2024 Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: un-break NF_REPEAT

Only override userspace verdict if the ct hook returns something
other than ACCEPT.

Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT
(move to next hook).

Fixes: 6291b3a67ad5 ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts")
Reported-by: l.6diay@passmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6291b3a6 11-Oct-2023 Florian Westphal <fw@strlen.de>

netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts

This function calls helpers that can return nf-verdicts, but then
those get converted to -1/0 as thats what the caller expects.

Theoretically NF_DROP could have an errno number set in the upper 24
bits of the return value. Or any of those helpers could return
NF_STOLEN, which would result in use-after-free.

This is fine as-is, the called functions don't do this yet.

But its better to avoid possible future problems if the upcoming
patchset to add NF_DROP_REASON() support gains further users, so remove
the 0/-1 translation from the picture and pass the verdicts down to
the caller.

Signed-off-by: Florian Westphal <fw@strlen.de>


# d457a0e3 08-Jun-2023 Eric Dumazet <edumazet@google.com>

net: move gso declarations and functions to their own files

Move declarations into include/net/gso.h and code into net/core/gso.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 28c1b6df 27-Mar-2023 Eric Sage <eric_sage@apple.com>

netfilter: nfnetlink_queue: enable classid socket info retrieval

This enables associating a socket with a v1 net_cls cgroup. Useful for
applying a per-cgroup policy when processing packets in userspace.

Signed-off-by: Eric Sage <eric_sage@apple.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 99a63d36 25-Jul-2022 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: do not allow packet truncation below transport header offset

Domingo Dirutigliano and Nicola Guerrera report kernel panic when
sending nf_queue verdict with 1-byte nfta_payload attribute.

The IP/IPv6 stack pulls the IP(v6) header from the packet after the
input hook.

If user truncates the packet below the header size, this skb_pull() will
result in a malformed skb (skb->len < 0).

Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 80fcec67 02-Mar-2022 Martin KaFai Lau <kafai@fb.com>

net: Get rcv tstamp if needed in nfnetlink_{log, queue}.c

If skb has the (rcv) timestamp available, nfnetlink_{log, queue}.c
logs/outputs it to the userspace. When the locally generated skb is
looping from egress to ingress over a virtual interface (e.g. veth,
loopback...), skb->tstamp may have the delivery time before it is
known that will be delivered locally and received by another sk. Like
handling the delivery time in network tapping, use ktime_get_real() to
get the (rcv) timestamp. The earlier added helper skb_tstamp_cond() is
used to do this. false is passed to the second 'cond' arg such
that doing ktime_get_real() or not only depends on the
netstamp_needed_key static key.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 98eee88b 04-Feb-2022 Nicolas Dichtel <nicolas.dichtel@6wind.com>

nfqueue: enable to set skb->priority

This is a follow up of the previous patch that enables to get
skb->priority. It's now posssible to set it also.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8b541364 17-Jan-2022 Nicolas Dichtel <nicolas.dichtel@6wind.com>

netfilter: nfqueue: enable to get skb->priority

This info could be useful to improve traffic analysis.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c3873070 27-Feb-2022 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: fix possible use-after-free

Eric Dumazet says:
The sock_hold() side seems suspect, because there is no guarantee
that sk_refcnt is not already 0.

On failure, we cannot queue the packet and need to indicate an
error. The packet will be dropped by the caller.

v2: split skb prefetch hunk into separate change

Fixes: 271b72c7fa82c ("udp: RCU handling for Unicast packets.")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 285c8a7a 06-Jan-2022 Florian Westphal <fw@strlen.de>

netfilter: make function op structures const

No functional changes, these structures should be const.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c5fc837b 19-Nov-2021 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: remove leftover synchronize_rcu

Its no longer needed after commit 870299707436
("netfilter: nf_queue: move hookfn registration out of struct net").

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ebb966d3 10-Dec-2021 Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>

netfilter: fix regression in looped (broad|multi)cast's MAC handling

In commit 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac
header was cleared"), the test for non-empty MAC header introduced in
commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC
handling") has been replaced with a test for a set MAC header.

This breaks the case when the MAC header has been reset (using
skb_reset_mac_header), as is the case with looped-back multicast
packets. As a result, the packets ending up in NFQUEUE get a bogus
hwaddr interpreted from the first bytes of the IP header.

This patch adds a test for a non-empty MAC header in addition to the
test for a set MAC header. The same two tests are also implemented in
nfnetlink_log.c, where the initial code of commit 2c38de4c1f8da7
("netfilter: fix looped (broad|multi)cast's MAC handling") has not been
touched, but where supposedly the same situation may happen.

Fixes: 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared")
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b43c2793 26-Nov-2021 Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: silence bogus compiler warning

net/netfilter/nfnetlink_queue.c:601:36: warning: variable 'ctinfo' is
uninitialized when used here [-Wuninitialized]
if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0)

ctinfo is only uninitialized if ct == NULL. Init it to 0 to silence this.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 5648b5e1 20-Oct-2021 Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: fix OOB when mac header was cleared

On 64bit platforms the MAC header is set to 0xffff on allocation and
also when a helper like skb_unset_mac_header() is called.

dev_parse_header may call skb_mac_header() which assumes valid mac offset:

BUG: KASAN: use-after-free in eth_header_parse+0x75/0x90
Read of size 6 at addr ffff8881075a5c05 by task nf-queue/1364
Call Trace:
memcpy+0x20/0x60
eth_header_parse+0x75/0x90
__nfqnl_enqueue_packet+0x1a61/0x3380
__nf_queue+0x597/0x1300
nf_queue+0xf/0x40
nf_hook_slow+0xed/0x190
nf_hook+0x184/0x440
ip_output+0x1c0/0x2a0
nf_reinject+0x26f/0x700
nfqnl_recv_verdict+0xa16/0x18b0
nfnetlink_rcv_msg+0x506/0xe70

The existing code only works if the skb has a mac header.

Fixes: 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 87029970 04-Aug-2021 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: move hookfn registration out of struct net

This was done to detect when the pernet->init() function was not called
yet, by checking if net->nf.queue_handler is NULL.

Once the nfnetlink_queue module is active, all struct net pointers
contain the same address. So place this back in nf_queue.c.

Handle the 'netns error unwind' test by checking nfnl_queue_net for a
NULL pointer and add a comment for this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ef4b65e5 30-May-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: add struct nfgenmsg to struct nfnl_info and use it

Update the nfnl_info structure to add a pointer to the nfnetlink header.
This simplifies the existing codebase since this header is usually
accessed. Update existing clients to use this new field.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 50f2db9e 22-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: consolidate callback types

Add enum nfnl_callback_type to identify the callback type to provide one
single callback.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 797d4980 22-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: pass struct nfnl_info to rcu callbacks

Update rcu callbacks to use the nfnl_info structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a6555365 22-Apr-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: add struct nfnl_info and pass it to callbacks

Add a new structure to reduce callback footprint and to facilite
extensions of the nfnetlink callback interface in the future.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 19c28b13 30-Mar-2021 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: add helper function to set up the nfnetlink header and use it

This patch adds a helper function to set up the netlink and nfnetlink headers.
Update existing codebase to use it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 83ace77f 20-Jan-2021 Florian Westphal <fw@strlen.de>

netfilter: ctnetlink: remove get_ct indirection

Use nf_ct_get() directly, its a small inline helper without dependencies.

Add CONFIG_NF_CONNTRACK guards to elide the relevant part when conntrack
isn't available at all.

v2: add ifdef guard around nf_ct_get call (kernel test robot)
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ee921183 23-Aug-2020 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: nfnetlink_unicast() reports EAGAIN instead of ENOBUFS

Frontend callback reports EAGAIN to nfnetlink to retry a command, this
is used to signal that module autoloading is required. Unfortunately,
nlmsg_unicast() reports EAGAIN in case the receiver socket buffer gets
full, so it enters a busy-loop.

This patch updates nfnetlink_unicast() to turn EAGAIN into ENOBUFS and
to use nlmsg_unicast(). Remove the flags field in nfnetlink_unicast()
since this is always MSG_DONTWAIT in the existing code which is exactly
what nlmsg_unicast() passes to netlink_unicast() as parameter.

Fixes: 96518518cc41 ("netfilter: add nftables")
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 3f649ab7 03-Jun-2020 Kees Cook <keescook@chromium.org>

treewide: Remove uninitialized_var() usage

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>


# dd3cc111 26-Mar-2020 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: make nf_queue_entry_release_refs static

This is a preparation patch, no logical changes.
Move free_entry into core and rename it to something more sensible.

Will ease followup patches which will complicate the refcount handling.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2670ee77 13-Jan-2020 Jason A. Donenfeld <Jason@zx2c4.com>

net: netfilter: use skb_list_walk_safe helper for gso segments

This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7e59b3fe 15-Jul-2019 yangxingwu <xingwu.yang@gmail.com>

netfilter: remove unnecessary spaces

This patch removes extra spaces.

Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# d2912cb1 04-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 2cf6bffc 23-May-2019 Florian Westphal <fw@strlen.de>

netfilter: replace skb_make_writable with skb_ensure_writable

This converts all remaining users and then removes skb_make_writable.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8cb08174 26-Apr-2019 Johannes Berg <johannes.berg@intel.com>

netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted

Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)

@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae0be8de 26-Apr-2019 Michal Kubecek <mkubecek@suse.cz>

netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 916f6efa 16-Apr-2019 Florian Westphal <fw@strlen.de>

netfilter: never get/set skb->tstamp

setting net.netfilter.nf_conntrack_timestamp=1 breaks xmit with fq
scheduler. skb->tstamp might be "refreshed" using ktime_get_real(),
but fq expects CLOCK_MONOTONIC.

This patch removes all places in netfilter that check/set skb->tstamp:

1. To fix the bogus "start" time seen with conntrack timestamping for
outgoing packets, never use skb->tstamp and always use current time.
2. In nfqueue and nflog, only use skb->tstamp for incoming packets,
as determined by current hook (prerouting, input, forward).
3. xt_time has to use system clock as well rather than skb->tstamp.
We could still use skb->tstamp for prerouting/input/foward, but
I see no advantage to make this conditional.

Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c4b0e771 18-Dec-2018 Florian Westphal <fw@strlen.de>

netfilter: avoid using skb->nf_bridge directly

This pointer is going to be removed soon, so use the existing helpers in
more places to avoid noise when the removal happens.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82eea4cf 08-Nov-2018 Michał Mirosław <mirq-linux@rere.qmqm.pl>

nfnetlink/queue: use __vlan_hwaccel helpers

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad18d7bf 04-Sep-2018 Michal 'vorner' Vaner <michal.vaner@avast.com>

netfilter: nfnetlink_queue: Solve the NFQUEUE/conntrack clash for NF_REPEAT

NF_REPEAT places the packet at the beginning of the iptables chain
instead of accepting or rejecting it right away. The packet however will
reach the end of the chain and continue to the end of iptables
eventually, so it needs the same handling as NF_ACCEPT and NF_DROP.

Fixes: 368982cd7d1b ("netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks")
Signed-off-by: Michal 'vorner' Vaner <michal.vaner@avast.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a8305bff 29-Jul-2018 David S. Miller <davem@davemloft.net>

net: Add and use skb_mark_not_on_list().

An SKB is not on a list if skb->next is NULL.

Codify this convention into a helper function and use it
where we are dequeueing an SKB and need to mark it as such.

Signed-off-by: David S. Miller <davem@davemloft.net>


# ba062ebb 13-Jun-2018 Eric Dumazet <edumazet@google.com>

netfilter: nf_queue: augment nfqa_cfg_policy

Three attributes are currently not verified, thus can trigger KMSAN
warnings such as :

BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117
__msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620
__arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
__fswab32 include/uapi/linux/swab.h:59 [inline]
nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212
netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448
nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43fd59
RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680
R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2753 [inline]
__kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395
__kmalloc_reserve net/core/skbuff.c:138 [inline]
__alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
alloc_skb include/linux/skbuff.h:988 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
sock_sendmsg_nosec net/socket.c:629 [inline]
sock_sendmsg net/socket.c:639 [inline]
___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
__sys_sendmsg net/socket.c:2155 [inline]
__do_sys_sendmsg net/socket.c:2164 [inline]
__se_sys_sendmsg net/socket.c:2162 [inline]
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: fdb694a01f1f ("netfilter: Add fail-open support")
Fixes: 829e17a1a602 ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 368982cd 23-May-2018 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracks

In nfqueue, two consecutive skbuffs may race to create the conntrack
entry. Hence, the one that loses the race gets dropped due to clash in
the insertion into the hashes from the nf_conntrack_confirm() path.

This patch adds a new nf_conntrack_update() function which searches for
possible clashes and resolve them. NAT mangling for the packet losing
race is corrected by using the conntrack information that won race.

In order to avoid direct module dependencies with conntrack and NAT, the
nf_ct_hook and nf_nat_hook structures are used for this purpose.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# c3506372 10-Apr-2018 Christoph Hellwig <hch@lst.de>

proc: introduce proc_create_net{,_data}

Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release. All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>


# 2f635cee 27-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5191d70f 12-Mar-2018 Arushi Singhal <arushisinghal19971997@gmail.com>

netfilter: Replace printk() with pr_*() and define pr_fmt()

Using pr_<loglevel>() is more concise than printk(KERN_<LOGLEVEL>).
This patch:
* Replace printks having a log level with the appropriate
pr_*() macros.
* Define pr_fmt() to include relevant name.
* Remove redundant prefixes from pr_*() calls.
* Indent the code where possible.
* Remove the useless output messages.
* Remove periods from messages.

Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0c3d5a96 12-Mar-2018 Joe Perches <joe@perches.com>

net: drivers/net: Remove unnecessary skb_copy_expand OOM messages

skb_copy_expand without __GFP_NOWARN already does a dump_stack
on OOM so these messages are redundant.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bd54dce0 06-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Convert nfnl_queue_net_ops

These pernet_operations register and unregister net::nf::queue_handler
and /proc entry. The handler is accessed only under RCU, so this looks
safe to convert them.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c87158d 15-Jan-2018 Alexey Dobriyan <adobriyan@gmail.com>

netfilter: delete /proc THIS_MODULE references

/proc has been ignoring struct file_operations::owner field for 10 years.
Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba
("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
inode->i_fop is initialized with proxy struct file_operations for
regular files:

- if (de->proc_fops)
- inode->i_fop = de->proc_fops;
+ if (de->proc_fops) {
+ if (S_ISREG(inode->i_mode))
+ inode->i_fop = &proc_reg_file_ops;
+ else
+ inode->i_fop = de->proc_fops;
+ }

VFS stopped pinning module at this point.

# ipvs
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 26888dfd 30-Nov-2017 Florian Westphal <fw@strlen.de>

netfilter: core: remove synchronize_net call if nfqueue is used

since commit 960632ece6949b ("netfilter: convert hook list to an array")
nfqueue no longer stores a pointer to the hook that caused the packet
to be queued. Therefore no extra synchronize_net() call is needed after
dropping the packets enqueued by the old rule blob.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 613d0776 12-Nov-2017 Vasily Averin <vvs@virtuozzo.com>

netfilter: exit_net cleanup check added

Be sure that lists initialized in net_init hook was return to initial
state.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 14cd5d4a 23-Oct-2017 Mark Rutland <mark.rutland@arm.com>

locking/atomics, net/netlink/netfilter: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts netlink and netfilter code and comments to use
{READ,WRITE}_ONCE() consistently.

----
virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-7-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 5da773a3 25-Jul-2017 Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: don't queue dying conntracks to userspace

When skb is queued to userspace it leaves softirq/rcu protection.
skb->nfct (via conntrack extensions such as helper) could then reference
modules that no longer exist if the conntrack was not yet confirmed.

nf_ct_iterate_destroy() will set the DYING bit for unconfirmed
conntracks, we therefore solve this race as follows:

1. take the queue spinlock.
2. check if the conntrack is unconfirmed and has dying bit set.
In this case, we must discard skb while we're still inside
rcu read-side section.
3. If nf_ct_iterate_destroy() is called right after the packet is queued
to userspace, it will be removed from the queue via
nf_ct_iterate_destroy -> nf_queue_nf_hook_drop.

When userspace sends the verdict (nfnetlink takes rcu read lock), there
are two cases to consider:

1. nf_ct_iterate_destroy() was called while packet was out.
In this case, skb will have been removed from the queue already
and no reinject takes place as we won't find a matching entry for the
packet id.

2. nf_ct_iterate_destroy() gets called right after verdict callback
found and removed the skb from queue list.

In this case, skb->nfct is marked as dying but it is still valid.
The skb will be dropped either in nf_conntrack_confirm (we don't
insert DYING conntracks into hash table) or when we try to queue
the skb again, but either events don't occur before the rcu read lock
is dropped.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0b35f603 18-Jul-2017 Taehee Yoo <ap420073@gmail.com>

netfilter: Remove duplicated rcu_read_lock.

This patch removes duplicate rcu_read_lock().

1. IPVS part:

According to Julian Anastasov's mention, contexts of ipvs are described
at: http://marc.info/?l=netfilter-devel&m=149562884514072&w=2, in summary:

- packet RX/TX: does not need locks because packets come from hooks.
- sync msg RX: backup server uses RCU locks while registering new
connections.
- ip_vs_ctl.c: configuration get/set, RCU locks needed.
- xt_ipvs.c: It is a netfilter match, running from hook context.

As result, rcu_read_lock and rcu_read_unlock can be removed from:

- ip_vs_core.c: all
- ip_vs_ctl.c:
- only from ip_vs_has_real_service
- ip_vs_ftp.c: all
- ip_vs_proto_sctp.c: all
- ip_vs_proto_tcp.c: all
- ip_vs_proto_udp.c: all
- ip_vs_xmit.c: all (contains only packet processing)

2. Netfilter part:

There are three types of functions that are guaranteed the rcu_read_lock().
First, as result, functions are only called by nf_hook():

- nf_conntrack_broadcast_help(), pptp_expectfn(), set_expected_rtp_rtcp().
- tcpmss_reverse_mtu(), tproxy_laddr4(), tproxy_laddr6().
- match_lookup_rt6(), check_hlist(), hashlimit_mt_common().
- xt_osf_match_packet().

Second, functions that caller already held the rcu_read_lock().
- destroy_conntrack(), ctnetlink_conntrack_event().
- ctnl_timeout_find_get(), nfqnl_nf_hook_drop().

Third, functions that are mixed with type1 and type2.

These functions are called by nf_hook() also these are called by
ordinary functions that already held the rcu_read_lock():

- __ctnetlink_glue_build(), ctnetlink_expect_event().
- ctnetlink_proto_size().

Applied files are below:

- nf_conntrack_broadcast.c, nf_conntrack_core.c, nf_conntrack_netlink.c.
- nf_conntrack_pptp.c, nf_conntrack_sip.c, nfnetlink_cttimeout.c.
- nfnetlink_queue.c, xt_TCPMSS.c, xt_TPROXY.c, xt_addrtype.c.
- xt_connlimit.c, xt_hashlimit.c, xt_osf.c

Detailed calltrace can be found at:
http://marc.info/?l=netfilter-devel&m=149667610710350&w=2

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 04ba724b 19-Jun-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: extended ACK reporting

Pass down struct netlink_ext_ack as parameter to all of our nfnetlink
subsystem callbacks, so we can work on follow up patches to provide
finer grain error reporting using the new infrastructure that
2d4bc93368f5 ("netlink: extended ACK reporting") provides.

No functional change, just pass down this new object to callbacks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4df864c1 16-Jun-2017 Johannes Berg <johannes.berg@intel.com>

networking: make skb_put & friends return void pointers

It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

@@
expression SKB, LEN;
typedef u8;
identifier fn = { skb_put, __skb_put };
@@
- *(fn(SKB, LEN))
+ *(u8 *)fn(SKB, LEN)

@@
expression E, SKB, LEN;
identifier fn = { skb_put, __skb_put };
type T;
@@
- E = ((T *)(fn(SKB, LEN)))
+ E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 039b40ee 24-Apr-2017 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: only call synchronize_net twice if nf_queue is active

nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
provided there is no nfqueue active in that net namespace (which is
the common case).

This also gets rid of the extra arg to nf_queue_nf_hook_drop(), normally
this gets called during netns cleanup so no packets should be queued.

For the rare case of base chain being unregistered or module removal
while nfqueue is in use the extra hiccup due to the packet drops isn't
a big deal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fceb6435 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d4ef3835 02-Apr-2017 Arushi Singhal <arushisinghal19971997@gmail.com>

netfilter: Remove exceptional & on function name

Remove & from function pointers to conform to the style found elsewhere
in the file. Done using the following semantic patch

// <smpl>
@r@
identifier f;
@@

f(...) { ... }
@@
identifier r.f;
@@

- &f
+ f
// </smpl>

Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dedb67c4 28-Mar-2017 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: Add nfnl_msg_type() helper function

Add and use nfnl_msg_type() function to replace opencoded nfnetlink
message type. I suggested this change, Arushi Singhal made an initial
patch to address this but was missing several spots.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 77c1c03c 28-Mar-2017 Liping Zhang <zlpnobody@gmail.com>

netfilter: nfnetlink_queue: fix secctx memory leak

We must call security_release_secctx to free the memory returned by
security_secid_to_secctx, otherwise memory may be leaked forever.

Fixes: ef493bd930ae ("netfilter: nfnetlink_queue: add security context information")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2456e855 25-Dec-2016 Thomas Gleixner <tglx@linutronix.de>

ktime: Get rid of the union

ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>


# c7d03a00 16-Nov-2016 Alexey Dobriyan <adobriyan@gmail.com>

netns: make struct pernet_operations::id unsigned int

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

void f(long *p, int i)
{
g(p[i]);
}

roughly translates to

movsx rsi, esi
mov rdi, [rsi+...]
call g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01886bd9 03-Nov-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: remove hook_entries field from nf_hook_state

This field is only useful for nf_queue, so store it in the
nf_queue_entry structure instead, away from the core path. Pass
hook_head to nf_hook_slow().

Since we always have a valid entry on the first iteration in
nf_iterate(), we can use 'do { ... } while (entry)' loop instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 886bc503 30-Oct-2016 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: place volatile data in own cacheline

As the comment indicates, the data at the end of nfqnl_instance struct is
written on every queue/dequeue, so it should reside in its own cacheline.

Before this change, 'lock' was in first cacheline so we dirtied both.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e3b37f11 21-Sep-2016 Aaron Conole <aconole@bytheb.org>

netfilter: replace list_head with single linked list

The netfilter hook list never uses the prev pointer, and so can be trimmed to
be a simple singly-linked list.

In addition to having a more light weight structure for hook traversal,
struct net becomes 5568 bytes (down from 6400) and struct net_device becomes
2176 bytes (down from 2240).

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# e2361cb9 21-Sep-2016 Aaron Conole <aconole@bytheb.org>

netfilter: Remove explicit rcu_read_lock in nf_hook_slow

All of the callers of nf_hook_slow already hold the rcu_read_lock, so this
cleanup removes the recursive call. This is just a cleanup, as the locking
code gracefully handles this situation.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 4e6577de 09-Sep-2016 Gao Feng <fgao@ikuai8.com>

netfilter: Add the missed return value check of register_netdevice_notifier

There are some codes of netfilter module which did not check the return
value of register_netdevice_notifier. Add the checks now.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 00a3101f 08-Aug-2016 Liping Zhang <liping.zhang@spreadtrum.com>

netfilter: nfnetlink_queue: reject verdict request from different portid

Like NFQNL_MSG_VERDICT_BATCH do, we should also reject the verdict
request when the portid is not same with the initial portid(maybe
from another process).

Fixes: 97d32cf9440d ("netfilter: nfnetlink_queue: batch verdict support")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# dc3ee32e 13-May-2016 Eric W. Biederman <ebiederm@xmission.com>

netfilter: nf_queue: Make the queue_handler pernet

Florian Weber reported:
> Under full load (unshare() in loop -> OOM conditions) we can
> get kernel panic:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> [..]
> task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
> RIP: 0010:[<ffffffff81476c85>] [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
> RSP: 0000:ffff88012dfffd80 EFLAGS: 00010206
> RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
> [..]
> Call Trace:
> [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
> [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
> [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
> [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
> [<ffffffff8141b652>] setup_net+0xc2/0x120
> [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
> [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
> [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
> [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
> [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
> Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
>

The simple fix for this requires a new pernet variable for struct
nf_queue that indicates when it is safe to use the dynamically
allocated nf_queue state.

As we need a variable anyway make nf_register_queue_handler and
nf_unregister_queue_handler pernet. This allows the existing logic of
when it is safe to use the state from the nfnetlink_queue module to be
reused with no changes except for making it per net.

The syncrhonize_rcu from nf_unregister_queue_handler is moved to a new
function nfnl_queue_net_exit_batch so that the worst case of having a
syncrhonize_rcu in the pernet exit path is not experienced in batch
mode.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a7f18845 12-May-2016 Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: fix timestamp attribute

Since 4.4 we erronously use timestamp of the netlink skb (which is zero).

Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1066
Fixes: b28b1e826f818c30ea7 ("netfilter: nfnetlink_queue: use y2038 safe timestamp")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8d45ff22 26-Mar-2016 Stephane Bryant <stephane.ml.bryant@gmail.com>

netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR

This makes nf queues use NFQA_VLAN and NFQA_L2HDR in verdict to modify the
original skb

Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 15824ab2 26-Mar-2016 Stephane Bryant <stephane.ml.bryant@gmail.com>

netfilter: bridge: pass L2 header and VLAN as netlink attributes in queues to userspace

- This creates 2 netlink attribute NFQA_VLAN and NFQA_L2HDR.
- These are filled up for the PF_BRIDGE family on the way to userspace.
- NFQA_VLAN is a nested attribute, with the NFQA_VLAN_PROTO and the
NFQA_VLAN_TCI carrying the corresponding vlan_proto and vlan_tci
fields from the skb using big endian ordering (and using the CFI
bit as the VLAN_TAG_PRESENT flag in vlan_tci as in the skb)

Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 93140113 22-Mar-2016 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_queue: honor NFQA_CFG_F_FAIL_OPEN when netlink unicast fails

When netlink unicast fails to deliver the message to userspace, we
should also check if the NFQA_CFG_F_FAIL_OPEN flag is set so we reinject
the packet back to the stack.

I think the user expects no packet drops when this flag is set due to
queueing to userspace errors, no matter if related to the internal queue
or when sending the netlink message to userspace.

The userspace application will still get the ENOBUFS error via recvmsg()
so the user still knows that, with the current configuration that is in
place, the userspace application is not consuming the messages at the
pace that the kernel needs.

Reported-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com>


# c5b0db32 18-Feb-2016 Florian Westphal <fw@strlen.de>

nfnetlink: Revert "nfnetlink: add support for memory mapped netlink"

reverts commit 3ab1f683bf8b ("nfnetlink: add support for memory mapped
netlink")'

Like previous commits in the series, remove wrappers that are not needed
after mmapped netlink removal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 71b2e5f5 04-Jan-2016 Ken-ichirou MATSUZAWA <chamaken@gmail.com>

netfilter: nfnetlink_queue: autoload nf_conntrack_netlink module NFQA_CFG_F_CONNTRACK config flag

This patch enables to load nf_conntrack_netlink module if
NFQA_CFG_F_CONNTRACK config flag is specified.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 21c3c971 04-Jan-2016 Ken-ichirou MATSUZAWA <chamaken@gmail.com>

netfilter: nfnetlink_queue: just returns error for unknown command

This patch stops processing options for unknown command.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 17bc6b48 04-Jan-2016 Ken-ichirou MATSUZAWA <chamaken@gmail.com>

netfilter: nfnetlink_queue: don't handle options after unbind

This patch stops processing after destroying a queue instance.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 60d2c7f9 04-Jan-2016 Ken-ichirou MATSUZAWA <chamaken@gmail.com>

netfilter: nfnetlink_queue: validate dependencies to avoid breaking atomicity

Check that dependencies are fulfilled before updating the queue
instance, otherwise we can leave things in intermediate state on errors
in nfqnl_recv_config().

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 7b8002a1 15-Dec-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink: pass down netns pointer to call() and call_rcu()

Adapt callsites to avoid recurrent lookup of the netns pointer.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 639e077b 06-Dec-2015 Nikolay Borisov <kernel@kyup.com>

netfilter: nfnetlink_queue: Unregister pernet subsys in case of init failure

Commit 3bfe049807c2403 ("netfilter: nfnetlink_{log,queue}:
Register pernet in first place") reorganised the initialisation
order of the pernet_subsys to avoid "use-before-initialised"
condition. However, in doing so the cleanup logic in nfnetlink_queue
got botched in that the pernet_subsys wasn't cleaned in case
nfnetlink_subsys_register failed. This patch adds the necessary
cleanup routine call.

Fixes: 3bfe049807c2403 ("netfilter: nfnetlink_{log,queue}: Register pernet in first place")
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8e662164 19-Nov-2015 Arnd Bergmann <arnd@arndb.de>

netfilter: nfnetlink_queue: avoid harmless unnitialized variable warnings

Several ARM default configurations give us warnings on recent
compilers about potentially uninitialized variables in the
nfnetlink code in two functions:

net/netfilter/nfnetlink_queue.c: In function 'nfqnl_build_packet_message':
net/netfilter/nfnetlink_queue.c:519:19: warning: 'nfnl_ct' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0)

Moving the rcu_dereference(nfnl_ct_hook) call outside of the
conditional code avoids the warning without forcing us to
preinitialize the variable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: a4b4766c3ceb ("netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ed78d09d 13-Oct-2015 Florian Westphal <fw@strlen.de>

netfilter: make nf_queue_entry_get_refs return void

We don't care if module is being unloaded anymore since hook unregister
handling will destroy queue entries using that hook.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# a4b4766c 04-Oct-2015 Ken-ichirou MATSUZAWA <chamaken@gmail.com>

netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info

The idea of this series of patch is to attach conntrack information to
nflog like nfqueue has already done. nfqueue conntrack info attaching
basis is generic, rename those names to generic one, glue.

Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# b28b1e82 04-Oct-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_queue: use y2038 safe timestamp

The __build_packet_message function fills a nfulnl_msg_packet_timestamp
structure that uses 64-bit seconds and is therefore y2038 safe, but
it uses an intermediate 'struct timespec' which is not.

This trivially changes the code to use 'struct timespec64' instead,
to correct the result on 32-bit architectures.

This is a copy and paste of Arnd's original patch for nfnetlink_log.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 32f40c5f 30-Sep-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: rename nfnetlink_queue_core.c to nfnetlink_queue.c

Now that we have integrated the ct glue code into nfnetlink_queue without
introducing dependencies with the conntrack code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 8c88f87c 07-Jun-2012 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_queue: add NAT TCP sequence adjustment if packet mangled

User-space programs that receive traffic via NFQUEUE may mangle packets.
If NAT is enabled, this usually puzzles sequence tracking, leading to
traffic disruptions.

With this patch, nfnl_queue will make the corresponding NAT TCP sequence
adjustment if:

1) The packet has been mangled,
2) the NFQA_CFG_F_CONNTRACK flag has been set, and
3) NAT is detected.

There are some records on the Internet complaning about this issue:
http://stackoverflow.com/questions/260757/packet-mangling-utilities-besides-iptables

By now, we only support TCP since we have no helpers for DCCP or SCTP.
Better to add this if we ever have some helper over those layer 4 protocols.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 9cb01766 06-Jun-2012 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: add glue code to integrate nfnetlink_queue and ctnetlink

This patch allows you to include the conntrack information together
with the packet that is sent to user-space via NFQUEUE.

Previously, there was no integration between ctnetlink and
nfnetlink_queue. If you wanted to access conntrack information
from your libnetfilter_queue program, you required to query
ctnetlink from user-space to obtain it. Thus, delaying the packet
processing even more.

Including the conntrack information is optional, you can set it
via NFQA_CFG_F_CONNTRACK flag with the new NFQA_CFG_FLAGS attribute.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# fdb694a0 23-May-2012 Krishna Kumar <krkumar2@in.ibm.com>

netfilter: Add fail-open support

Implement a new "fail-open" mode where packets are not dropped
upon queue-full condition. This mode can be enabled/disabled per
queue using netlink NFQA_CFG_FLAGS & NFQA_CFG_MASK attributes.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Vivek Kashyap <vivk@us.ibm.com>
Signed-off-by: Sridhar Samudrala <samudrala@us.ibm.com>


# e87cc472 13-May-2012 Joe Perches <joe@perches.com>

net: Convert net_ratelimit uses to net_<level>_ratelimited

Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a447189e 29-Mar-2012 David S. Miller <davem@davemloft.net>

nfnetlink_queue: Stop using NLA_PUT*().

These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>


# c6675233 30-Aug-2011 Florian Westphal <fw@strlen.de>

netfilter: nf_queue: reject NF_STOLEN verdicts from userspace

A userspace listener may send (bogus) NF_STOLEN verdict, which causes skb leak.

This problem was previously fixed via
64507fdbc29c3a622180378210ecea8659b14e40 (netfilter:
nf_queue: fix NF_STOLEN skb leak) but this had to be reverted because
NF_STOLEN can also be returned by a netfilter hook when iterating the
rules in nf_reinject.

Reject userspace NF_STOLEN verdict, as suggested by Michal Miroslaw.

This is complementary to commit fad54440438a7c231a6ae347738423cbabc936d9
(netfilter: avoid double free in nf_reinject).

Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# 60063497 26-Jul-2011 Arun Sharma <asharma@fb.com>

atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 97d32cf9 19-Jul-2011 Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: batch verdict support

Introduces a new nfnetlink type that applies a given
verdict to all queued packets with an id <= the id in the verdict
message.

If a mark is provided it is applied to all matched packets.

This reduces the number of verdicts that have to be sent.
Applications that make use of this feature need to maintain
a timeout to send a batchverdict periodically to avoid starvation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# 5863702a 19-Jul-2011 Eric Dumazet <eric.dumazet@gmail.com>

netfilter: nfnetlink_queue: assert monotonic packet ids

Packet identifier is currently setup in nfqnl_build_packet_message(),
using one atomic_inc_return().

Problem is that since several cpus might concurrently call
nfqnl_enqueue_packet() for the same queue, we can deliver packets to
consumer in non monotonic way (packet N+1 being delivered after packet
N)

This patch moves the packet id setup from nfqnl_build_packet_message()
to nfqnl_enqueue_packet() to guarantee correct delivery order.

This also removes one atomic operation.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# 84a797dd 18-Jul-2011 Eric Dumazet <eric.dumazet@gmail.com>

netfilter: nfnetlink_queue: provide rcu enabled callbacks

nenetlink_queue operations on SMP are not efficent if several queues are
used, because of nfnl_mutex contention when applications give packet
verdict.

Use new call_rcu field in struct nfnl_callback to advertize a callback
that is called under rcu_read_lock instead of nfnl_mutex.

On my 2x4x2 machine, I was able to reach 2.000.000 pps going through
user land returning NF_ACCEPT verdicts without losses, instead of less
than 500.000 pps before patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# 2c38de4c 16-Jun-2011 Nicolas Cavallari <cavallar@lri.fr>

netfilter: fix looped (broad|multi)cast's MAC handling

By default, when broadcast or multicast packet are sent from a local
application, they are sent to the interface then looped by the kernel
to other local applications, going throught netfilter hooks in the
process.

These looped packet have their MAC header removed from the skb by the
kernel looping code. This confuse various netfilter's netlink queue,
netlink log and the legacy ip_queue, because they try to extract a
hardware address from these packets, but extracts a part of the IP
header instead.

This patch prevent NFQUEUE, NFLOG and ip_QUEUE to include a MAC header
if there is none in the packet.

Signed-off-by: Nicolas Cavallari <cavallar@lri.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# f1585086 18-Jan-2011 Florian Westphal <fw@strlen.de>

netfilter: nfnetlink_queue: return error number to caller

instead of returning -1 on error, return an error number to allow the
caller to handle some errors differently.

ECANCELED is used to indicate that the hook is going away and should be
ignored.

A followup patch will introduce more 'ignore this hook' conditions,
(depending on queue settings) and will move kfree_skb responsibility
to the caller.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# a53f4b61 01-Jul-2010 Paul E. McKenney <paulmck@kernel.org>

Revert "net: Make accesses to ->br_port safe for sparse RCU"

This reverts commit 81bdf5bd7349bd4523538cbd7878f334bc2bfe14, which is
obsoleted by commit f350a0a87374 from the net tree.


# f350a0a8 15-Jun-2010 Jiri Pirko <jpirko@redhat.com>

bridge: use rx_handler_data pointer to store net_bridge_port pointer

Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 81bdf5bd 02-May-2010 Paul E. McKenney <paulmck@kernel.org>

net: Make accesses to ->br_port safe for sparse RCU

The new versions of the rcu_dereference() APIs requires that any pointers
passed to one of these APIs be fully defined. The ->br_port field
in struct net_device points to a struct net_bridge_port, which is an
incomplete type. This commit therefore changes ->br_port to be a void*,
and introduces a br_port() helper function to convert the type to struct
net_bridge_port, and applies this new helper function where required.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>


# c463ac97 09-Jun-2010 Eric Dumazet <eric.dumazet@gmail.com>

netfilter: nfnetlink_queue: some optimizations

- Use an atomic_t for id_sequence to avoid a spin_lock/spin_unlock pair

- Group highly modified struct nfqnl_instance fields together

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# e9f13cab 08-Apr-2010 Herbert Xu <herbert@gondor.apana.org.au>

netfilter: only do skb_checksum_help on CHECKSUM_PARTIAL in nfnetlink_queue

As we will set ip_summed to CHECKSUM_NONE when necessary in
nfqnl_mangle, there is no need to zap CHECKSUM_COMPLETE in
nfqnl_build_packet_message.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# ca1c2e2d 11-Feb-2010 Alexey Dobriyan <adobriyan@gmail.com>

netfilter: don't use INIT_RCU_HEAD()

call_rcu() will unconditionally reinitialize RCU head anyway.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# a5d896ad 18-Jan-2010 Eric Leblond <eric@inl.fr>

netfilter: nfnetlink_queue: simplify warning message

This patch remove variable part from a debug message to have
message concatenation from syslog.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# cd8c20b6 13-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com>

netfilter: nfnetlink: netns support

Make nfnl socket per-petns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# dee5817e 06-Nov-2009 Patrick McHardy <kaber@trash.net>

netfilter: remove unneccessary checks from netlink notifiers

The NETLINK_URELEASE notifier is only invoked for bound sockets, so
there is no need to check ->pid again.

Signed-off-by: Patrick McHardy <kaber@trash.net>


# 39938324 25-Aug-2009 Patrick McHardy <kaber@trash.net>

netfilter: nfnetlink: constify message attributes and headers

Signed-off-by: Patrick McHardy <kaber@trash.net>


# 67137f3c 07-Jun-2009 Jesper Dangaard Brouer <hawk@comx.dk>

nfnetlink_queue: Use rcu_barrier() on module unload.

This module uses rcu_call() thus it should use rcu_barrier() on module unload.

Also fixed a trivial typo 'nfetlink' -> 'nfnetlink' in comment.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 721499e8 19-Jul-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

netns: Use net_eq() to compare net-namespaces for optimization.

Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e64bda89 09-Jun-2008 Rami Rosen <ramirose@gmail.com>

netfilter: {ip,ip6,nfnetlink}_queue: misc cleanups

- No need to perform data_len = 0 in the switch command, since data_len
is initialized to 0 in the beginning of the ipq_build_packet_message()
method.

- {ip,ip6}_queue: We can reach nlmsg_failure only from one place; skb is
sure to be NULL when getting there; since skb is NULL, there is no need
to check this fact and call kfree_skb().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9a732ed6 29-Apr-2008 Arnaud Ebalard <arno@natisbad.org>

netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets

While reinjecting *bigger* modified versions of IPv6 packets using
libnetfilter_queue, things work fine on a 2.6.24 kernel (2.6.22 too)
but I get the following on recents kernels (2.6.25, trace below is
against today's net-2.6 git tree):

skb_over_panic: text:c04fddb0 len:696 put:632 head:f7592c00 data:f7592c00 tail:0xf7592eb8 end:0xf7592e80 dev:eth0
------------[ cut here ]------------
invalid opcode: 0000 [#1] PREEMPT
Process sendd (pid: 3657, ti=f6014000 task=f77c31d0 task.ti=f6014000)
Stack: c071e638 c04fddb0 000002b8 00000278 f7592c00 f7592c00 f7592eb8 f7592e80
f763c000 f6bc5200 f7592c40 f6015c34 c04cdbfc f6bc5200 00000278 f6015c60
c04fddb0 00000020 f72a10c0 f751b420 00000001 0000000a 000002b8 c065582c
Call Trace:
[<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
[<c04cdbfc>] ? skb_put+0x3c/0x40
[<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0
[<c04fd115>] ? nfnetlink_rcv_msg+0xf5/0x160
[<c04fd03e>] ? nfnetlink_rcv_msg+0x1e/0x160
[<c04fd020>] ? nfnetlink_rcv_msg+0x0/0x160
[<c04f8ed7>] ? netlink_rcv_skb+0x77/0xa0
[<c04fcefc>] ? nfnetlink_rcv+0x1c/0x30
[<c04f8c73>] ? netlink_unicast+0x243/0x2b0
[<c04cfaba>] ? memcpy_fromiovec+0x4a/0x70
[<c04f9406>] ? netlink_sendmsg+0x1c6/0x270
[<c04c8244>] ? sock_sendmsg+0xc4/0xf0
[<c011970d>] ? set_next_entity+0x1d/0x50
[<c0133a80>] ? autoremove_wake_function+0x0/0x40
[<c0118f9e>] ? __wake_up_common+0x3e/0x70
[<c0342fbf>] ? n_tty_receive_buf+0x34f/0x1280
[<c011d308>] ? __wake_up+0x68/0x70
[<c02cea47>] ? copy_from_user+0x37/0x70
[<c04cfd7c>] ? verify_iovec+0x2c/0x90
[<c04c837a>] ? sys_sendmsg+0x10a/0x230
[<c011967a>] ? __dequeue_entity+0x2a/0xa0
[<c011970d>] ? set_next_entity+0x1d/0x50
[<c0345397>] ? pty_write+0x47/0x60
[<c033d59b>] ? tty_default_put_char+0x1b/0x20
[<c011d2e9>] ? __wake_up+0x49/0x70
[<c033df99>] ? tty_ldisc_deref+0x39/0x90
[<c033ff20>] ? tty_write+0x1a0/0x1b0
[<c04c93af>] ? sys_socketcall+0x7f/0x260
[<c0102ff9>] ? sysenter_past_esp+0x6a/0x91
[<c05f0000>] ? snd_intel8x0m_probe+0x270/0x6e0
=======================
Code: 00 00 89 5c 24 14 8b 98 9c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 50 89 4c 24 04 c7 04 24 38 e6 71 c0 89 44 24 08 e8 c4 46 c5 ff <0f> 0b eb fe 55 89 e5 56 89 d6 53 89 c3 83 ec 0c 8b 40 50 39 d0
EIP: [<c04ccdfc>] skb_over_panic+0x5c/0x60 SS:ESP 0068:f6015bf8


Looking at the code, I ended up in nfq_mangle() function (called by
nfqnl_recv_verdict()) which performs a call to skb_copy_expand() due to
the increased size of data passed to the function. AFAICT, it should ask
for 'diff' instead of 'diff - skb_tailroom(e->skb)'. Because the
resulting sk_buff has not enough space to support the skb_put(skb, diff)
call a few lines later, this results in the call to skb_over_panic().

The patch below asks for allocation of a copy with enough space for
mangled packet and the same amount of headroom as old sk_buff. While
looking at how the regression appeared (e2b58a67), I noticed the same
pattern in ipq_mangle_ipv6() and ipq_mangle_ipv4(). The patch corrects
those locations too.

Tested with bigger reinjected IPv6 packets (nfqnl_mangle() path), things
are ok (2.6.25 and today's net-2.6 git tree).

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8eeee8b1 27-Mar-2008 Denis V. Lunev <den@openvz.org>

[NETFILTER]: Replate direct proc_fops assignment with proc_create call.

This elliminates infamous race during module loading when one could lookup
proc entry without proc_fops assigned.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c346dca1 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.

Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>


# 914afea8 10-Mar-2008 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: fix EPERM when binding/unbinding and instance 0 exists

Similar to the nfnetlink_log problem, nfnetlink_queue incorrectly
returns -EPERM when binding or unbinding to an address family and
queueing instance 0 exists and is owned by a different process. Unlike
nfnetlink_log it previously completes the operation, but it is still
incorrect.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cabaa9bf 10-Mar-2008 Eric Leblond <eric@inl.fr>

[NETFILTER]: nfnetlink_queue: fix computation of allocated size for netlink skb.

Size of the netlink skb was wrongly computed because the formula was using
NLMSG_ALIGN instead of NLMSG_SPACE. NLMSG_ALIGN does not add the room for
netlink header as NLMSG_SPACE does. This was causing a failure of message
building in some cases.

On my test system, all messages for packets in range [8*k+41, 8*k+48] where k
is an integer were invalid and the corresponding packets were dropped.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e2b58a67 19-Feb-2008 Patrick McHardy <kaber@trash.net>

[NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling packet data

As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>,
inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue
triggers a SKB_LINEAR_ASSERT in skb_put().

Going back through the git history, it seems this bug is present since
at least 2.6.12-rc2, probably even since the removal of
skb_linearize() for netfilter.

Linearize non-linear skbs through skb_copy_expand() when enlarging
them. Tested by Thomas, fixes bugzilla #9933.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ca7c48ca 31-Jan-2008 Eric Dumazet <dada1@cosmosbay.com>

[NETFILTER]: Supress some sparse warnings

CHECK net/netfilter/nf_conntrack_expect.c
net/netfilter/nf_conntrack_expect.c:429:13: warning: context imbalance in 'exp_seq_start' - wrong count at exit
net/netfilter/nf_conntrack_expect.c:441:13: warning: context imbalance in 'exp_seq_stop' - unexpected unlock
CHECK net/netfilter/nf_log.c
net/netfilter/nf_log.c:105:13: warning: context imbalance in 'seq_start' - wrong count at exit
net/netfilter/nf_log.c:125:13: warning: context imbalance in 'seq_stop' - unexpected unlock
CHECK net/netfilter/nfnetlink_queue.c
net/netfilter/nfnetlink_queue.c:363:7: warning: symbol 'size' shadows an earlier one
net/netfilter/nfnetlink_queue.c:217:9: originally declared here
net/netfilter/nfnetlink_queue.c:847:13: warning: context imbalance in 'seq_start' - wrong count at exit
net/netfilter/nfnetlink_queue.c:859:13: warning: context imbalance in 'seq_stop' - unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# baab2ce7 17-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_{queue,log}: return proper error codes in instance_create

Currently we return EINVAL for "instance exists", "allocation failed" and
"module unloaded below us", which is completely inapproriate.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cd21f0ac 17-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_{queue,log}: return ENOTSUPP for unknown cfg commands

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4ad9d4fa 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: update copyright

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0ef0f465 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: remove useless enqueue status codes

The queueing core doesn't care about the exact return value from
the queue handler, so there's no need to go through the trouble
of returning a meaningful value as long as we indicate an error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 861934c7 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: eliminate impossible switch case

We don't need a default case in nfqnl_build_packet_message(), the
copy_mode is validated when it is set. Tell the compiler about
the possible types and remove the default case. Saves 80b of
text on x86_64.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ea3a66ff 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: use endianness-aware attribute functions

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d6023ab 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: mark hash table __read_mostly

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2bd01197 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: remove useless debugging

Originally I wanted to just remove the QDEBUG macro and use pr_debug, but
none of the messages seems worth keeping.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5de0dfd 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: kill useless wrapper

nfqnl_set_mode takes the queue lock and calls __nfqnl_set_mode. Just move
the code from __nfqnl_set_mode to nfqnl_set_mode since there is no other
user.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9872bec7 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink: use RCU for queue instances hash

Use RCU for queue instances hash. Avoids multiple atomic operations
for each packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3c8e7fd 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: fix checks in nfqnl_recv_config

The peer_pid must be checked in all cases when a queue exists, currently
it is not checked if for NFQA_CFG_QUEUE_MAXLEN when a NFQA_CFG_CMD
attribute exists in some cases. Same for the queue existance check,
which can cause a NULL pointer dereference.

Also consistently return -ENODEV for "queue not found". -ENOENT would
be better, but that is already used to indicate a queued skb id doesn't
exist.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e48b9b2f 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: avoid unnecessary atomic operation

The sequence counter doesn't need to be an atomic_t, just move the increment
inside the locked section.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b3d15ef 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: {nfnetlink,ip,ip6}_queue: kill issue_verdict

Now that issue_verdict doesn't need to free the queue entries anymore,
all it does is disable local BHs and call nf_reinject. Move the BH
disabling to the okfn invocation in nf_reinject and kill the
issue_verdict functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02f014d8 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info

Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.

Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b43d8d85 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: deobfuscate entry lookups

A queue entry lookup currently looks like this:

find_dequeue_entry -> __find_dequeue_entry ->
__find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
find_dequeue_entry. Instead add it to nfqnl_flush (after
similar cleanups) and use nfqnl_flush for both complete
flushes and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0ac41e81 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: {nf_netlink,ip,ip6}_queue: use list_for_each_entry

Use list_add_tail/list_for_each_entry instead of list_add and
list_for_each_prev as a preparation for switching to RCU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c01cd429 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nf_queue: move queueing related functions/struct to seperate header

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f9d8928f 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nf_queue: remove unused data pointer

Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3ac5298 05-Dec-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nf_queue: make queue_handler const

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ca7b0ac 14-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au>

[NETFILTER]: Avoid skb_copy/pskb_copy/skb_realloc_headroom

This patch replaces unnecessary uses of skb_copy, pskb_copy and
skb_realloc_headroom by functions such as skb_make_writable and
pskb_expand_head.

This allows us to remove the double pointers later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37d41879 14-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au>

[NETFILTER]: Do not copy skb in skb_make_writable

Now that all callers of netfilter can guarantee that the skb is not shared,
we no longer have to copy the skb in skb_make_writable.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e2da5913 10-Oct-2007 Pavel Emelyanov <xemul@openvz.org>

[NETFILTER]: Make netfilter code use the seq_open_private

Just switch to the consolidated calls.

ipt_recent() has to initialize the private, so use
the __seq_open_private() helper.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5bf75853 28-Sep-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: use netlink policy

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fdf70832 28-Sep-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink: rename functions containing 'nfattr'

There is no struct nfattr anymore, rename functions to 'nlattr'.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df6fb868 28-Sep-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink: convert to generic netlink attribute functions

Get rid of the duplicated rtnetlink macros and use the generic netlink
attribute functions. The old duplicated stuff is moved to a new header
file that exists just for userspace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c8d4cb4 28-Sep-2007 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink: make subsystem and callbacks const

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b95cce35 26-Sep-2007 Stephen Hemminger <shemminger@linux-foundation.org>

[NET]: Wrap hard_header_parse

Wrap the hard_header_parse function to simplify next step of
header_ops conversion.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4b51029 12-Sep-2007 Eric W. Biederman <ebiederm@xmission.com>

[NET]: Support multiple network namespaces with netlink

Each netlink socket will live in exactly one network namespace,
this includes the controlling kernel sockets.

This patch updates all of the existing netlink protocols
to only support the initial network namespace. Request
by clients in other namespaces will get -ECONREFUSED.
As they would if the kernel did not have the support for
that netlink protocol compiled in.

As each netlink protocol is updated to be multiple network
namespace safe it can register multiple kernel sockets
to acquire a presence in the rest of the network namespaces.

The implementation in af_netlink is a simple filter implementation
at hash table insertion and hash table look up time.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9dc8653 12-Sep-2007 Eric W. Biederman <ebiederm@xmission.com>

[NET]: Make device event notification network namespace safe

Every user of the network device notifiers is either a protocol
stack or a pseudo device. If a protocol stack that does not have
support for multiple network namespaces receives an event for a
device that is not in the initial network namespace it quite possibly
can get confused and do the wrong thing.

To avoid problems until all of the protocol stacks are converted
this patch modifies all netdev event handlers to ignore events on
devices that are not in the initial network namespace.

As the rest of the code is made network namespace aware these
checks can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 56b3d975 11-Jul-2007 Philippe De Muyter <phdm@macqel.be>

[NET]: Make all initialized struct seq_operations const.

Make all initialized struct seq_operations in net/ const

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ce7663d8 07-Jul-2007 Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>

[NETFILTER]: nfnetlink_queue: don't unregister handler of other subsystem

The queue handlers registered by ip[6]_queue.ko at initialization should
not be unregistered according to requests from userland program
using nfnetlink_queue. If we allow that, there is no way to register
the handlers of built-in ip[6]_queue again.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 27d7ff46 31-Mar-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[SK_BUFF]: Introduce skb_copy_to_linear_data{_offset}

To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 1d00a4eb 23-Mar-2007 Thomas Graf <tgraf@suug.ch>

[NETLINK]: Remove error pointer from netlink message handler

The error pointer argument in netlink message handlers is used
to signal the special case where processing has to be interrupted
because a dump was started but no error happened. Instead it is
simpler and more clear to return -EINTR and have netlink_run_queue()
deal with getting the queue right.

nfnetlink passed on this error pointer to its subsystem handlers
but only uses it to signal the start of a netlink dump. Therefore
it can be removed there as well.

This patch also cleans up the error handling in the affected
message handlers to be consistent since it had to be touched anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 27a884dc 19-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[SK_BUFF]: Convert skb->tail to sk_buff_data_t

So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7aa0bf7 19-Apr-2007 Eric Dumazet <dada1@cosmosbay.com>

[NET]: convert network timestamps to ktime_t

We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 601e68e1 12-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NETFILTER]: Fix whitespace errors

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# da7071d7 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com>

[PATCH] mark struct file_operations const 8

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 829e17a1 28-Nov-2006 Eric Leblond <eric@inl.fr>

[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>


# 82e91ffe 09-Nov-2006 Thomas Graf <tgraf@suug.ch>

[NET]: Turn nfmark into generic mark

nfmark is being used in various subsystems and has become
the defacto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 98a4a861 08-Nov-2006 Al Viro <viro@zeniv.linux.org.uk>

[NETFILTER]: trivial annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d8a585d7 14-Nov-2006 Patrick McHardy <kaber@trash.net>

[NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queue

Based on patch by James D. Nurmi:

I've got some code very dependant on nfnetlink_queue, and turned up a
large number of warns coming from skb_trim. While it's quite possibly
my code, having not seen it on older kernels made me a bit suspect.

Anyhow, based on some googling I turned up this thread:
http://lkml.org/lkml/2006/8/13/56

And believe the issue to be related, so attached is a small patch to
the kernel -- not sure if this is completely correct, but for anyone
else hitting the WARN_ON(1) in skbuff.h, it might be helpful..

Signed-off-by: James D. Nurmi <jdnurmi@gmail.com>

Ported to ip6_queue and nfnetlink_queue and added return value
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# febf0a43 03-Nov-2006 Al Viro <viro@zeniv.linux.org.uk>

[NETFILTER] bug: skb->protocol is already net-endian

htons() is not needed (and no, it's not misspelled ntohs() -
userland expects net-endian here).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1158ba27 22-Aug-2006 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: fix typo in error message

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 84fa7933 29-Aug-2006 Patrick McHardy <kaber@trash.net>

[NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE

Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).

Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ef47c6a7 27-Jun-2006 Patrick McHardy <kaber@trash.net>

[NETFILTER]: ip_queue/nfnetlink_queue: drop bridge port references when dev disappears

When a device that is acting as a bridge port is unregistered, the
ip_queue/nfnetlink_queue notifier doesn't check if its one of
physindev/physoutdev and doesn't release the references if it is.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 32292a7f 06-Apr-2006 Patrick McHardy <kaber@trash.net>

[NETFILTER]: Fix section mismatch warnings

Fix section mismatch warnings caused by netfilter's init_or_cleanup
functions used in many places by splitting the init from the cleanup
parts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 65b4b4e8 28-Mar-2006 Andrew Morton <akpm@osdl.org>

[NETFILTER]: Rename init functions.

Every netfilter module uses `init' for its module_init() function and
`fini' or `cleanup' for its module_exit() function.

Problem is, this creates uninformative initcall_debug output and makes
ctags rather useless.

So go through and rename them all to $(filename)_init and
$(filename)_fini.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f0d83583 22-Mar-2006 Pablo Neira Ayuso <pablo@netfilter.org>

[NETFILTER]: nfnetlink_queue: fix nfnetlink message size

Fix oversized message, use NLMSG_SPACE just one since it reserves space
for the netlink header and NFA_SPACE for every attribute.

Thanks to Harald Welte for the feedback

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 406dbfc9 12-Mar-2006 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: fix possible NULL-ptr dereference

Fix NULL-ptr dereference when a config message for a non-existant
queue containing only an NFQA_CFG_PARAMS attribute is received.

Coverity #433

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a706124d 04-Feb-2006 Patrick McHardy <kaber@trash.net>

[NETFILTER]: nfnetlink_queue: fix packet marking over netlink

The packet marked is the netlink skb, not the queued skb.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3e4ead4f 05-Jan-2006 Jesper Juhl <jesper.juhl@gmail.com>

[NETFILTER]: Decrease number of pointer derefs in nfnetlink_queue.c

Benefits of the patch:
- Fewer pointer dereferences should make the code slightly faster.
- Size of generated code is smaller
- improved readability

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 181a46a5 04-Jan-2006 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NETFILTER]: Use macro for spinlock_t/rwlock_t initializations/definition.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37d2e7a2 14-Nov-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER] nfnetlink: unconditionally require CAP_NET_ADMIN

This patch unconditionally requires CAP_NET_ADMIN for all nfnetlink
messages. It also removes the per-message cap_required field, since all
existing subsystems use CAP_NET_ADMIN for all their messages anyway.

Patrick McHardy owes me a beer if we ever need to re-introduce this.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 10dfdc69 03-Nov-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER] nfnetlink: Use kzalloc

These is a cleanup patch, kzalloc can be used in a couple of cases

Signed-off-by: Samir Bellabes <sbellabes@mandriva.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 325ed823 03-Oct-2005 Herbert Xu <herbert@gondor.apana.org.au>

[NET]: Fix packet timestamping.

I've found the problem in general. It affects any 64-bit
architecture. The problem occurs when you change the system time.

Suppose that when you boot your system clock is forward by a day.
This gets recorded down in skb_tv_base. You then wind the clock back
by a day. From that point onwards the offset will be negative which
essentially overflows the 32-bit variables they're stored in.

In fact, why don't we just store the real time stamp in those 32-bit
variables? After all, we're not going to overflow for quite a while
yet.

When we do overflow, we'll need a better solution of course.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7dfb09a 06-Sep-2005 Patrick McHardy <kaber@trash.net>

[NETFILTER]: Fix HW checksum handling in nfnetlink_queue

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa07ca57 05-Sep-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER] remove bogus hand-coded htonll() from nenetlink_queue

htonll() is nothing else than cpu_to_be64(), so we'd rather call the
latter.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 395dde20 05-Sep-2005 Adrian Bunk <bunk@stusta.de>

[NETFILTER]: net/netfilter/nfnetlink*: make functions static

This patch makes needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a61bbcf2 14-Aug-2005 Patrick McHardy <kaber@trash.net>

[NET]: Store skb->timestamp as offset to a base timestamp

Reduces skb size by 8 bytes on 64-bit.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bbd86b9f 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER]: add /proc/net/netfilter interface to nf_queue

This patch adds a /proc/net/netfilter/nf_queue file, similar to the
recently-added /proc/net/netfilter/nf_log. It indicates which queue
handler is registered to which protocol family. This is useful since
there are now multiple queue handlers in the treee (ip[6]_queue,
nfnetlink_queue).

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fbcd923c 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER]: add correct bridging support to nfnetlink_{queue,log}

This patch adds support for passing the real 'physical' device ifindex
down to userspace via nfnetlink_log and nfnetlink_queue.

This feature basically obsoletes net/bridge/netfilter/ebt_ulog.c, and
it is likely ebt_ulog.c will die with one of the next couple of
patches.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 927ccbcc 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER]: attribute count is an attribute of message type, not subsytem

Prior to this patch, every nfnetlink subsystem had to specify it's
attribute count. However, in reality the attribute count depends on
the message type within the subsystem, not the subsystem itself. This
patch moves 'attr_count' from 'struct nfnetlink_subsys' into
nfnl_callback to fix this.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0597f268 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER]: Add new "nfnetlink_log" userspace packet logging facility

This is a generic (layer3 independent) version of what ipt_ULOG is already
doing for IPv4 today. ipt_ULOG, ebt_ulog and finally also ip[6]t_LOG will
be deprecated by this mechanism in the long term.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 838ab636 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER]: Add refcounting and /proc/net/netfilter interface to nfnetlink_queue

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7af4cc3f 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink

- Add new nfnetlink_queue module
- Add new ipt_NFQUEUE and ip6t_NFQUEUE modules to access queue numbers 1-65535
- Mark ip_queue and ip6_queue Kconfig options as OBSOLETE
- Update feature-removal-schedule to remove ip[6]_queue in December

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>