History log of /linux-master/net/bridge/br_netfilter_ipv6.c
Revision Date Author Comments
# 98748088 11-Jan-2024 Pavel Tikhomirov <ptikhomirov@virtuozzo.com>

netfilter: bridge: replace physindev with physinif in nf_bridge_info

An skb can be added to a neigh->arp_queue while waiting for an arp
reply. Where original skb's skb->dev can be different to neigh's
neigh->dev. For instance in case of bridging dnated skb from one veth to
another, the skb would be added to a neigh->arp_queue of the bridge.

As skb->dev can be reset back to nf_bridge->physindev and used, and as
there is no explicit mechanism that prevents this physindev from been
freed under us (for instance neigh_flush_dev doesn't cleanup skbs from
different device's neigh queue) we can crash on e.g. this stack:

arp_process
neigh_update
skb = __skb_dequeue(&neigh->arp_queue)
neigh_resolve_output(..., skb)
...
br_nf_dev_xmit
br_nf_pre_routing_finish_bridge_slow
skb->dev = nf_bridge->physindev
br_handle_frame_finish

Let's use plain ifindex instead of net_device link. To peek into the
original net_device we will use dev_get_by_index_rcu(). Thus either we
get device and are safe to use it or we don't get it and drop skb.

Fixes: c4e70a87d975 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# cf8b7c1a 11-Oct-2023 Florian Westphal <fw@strlen.de>

netfilter: bridge: convert br_netfilter to NF_DROP_REASON

errno is 0 because these hooks are called from prerouting and forward.
There is no socket that the errno would ever be propagated to.

Other netfilter modules (e.g. nf_nat, conntrack, ...) can be converted
in a similar way.

Signed-off-by: Florian Westphal <fw@strlen.de>


# 28e144cf 07-Mar-2023 Xin Long <lucien.xin@gmail.com>

netfilter: move br_nf_check_hbh_len to utils

Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
to netfilter utils, so that it can be used by other modules, like
ovs and tc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 0b24bd71 07-Mar-2023 Xin Long <lucien.xin@gmail.com>

netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len

br_nf_check_hbh_len() is a function to check the Hop-by-hop option
header, and shouldn't do pskb_trim_rcsum() there. This patch is to
pass pkt_len out to br_validate_ipv6() and do pskb_trim_rcsum()
after calling br_validate_ipv6() instead.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# a7f1a2f4 07-Mar-2023 Xin Long <lucien.xin@gmail.com>

netfilter: bridge: check len before accessing more nh data

In the while loop of br_nf_check_hbh_len(), similar to ip6_parse_tlv(),
before accessing 'nh[off + 1]', it should add a check 'len < 2'; and
before parsing IPV6_TLV_JUMBO, it should add a check 'optlen > len',
in case of overflows.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# 9ccff83b 07-Mar-2023 Xin Long <lucien.xin@gmail.com>

netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len

When checking Hop-by-hop option header, if the option data is in
nonlinear area, it should do pskb_may_pull instead of discarding
the skb as a bad IPv6 packet.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>


# d047283a 30-Aug-2022 Harsh Modi <harshmodi@google.com>

netfilter: br_netfilter: Drop dst references before setting.

The IPv6 path already drops dst in the daddr changed case, but the IPv4
path does not. This change makes the two code paths consistent.

Further, it is possible that there is already a metadata_dst allocated from
ingress that might already be attached to skbuff->dst while following
the bridge path. If it is not released before setting a new
metadata_dst, it will be leaked. This is similar to what is done in
bpf_set_tunnel_key() or ip6_route_input().

It is important to note that the memory being leaked is not the dst
being set in the bridge code, but rather memory allocated from some
other code path that is not being freed correctly before the skb dst is
overwritten.

An example of the leakage fixed by this commit found using kmemleak:

unreferenced object 0xffff888010112b00 (size 256):
comm "softirq", pid 0, jiffies 4294762496 (age 32.012s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 80 16 f1 83 ff ff ff ff ................
e1 4e f6 82 ff ff ff ff 00 00 00 00 00 00 00 00 .N..............
backtrace:
[<00000000d79567ea>] metadata_dst_alloc+0x1b/0xe0
[<00000000be113e13>] udp_tun_rx_dst+0x174/0x1f0
[<00000000a36848f4>] geneve_udp_encap_recv+0x350/0x7b0
[<00000000d4afb476>] udp_queue_rcv_one_skb+0x380/0x560
[<00000000ac064aea>] udp_unicast_rcv_skb+0x75/0x90
[<000000009a8ee8c5>] ip_protocol_deliver_rcu+0xd8/0x230
[<00000000ef4980bb>] ip_local_deliver_finish+0x7a/0xa0
[<00000000d7533c8c>] __netif_receive_skb_one_core+0x89/0xa0
[<00000000a879497d>] process_backlog+0x93/0x190
[<00000000e41ade9f>] __napi_poll+0x28/0x170
[<00000000b4c0906b>] net_rx_action+0x14f/0x2a0
[<00000000b20dd5d4>] __do_softirq+0xf4/0x305
[<000000003a7d7e15>] __irq_exit_rcu+0xc3/0x140
[<00000000968d39a2>] sysvec_apic_timer_interrupt+0x9e/0xc0
[<000000009e920794>] asm_sysvec_apic_timer_interrupt+0x16/0x20
[<000000008942add0>] native_safe_halt+0x13/0x20

Florian Westphal says: "Original code was likely fine because nothing
ever did set a skb->dst entry earlier than bridge in those days."

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Harsh Modi <harshmodi@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# ff6d090d 10-Jun-2019 Christian Brauner <christian@brauner.io>

netfilter: bridge: port sysctls to use brnf_net

This ports the sysctls to use struct brnf_net.

With this patch we make it possible to namespace the br_netfilter module in
the following patch.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# e166e4fd 13-Mar-2019 Xin Long <lucien.xin@gmail.com>

netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING

Since Commit 21d1196a35f5 ("ipv4: set transport header earlier"),
skb->transport_header has been always set before entering INET
netfilter. This patch is to set skb->transport_header for bridge
before entering INET netfilter by bridge-nf-call-iptables.

It also fixes an issue that sctp_error() couldn't compute a right
csum due to unset skb->transport_header.

Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code")
Reported-by: Li Shuang <shuali@redhat.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 6c57f045 17-Jan-2019 Ross Lagerwall <ross.lagerwall@citrix.com>

net: Fix usage of pskb_trim_rcsum

In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de8bda1d 18-Dec-2018 Florian Westphal <fw@strlen.de>

net: convert bridge_nf to use skb extension infrastructure

This converts the bridge netfilter (calling iptables hooks from bridge)
facility to use the extension infrastructure.

The bridge_nf specific hooks in skb clone and free paths are removed, they
have been replaced by the skb_ext hooks that do the same as the bridge nf
allocations hooks did.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c5136b15 21-Sep-2016 Florian Westphal <fw@strlen.de>

netfilter: bridge: add and use br_nf_hook_thresh

This replaces the last uses of NF_HOOK_THRESH().
Followup patch will remove it and rename nf_hook_thresh.

The reason is that inet (non-bridge) netfilter no longer invokes the
hooks from hooks, so we do no longer need the thresh value to skip hooks
with a lower priority.

The bridge netfilter however may need to do this. br_nf_hook_thresh is a
wrapper that is supposed to do this, i.e. only call hooks with a
priority that exceeds NF_BR_PRI_BRNF.

It's used only in the recursion cases of br_netfilter. It invokes
nf_hook_slow while holding an rcu read-side critical section to make a
future cleanup simpler.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 1d015503 27-Apr-2016 Eric Dumazet <edumazet@google.com>

ipv6: rename IP6_INC_STATS_BH()

Rename IP6_INC_STATS_BH() to __IP6_INC_STATS()
and IP6_ADD_STATS_BH() to __IP6_ADD_STATS()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1444c63 25-Sep-2015 Eric W. Biederman <ebiederm@xmission.com>

bridge: Pass net into br_validate_ipv4 and br_validate_ipv6

The network namespace is easiliy available in state->net so use it.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 06198b34 18-Sep-2015 Eric W. Biederman <ebiederm@xmission.com>

netfilter: Pass priv instead of nf_hook_ops to netfilter hooks

Only pass the void *priv parameter out of the nf_hook_ops. That is
all any of the functions are interested now, and by limiting what is
passed it becomes simpler to change implementation details.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 0c4b51f0 15-Sep-2015 Eric W. Biederman <ebiederm@xmission.com>

netfilter: Pass net into okfn

This is immediately motivated by the bridge code that chains functions that
call into netfilter. Without passing net into the okfns the bridge code would
need to guess about the best expression for the network namespace to process
packets in.

As net is frequently one of the first things computed in continuation functions
after netfilter has done it's job passing in the desired network namespace is in
many cases a code simplification.

To support this change the function dst_output_okfn is introduced to
simplify passing dst_output as an okfn. For the moment dst_output_okfn
just silently drops the struct net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29a26a56 15-Sep-2015 Eric W. Biederman <ebiederm@xmission.com>

netfilter: Pass struct net into the netfilter hooks

Pass a network namespace parameter into the netfilter hooks. At the
call site of the netfilter hooks the path a packet is taking through
the network stack is well known which allows the network namespace to
be easily and reliabily.

This allows the replacement of magic code like
"dev_net(state->in?:state->out)" that appears at the start of most
netfilter hooks with "state->net".

In almost all cases the network namespace passed in is derived
from the first network device passed in, guaranteeing those
paths will not see any changes in practice.

The exceptions are:
xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm)
ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp)
ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp)
ipv4/raw.c:raw_send_hdrinc() sock_net(sk)
ipv6/ip6_output.c:ip6_xmit() sock_net(sk)
ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev)
ipv6/raw.c:raw6_send_hdrinc() sock_net(sk)
br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev

In all cases these exceptions seem to be a better expression for the
network namespace the packet is being processed in then the historic
"dev_net(in?in:out)". I am documenting them in case something odd
pops up and someone starts trying to track down what happened.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 72b1e5e4 23-Jul-2015 Florian Westphal <fw@strlen.de>

netfilter: bridge: reduce nf_bridge_info to 32 bytes again

We can use union for most of the temporary cruft (original ipv4/ipv6
address, source mac, physoutdev) since they're used during different
stages of br netfilter traversal.

Also get rid of the last two ->mask users.

Shrinks struct from 48 to 32 on 64bit arch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 86e89718 07-Jul-2015 Julien Grall <julien.grall@citrix.com>

netfilter: bridge: Use __in6_dev_get rather than in6_dev_get in br_validate_ipv6

The commit efb6de9b4ba0092b2c55f6a52d16294a8a698edd "netfilter: bridge:
forward IPv6 fragmented packets" introduced a new function
br_validate_ipv6 which take a reference on the inet6 device. Although,
the reference is not released at the end.

This will result to the impossibility to destroy any netdevice using
ipv6 and bridge.

It's possible to directly retrieve the inet6 device without taking a
reference as all netfilter hooks are protected by rcu_read_lock via
nf_hook_slow.

Spotted while trying to destroy a Xen guest on the upstream Linux:
"unregister_netdevice: waiting for vif1.0 to become free. Usage count = 1"

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Cc: Bernhard Thaler <bernhard.thaler@wvnet.at>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: fw@strlen.de
Cc: ian.campbell@citrix.com
Cc: wei.liu2@citrix.com
Cc: Bob Liu <bob.liu@oracle.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 230ac490 16-Jun-2015 Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: bridge: split ipv6 code into separated file

Resolve compilation breakage when CONFIG_IPV6 is not set by moving the IPv6
code into a separated br_netfilter_ipv6.c file.

Fixes: efb6de9b4ba0 ("netfilter: bridge: forward IPv6 fragmented packets")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>