History log of /linux-master/net/ipv4/ip_vti.c
Revision Date Author Comments
# 9b5b3637 06-Feb-2024 Eric Dumazet <edumazet@google.com>

ip_tunnel: use exit_batch_rtnl() method

exit_batch_rtnl() is called while RTNL is held,
and devices to be unregistered can be queued in the dev_kill_list.

This saves one rtnl_lock()/rtnl_unlock() pair
and one unregister_netdevice_many() call.

This patch takes care of ipip, ip_vti, and ip_gre tunnels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Antoine Tenart <atenart@kernel.org>
Link: https://lore.kernel.org/r/20240206144313.2050392-15-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b058a5d2 08-Feb-2024 Breno Leitao <leitao@debian.org>

net: fill in MODULE_DESCRIPTION()s for ipv4 modules

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the IPv4 modules.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240208164244.3818498-7-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2b1dc628 04-Oct-2023 Florian Westphal <fw@strlen.de>

xfrm: pass struct net to xfrm_decode_session wrappers

Preparation patch, extra arg is not used.
No functional changes intended.

This is needed to replace the xfrm session decode functions with
the flow dissector.

skb_flow_dissect() cannot be used as-is, because it attempts to deduce the
'struct net' to use for bpf program fetch from skb->sk or skb->dev, but
xfrm code path can see skbs that have neither sk or dev filled in.

So either flow dissector needs to try harder, e.g. by also trying
skb->dst->dev, or we have to pass the struct net explicitly.

Passing the struct net doesn't look too bad to me, most places
already have it available or can derive it from the output device.

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/netdev/202309271628.27fd2187-oliver.sang@intel.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 6018a266 10-Jul-2023 Zhengchao Shao <shaozhengchao@huawei.com>

ip_vti: fix potential slab-use-after-free in decode_session6

When ip_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ip_vti device sends IPv6 packets.
As commit f855691975bb ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.

Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# c4794d22 15-Nov-2022 Eric Dumazet <edumazet@google.com>

ipv4: tunnels: use DEV_STATS_INC()

Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx).

Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a1b7e1a 12-Oct-2021 Jakub Kicinski <kuba@kernel.org>

ip: use dev_addr_set() in tunnels

Use dev_addr_set() instead of writing to netdev->dev_addr
directly in ip tunnels drivers.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3e7a1c7c 27-Jul-2021 Arnd Bergmann <arnd@arndb.de>

ip_tunnel: use ndo_siocdevprivate

The various ipv4 and ipv6 tunnel drivers each implement a set
of 12 SIOCDEVPRIVATE commands for managing tunnels. These
all work correctly in compat mode.

Move them over to the new .ndo_siocdevprivate operation.

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 79121184 19-Nov-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

ipv4: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# c7c1abfd 26-Feb-2021 Eyal Birger <eyal.birger@gmail.com>

vti: fix ipv4 pmtu check to honor ip header df

Frag needed should only be sent if the header enables DF.

This fix allows packets larger than MTU to pass the vti interface
and be fragmented after encapsulation, aligning behavior with
non-vti xfrm.

Fixes: d6af1a31cc72 ("vti: Add pmtu handling to vti_xmit.")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 4372339e 26-Feb-2021 Jason A. Donenfeld <Jason@zx2c4.com>

net: always use icmp{,v6}_ndo_send from ndo_start_xmit

There were a few remaining tunnel drivers that didn't receive the prior
conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
memory corrution (see ee576c47db60 ("net: icmp: pass zeroed opts from
icmp{,v6}_ndo_send before sending") for details), there's even more
imperative to have these all converted. So this commit goes through the
remaining cases that I could find and does a boring translation to the
ndo variety.

The Fixes: line below is the merge that originally added icmp{,v6}_
ndo_send and converted the first batch of icmp{,v6}_send users. The
rationale then for the change applies equally to this patch. It's just
that these drivers were left out of the initial conversion because these
network devices are hiding in net/ rather than in drivers/net/.

Cc: Florian Westphal <fw@strlen.de>
Cc: Willem de Bruijn <willemb@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 803381f9f117 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f3feb24 07-Nov-2020 Heiner Kallweit <hkallweit1@gmail.com>

vti: switch to dev_get_tstats64

Replace ip_tunnel_get_stats64() with the new identical core function
dev_get_tstats64().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 560b50cf 05-Oct-2020 Fabian Frederick <fabf@skynet.be>

ipv4: use dev_sw_netstats_rx_add()

use new helper for netstats settings

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 61ee4137 31-Jul-2020 YueHaibing <yuehaibing@huawei.com>

ip_vti: Fix unused variable warning

If CONFIG_INET_XFRM_TUNNEL is set but CONFIG_IPV6 is n,

net/ipv4/ip_vti.c:493:27: warning: 'vti_ipip6_handler' defined but not used [-Wunused-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# b328ecc4 17-Jul-2020 Steffen Klassert <steffen.klassert@secunet.com>

xfrm: Make the policy hold queue work with VTI.

We forgot to support the xfrm policy hold queue when
VTI was implemented. This patch adds everything we
need so that we can use the policy hold queue together
with VTI interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 55a48c7e 13-Jul-2020 Xin Long <lucien.xin@gmail.com>

ip_vti: not register vti_ipip_handler twice

An xfrm_tunnel object is linked into the list when registering,
so vti_ipip_handler can not be registered twice, otherwise its
next pointer will be overwritten on the second time.

So this patch is to define a new xfrm_tunnel object to register
for AF_INET6.

Fixes: e6ce64570f24 ("ip_vti: support IPIP6 tunnel processing")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# e6ce6457 06-Jul-2020 Xin Long <lucien.xin@gmail.com>

ip_vti: support IPIP6 tunnel processing

For IPIP6 tunnel processing, the functions called will be the
same as that for IPIP tunnel's. So reuse it and register it
with family == AF_INET6.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 87e66b96 06-Jul-2020 Xin Long <lucien.xin@gmail.com>

ip_vti: support IPIP tunnel processing with .cb_handler

With tunnel4_input_afinfo added, IPIP tunnel processing in
ip_vti can be easily done with .cb_handler. So replace the
processing by calling ip_tunnel_rcv() with it.

v1->v2:
- no change.
v2-v3:
- enable it only when CONFIG_INET_XFRM_TUNNEL is defined, to fix
the build error, reported by kbuild test robot.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# ab59d2b6 29-Jun-2020 Jason A. Donenfeld <Jason@zx2c4.com>

net: vti: implement header_ops->parse_protocol for AF_PACKET

Vti uses skb->protocol to determine packet type, and bails out if it's
not set. For AF_PACKET injection, we need to support its call chain of:

packet_sendmsg -> packet_snd -> packet_parse_headers ->
dev_parse_header_protocol -> parse_protocol

Without a valid parse_protocol, this returns zero, and vti rejects the
skb. So, this wires up the ip_tunnel handler for layer 3 packets for
that case.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 607259a6 19-May-2020 Christoph Hellwig <hch@lst.de>

net: add a new ndo_tunnel_ioctl method

This method is used to properly allow kernel callers of the IPv4 route
management ioctls. The exsting ip_tunnel_ioctl helper is renamed to
ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls
touching user memory, and is used for the guts of ndo_tunnel_ctl
implementations. A new ip_tunnel_ioctl helper is added that can be wired
up directly to the ndo_do_ioctl method and takes care of the copy to and
from userspace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 976eba8a 21-Apr-2020 Xin Long <lucien.xin@gmail.com>

ip_vti: receive ipip packet by calling ip_tunnel_rcv

In Commit dd9ee3444014 ("vti4: Fix a ipip packet processing bug in
'IPCOMP' virtual tunnel"), it tries to receive IPIP packets in vti
by calling xfrm_input(). This case happens when a small packet or
frag sent by peer is too small to get compressed.

However, xfrm_input() will still get to the IPCOMP path where skb
sec_path is set, but never dropped while it should have been done
in vti_ipcomp4_protocol.cb_handler(vti_rcv_cb), as it's not an
ipcomp4 packet. This will cause that the packet can never pass
xfrm4_policy_check() in the upper protocol rcv functions.

So this patch is to call ip_tunnel_rcv() to process IPIP packets
instead.

Fixes: dd9ee3444014 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# f1ed1026 04-Feb-2020 Nicolas Dichtel <nicolas.dichtel@6wind.com>

vti[6]: fix packet tx through bpf_redirect() in XinY cases

I forgot the 4in6/6in4 cases in my previous patch. Let's fix them.

Fixes: 95224166a903 ("vti[6]: fix packet tx through bpf_redirect()")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 95224166 13-Jan-2020 Nicolas Dichtel <nicolas.dichtel@6wind.com>

vti[6]: fix packet tx through bpf_redirect()

With an ebpf program that redirects packets through a vti[6] interface,
the packets are dropped because no dst is attached.

This could also be reproduced with an AF_PACKET socket, with the following
python script (vti1 is an ip_vti interface):

import socket
send_s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, 0)
# scapy
# p = IP(src='10.100.0.2', dst='10.200.0.1')/ICMP(type='echo-request')
# raw(p)
req = b'E\x00\x00\x1c\x00\x01\x00\x00@\x01e\xb2\nd\x00\x02\n\xc8\x00\x01\x08\x00\xf7\xff\x00\x00\x00\x00'
send_s.sendto(req, ('vti1', 0x800, 0, 0))

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 8247a79e 21-Dec-2019 Hangbin Liu <liuhangbin@gmail.com>

vti: do not confirm neighbor when do pmtu update

When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.

Although vti and vti6 are immune to this problem because they are IFF_NOARP
interfaces, as Guillaume pointed. There is still no sense to confirm neighbour
here.

v5: Update commit description.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c593642c 09-Dec-2019 Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>

treewide: Use sizeof_field() macro

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

if [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# c9500d7b 29-Mar-2019 Florian Westphal <fw@strlen.de>

xfrm: store xfrm_mode directly, not its address

This structure is now only 4 bytes, so its more efficient
to cache a copy rather than its address.

No significant size difference in allmodconfig vmlinux.

With non-modular kernel that has all XFRM options enabled, this
series reduces vmlinux image size by ~11kb. All xfrm_mode
indirections are gone and all modes are built-in.

before (ipsec-next master):
text data bss dec filename
21071494 7233140 11104324 39408958 vmlinux.master

after this series:
21066448 7226772 11104324 39397544 vmlinux.patched

With allmodconfig kernel, the size increase is only 362 bytes,
even all the xfrm config options removed in this series are
modular.

before:
text data bss dec filename
15731286 6936912 4046908 26715106 vmlinux.master

after this series:
15731492 6937068 4046908 26715468 vmlinux

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 4c145dce 29-Mar-2019 Florian Westphal <fw@strlen.de>

xfrm: make xfrm modes builtin

after previous changes, xfrm_mode contains no function pointers anymore
and all modules defining such struct contain no code except an init/exit
functions to register the xfrm_mode struct with the xfrm core.

Just place the xfrm modes core and remove the modules,
the run-time xfrm_mode register/unregister functionality is removed.

Before:

text data bss dec filename
7523 200 2364 10087 net/xfrm/xfrm_input.o
40003 628 440 41071 net/xfrm/xfrm_state.o
15730338 6937080 4046908 26714326 vmlinux

7389 200 2364 9953 net/xfrm/xfrm_input.o
40574 656 440 41670 net/xfrm/xfrm_state.o
15730084 6937068 4046908 26714060 vmlinux

The xfrm*_mode_{transport,tunnel,beet} modules are gone.

v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
ones rather than removing them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# b45714b1 29-Mar-2019 Florian Westphal <fw@strlen.de>

xfrm: prefer family stored in xfrm_mode struct

Now that we have the family available directly in the
xfrm_mode struct, we can use that and avoid one extra dereference.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# f981c57f 23-Mar-2019 Jeremy Sowden <jeremy@azazel.net>

vti4: eliminated some duplicate code.

The ipip tunnel introduced in commit dd9ee3444014 ("vti4: Fix a ipip
packet processing bug in 'IPCOMP' virtual tunnel") largely duplicated
the existing vti_input and vti_recv functions. Refactored to
deduplicate the common code.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 01ce31c5 19-Mar-2019 Jeremy Sowden <jeremy@azazel.net>

vti4: removed duplicate log message.

Removed info log-message if ipip tunnel registration fails during
module-initialization: it adds nothing to the error message that is
written on all failures.

Fixes: dd9ee3444014e ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 5483844c 19-Mar-2019 Jeremy Sowden <jeremy@azazel.net>

vti4: ipip tunnel deregistration fixes.

If tunnel registration failed during module initialization, the module
would fail to deregister the IPPROTO_COMP protocol and would attempt to
deregister the tunnel.

The tunnel was not deregistered during module-exit.

Fixes: dd9ee3444014e ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# dd9ee344 06-Jan-2019 Su Yanjun <suyj.fnst@cn.fujitsu.com>

vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel

Recently we run a network test over ipcomp virtual tunnel.We find that
if a ipv4 packet needs fragment, then the peer can't receive
it.

We deep into the code and find that when packet need fragment the smaller
fragment will be encapsulated by ipip not ipcomp. So when the ipip packet
goes into xfrm, it's skb->dev is not properly set. The ipv4 reassembly code
always set skb'dev to the last fragment's dev. After ipv4 defrag processing,
when the kernel rp_filter parameter is set, the skb will be drop by -EXDEV
error.

This patch adds compatible support for the ipip process in ipcomp virtual tunnel.

Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# cb9f1b78 30-Dec-2018 Willem de Bruijn <willemb@google.com>

ip: validate header length on virtual device xmit

KMSAN detected read beyond end of buffer in vti and sit devices when
passing truncated packets with PF_PACKET. The issue affects additional
ip tunnel devices.

Extend commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the
inner header") and commit ccfec9e5cb2d ("ip_tunnel: be careful when
accessing the inner header").

Move the check to a separate helper and call at the start of each
ndo_start_xmit function in net/ipv4 and net/ipv6.

Minor changes:
- convert dev_kfree_skb to kfree_skb on error path,
as dev_kfree_skb calls consume_skb which is not for error paths.
- use pskb_network_may_pull even though that is pedantic here,
as the same as pskb_may_pull for devices without llheaders.
- do not cache ipv6 hdrs if used only once
(unsafe across pskb_may_pull, was more relevant to earlier patch)

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1042caa7 25-Sep-2018 Maciej Żenczykowski <maze@google.com>

net-ipv4: remove 2 always zero parameters from ipv4_redirect()

(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d888f396 25-Sep-2018 Maciej Żenczykowski <maze@google.com>

net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()

(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cd1aa9c2 19-Aug-2018 Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

ip_vti: fix a null pointer deferrence when create vti fallback tunnel

After set fb_tunnels_only_for_init_net to 1, the itn->fb_tunnel_dev will
be NULL and will cause following crash:

[ 2742.849298] BUG: unable to handle kernel NULL pointer dereference at 0000000000000941
[ 2742.851380] PGD 800000042c21a067 P4D 800000042c21a067 PUD 42aaed067 PMD 0
[ 2742.852818] Oops: 0002 [#1] SMP PTI
[ 2742.853570] CPU: 7 PID: 2484 Comm: unshare Kdump: loaded Not tainted 4.18.0-rc8+ #2
[ 2742.855163] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2742.856970] RIP: 0010:vti_init_net+0x3a/0x50 [ip_vti]
[ 2742.858034] Code: 90 83 c0 48 c7 c2 20 a1 83 c0 48 89 fb e8 6e 3b f6 ff 85 c0 75 22 8b 0d f4 19 00 00 48 8b 93 00 14 00 00 48 8b 14 ca 48 8b 12 <c6> 82 41 09 00 00 04 c6 82 38 09 00 00 45 5b c3 66 0f 1f 44 00 00
[ 2742.861940] RSP: 0018:ffff9be28207fde0 EFLAGS: 00010246
[ 2742.863044] RAX: 0000000000000000 RBX: ffff8a71ebed4980 RCX: 0000000000000013
[ 2742.864540] RDX: 0000000000000000 RSI: 0000000000000013 RDI: ffff8a71ebed4980
[ 2742.866020] RBP: ffff8a71ea717000 R08: ffffffffc083903c R09: ffff8a71ea717000
[ 2742.867505] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a71ebed4980
[ 2742.868987] R13: 0000000000000013 R14: ffff8a71ea5b49c0 R15: 0000000000000000
[ 2742.870473] FS: 00007f02266c9740(0000) GS:ffff8a71ffdc0000(0000) knlGS:0000000000000000
[ 2742.872143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2742.873340] CR2: 0000000000000941 CR3: 000000042bc20006 CR4: 00000000001606e0
[ 2742.874821] Call Trace:
[ 2742.875358] ops_init+0x38/0xf0
[ 2742.876078] setup_net+0xd9/0x1f0
[ 2742.876789] copy_net_ns+0xb7/0x130
[ 2742.877538] create_new_namespaces+0x11a/0x1d0
[ 2742.878525] unshare_nsproxy_namespaces+0x55/0xa0
[ 2742.879526] ksys_unshare+0x1a7/0x330
[ 2742.880313] __x64_sys_unshare+0xe/0x20
[ 2742.881131] do_syscall_64+0x5b/0x180
[ 2742.881933] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reproduce:
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
modprobe ip_vti
unshare -n

Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f635cee 27-Mar-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03080e5e 15-Mar-2018 Stefano Brivio <sbrivio@redhat.com>

vti4: Don't override MTU passed on link creation via IFLA_MTU

Don't hardcode a MTU value on vti tunnel initialization,
ip_tunnel_newlink() is able to deal with this already. See also
commit ffc2b6ee4174 ("ip_gre: fix IFLA_MTU ignored on NEWLINK").

Fixes: 1181412c1a67 ("net/ipv4: VTI support new module for ip_vti.")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# dd1df247 15-Mar-2018 Stefano Brivio <sbrivio@redhat.com>

vti4: Don't count header length twice on tunnel setup

This re-introduces the effect of commit a32452366b72 ("vti4:
Don't count header length twice.") which was accidentally
reverted by merge commit f895f0cfbb77 ("Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec").

The commit message from Steffen Klassert said:

We currently count the size of LL_MAX_HEADER and struct iphdr
twice for vti4 devices, this leads to a wrong device mtu.
The size of LL_MAX_HEADER and struct iphdr is already counted in
ip_tunnel_bind_dev(), so don't do it again in vti_tunnel_init().

And this is still the case now: ip_tunnel_bind_dev() already
accounts for the header length of the link layer (not
necessarily LL_MAX_HEADER, if the output device is found), plus
one IP header.

For example, with a vti device on top of veth, with MTU of 1500,
the existing implementation would set the initial vti MTU to
1332, accounting once for LL_MAX_HEADER (128, included in
hard_header_len by vti) and twice for the same IP header (once
from hard_header_len, once from ip_tunnel_bind_dev()).

It should instead be 1480, because ip_tunnel_bind_dev() is able
to figure out that the output device is veth, so no additional
link layer header is attached, and will properly count one
single IP header.

The existing issue had the side effect of avoiding PMTUD for
most xfrm policies, by arbitrarily lowering the initial MTU.
However, the only way to get a consistent PMTU value is to let
the xfrm PMTU discovery do its course, and commit d6af1a31cc72
("vti: Add pmtu handling to vti_xmit.") now takes care of local
delivery cases where the application ignores local socket
notifications.

Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code")
Fixes: f895f0cfbb77 ("Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 31502104 26-Feb-2018 Kirill Tkhai <ktkhai@virtuozzo.com>

net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops

These pernet_operations are similar to bond_net_ops. Exit methods
unregisters all net ipgre/ipgre_tap/erspan/vti/ipip devices, and it
looks like another pernet_operations are not interested in foreign
net ipgre/ipgre_tap/erspan/vti/ipip list. Init method also does not
intersect with something pernet-specific. So, it's possible
to mark them async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f15ca723 25-Jan-2018 Nicolas Dichtel <nicolas.dichtel@6wind.com>

net: don't call update_pmtu unconditionally

Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at (null)"

Let's add a helper to check if update_pmtu is available before calling it.

Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db329190 28-Oct-2017 Xin Long <lucien.xin@gmail.com>

ip_vti: remove the useless err_count check in vti_xmit

Unlike ipip and gre, ip_vti never uses err_count in vti4_err,
so no need to check err_count in vti_xmit, it's value always 0.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 36f6ee22 26-Sep-2017 Alexey Kodanev <alexey.kodanev@oracle.com>

vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit

When running LTP IPsec tests, KASan might report:

BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
...
Call Trace:
<IRQ>
dump_stack+0x63/0x89
print_address_description+0x7c/0x290
kasan_report+0x28d/0x370
? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
__asan_report_load4_noabort+0x19/0x20
vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
? vti_init_net+0x190/0x190 [ip_vti]
? save_stack_trace+0x1b/0x20
? save_stack+0x46/0xd0
dev_hard_start_xmit+0x147/0x510
? icmp_echo.part.24+0x1f0/0x210
__dev_queue_xmit+0x1394/0x1c60
...
Freed by task 0:
save_stack_trace+0x1b/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x70/0xc0
kmem_cache_free+0x81/0x1e0
kfree_skbmem+0xb1/0xe0
kfree_skb+0x75/0x170
kfree_skb_list+0x3e/0x60
__dev_queue_xmit+0x1298/0x1c60
dev_queue_xmit+0x10/0x20
neigh_resolve_output+0x3a8/0x740
ip_finish_output2+0x5c0/0xe70
ip_finish_output+0x4ba/0x680
ip_output+0x1c1/0x3a0
xfrm_output_resume+0xc65/0x13d0
xfrm_output+0x1e4/0x380
xfrm4_output_finish+0x5c/0x70

Can be fixed if we get skb->len before dst_output().

Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code")
Fixes: 22e1b23dafa8 ("vti6: Support inter address family tunneling.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 64bc1781 19-Sep-2017 Eric Dumazet <edumazet@google.com>

ipv4: speedup ipv6 tunnels dismantle

Implement exit_batch() method to dismantle more devices
per round.

(rtnl_lock() ...
unregister_netdevice_many() ...
rtnl_unlock())

Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
(for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before patch :
$ time ./add_del_unshare.sh
net_namespace 126 282 5504 1 2 : tunables 8 4 0 : slabdata 126 282 0

real 1m38.965s
user 0m0.688s
sys 0m37.017s

After patch:
$ time ./add_del_unshare.sh
net_namespace 135 291 5504 1 2 : tunables 8 4 0 : slabdata 135 291 0

real 0m22.117s
user 0m0.728s
sys 0m35.328s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6b1c42e9 17-Jul-2017 Florian Westphal <fw@strlen.de>

vti: revert flush x-netns xfrm cache when vti interface is removed

flow cache is removed in next commit.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8b8a889 25-Jun-2017 Matthias Schiffer <mschiffer@universe-factory.net>

net: add netlink_ext_ack argument to rtnl_link_ops.validate

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad744b22 25-Jun-2017 Matthias Schiffer <mschiffer@universe-factory.net>

net: add netlink_ext_ack argument to rtnl_link_ops.changelink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7a3f4a18 25-Jun-2017 Matthias Schiffer <mschiffer@universe-factory.net>

net: add netlink_ext_ack argument to rtnl_link_ops.newlink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ed508fd 08-May-2017 Hangbin Liu <liuhangbin@gmail.com>

vti: check nla_put_* return value

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9830ad4c 18-Apr-2017 Craig Gallek <kraig@google.com>

ip_tunnel: Allow policy-based routing through tunnels

This feature allows the administrator to set an fwmark for
packets traversing a tunnel. This allows the use of independent
routing tables for tunneled packets without the use of iptables.

There is no concept of per-packet routing decisions through IPv4
tunnels, so this implementation does not need to work with
per-packet route lookups as the v6 implementation may
(with IP6_TNL_F_USE_ORIG_FWMARK).

Further, since the v4 tunnel ioctls share datastructures
(which can not be trivially modified) with the kernel's internal
tunnel configuration structures, the mark attribute must be stored
in the tunnel structure itself and passed as a parameter when
creating or changing tunnel attributes.

Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7d03a00 16-Nov-2016 Alexey Dobriyan <adobriyan@gmail.com>

netns: make struct pernet_operations::id unsigned int

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

void f(long *p, int i)
{
g(p[i]);
}

roughly translates to

movsx rsi, esi
mov rdi, [rsi+...]
call g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function old new delta
nfsd4_lock 3886 3959 +73
tipc_link_build_proto_msg 1096 1140 +44
mac80211_hwsim_new_radio 2776 2808 +32
tipc_mon_rcv 1032 1058 +26
svcauth_gss_legacy_init 1413 1429 +16
tipc_bcbase_select_primary 379 392 +13
nfsd4_exchange_id 1247 1260 +13
nfsd4_setclientid_confirm 782 793 +11
...
put_client_renew_locked 494 480 -14
ip_set_sockfn_get 730 716 -14
geneve_sock_add 829 813 -16
nfsd4_sequence_done 721 703 -18
nlmclnt_lookup_host 708 686 -22
nfsd4_lockt 1085 1063 -22
nfs_get_client 1077 1050 -27
tcf_bpf_init 1106 1076 -30
nfsd4_encode_fattr 5997 5930 -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1fb81e09 07-Sep-2016 thomas.zeitlhofer+lkml@ze-it.at <thomas.zeitlhofer+lkml@ze-it.at>

vti: use right inner_mode for inbound inter address family policy checks

In case of inter address family tunneling (IPv6 over vti4 or IPv4 over
vti6), the inbound policy checks in vti_rcv_cb() and vti6_rcv_cb() are
using the wrong address family. As a result, all inbound inter address
family traffic is dropped.

Use the xfrm_ip2inner_mode() helper, as done in xfrm_input() (i.e., also
increment LINUX_MIB_XFRMINSTATEMODEERROR in case of error), to select the
inner_mode that contains the right address family for the inbound policy
checks.

Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# a5d0dc81 09-Aug-2016 Lance Richardson <lrichard@redhat.com>

vti: flush x-netns xfrm cache when vti interface is removed

When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10 second intervals):

kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.

Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.

A more detailed description of the problem scenario (referencing commands
in the script below):

(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

The vti_test interface is created in the init namespace. vti_tunnel_init()
attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
setting the tunnel net to &init_net.

(2) ip link set vti_test netns secure

The vti_test interface is moved to the "secure" netns. Note that
the associated struct ip_tunnel still has tunnel->net set to &init_net.

(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

The first packet sent using the vti device causes xfrm_lookup() to be
called as follows:

dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

Note that tunnel->net is the init namespace, while skb_dst(skb) references
the vti_test interface in the "secure" namespace. The returned dst
references an interface in the init namespace.

Also note that the first parameter to xfrm_lookup() determines which flow
cache is used to store the computed xfrm bundle, so after xfrm_lookup()
returns there will be a cached bundle in the init namespace flow cache
with a dst referencing a device in the "secure" namespace.

(4) ip netns del secure

Kernel begins to delete the "secure" namespace. At some point the
vti_test interface is deleted, at which point dst_ifdown() changes
the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
in the "secure" namespace however).
Since nothing has happened to cause the init namespace's flow cache
to be garbage collected, this dst remains attached to the flow cache,
so the kernel loops waiting for the last reference to lo to go away.

<Begin script>
ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8

ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788

ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

ip netns del secure
<End script>

Reported-by: Hangbin Liu <haliu@redhat.com>
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d6af1a31 16-Mar-2016 Steffen Klassert <steffen.klassert@secunet.com>

vti: Add pmtu handling to vti_xmit.

We currently rely on the PMTU discovery of xfrm.
However if a packet is locally sent, the PMTU mechanism
of xfrm tries to do local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti_xmit to
report MTU changes immediately.

Reported-by: Mark McKinstry <Mark.McKinstry@alliedtelesis.co.nz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 039f5062 24-Dec-2015 Pravin B Shelar <pshelar@nicira.com>

ip_tunnel: Move stats update to iptunnel_xmit()

By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dfc3b0e8 26-Nov-2015 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

net: remove unnecessary mroute.h includes

It looks like many files are including mroute.h unnecessarily, so remove
the include. Most importantly remove it from ipv6.

CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 13206b6b 07-Oct-2015 Eric W. Biederman <ebiederm@xmission.com>

net: Pass net into dst_output and remove dst_output_okfn

Replace dst_output_okfn with dst_output

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a70649e 15-Sep-2015 Eric W. Biederman <ebiederm@xmission.com>

net: Merge dst_output and dst_output_sk

Add a sock paramter to dst_output making dst_output_sk superfluous.
Add a skb->sk parameter to all of the callers of dst_output
Have the callers of dst_output_sk call dst_output.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d55c670c 27-May-2015 Alexander Duyck <alexander.h.duyck@redhat.com>

ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call

The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb->mark modified
after completing the function. This resulted in the original skb->mark
value being lost. Since we only need skb->mark to be set for
xfrm_policy_check we can pull the assignment into the rcv_cb calls and then
just restore the original mark after xfrm_policy_check has been completed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# cd5279c1 27-May-2015 Alexander Duyck <alexander.h.duyck@redhat.com>

ip_vti/ip6_vti: Do not touch skb->mark on xmit

Instead of modifying skb->mark we can simply modify the flowi_mark that is
generated as a result of the xfrm_decode_session. By doing this we don't
need to actually touch the skb->mark and it can be preserved as it passes
out through the tunnel.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 00db4124 03-Apr-2015 Ian Morris <ipm@chirality.org.uk>

ipv4: coding style: comparison for inequality with NULL

The ipv4 code uses a mixture of coding styles. In some instances check
for non-NULL pointer is done as x != NULL and sometimes as x. x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1e99584b 02-Apr-2015 Nicolas Dichtel <nicolas.dichtel@6wind.com>

ipip,gre,vti,sit: implement ndo_get_iflink

Don't use dev->iflink anymore.

CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 67b61f6c 29-Mar-2015 Jiri Benc <jbenc@redhat.com>

netlink: implement nla_get_in_addr and nla_get_in6_addr

Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 930345ea 29-Mar-2015 Jiri Benc <jbenc@redhat.com>

netlink: implement nla_put_in_addr and nla_put_in6_addr

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1728d4fa 15-Jan-2015 Nicolas Dichtel <nicolas.dichtel@6wind.com>

tunnels: advertise link netns via netlink

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 20ea60ca 23-Nov-2014 lucien <lucien.xin@gmail.com>

ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic

Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():

hlist_for_each_entry_rcu(t, head, hash_node) {
if (local == t->parms.iph.saddr &&
remote == t->parms.iph.daddr &&
link == t->parms.link &&
==> type == t->dev->type &&
ip_tunnel_key_match(&t->parms, flags, key))
break;
}

the panic will happen, cause dev of ip_tunnel *t is null:
[ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
[ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
[ 3835.073008] Oops: 0000 [#1] SMP
.....
[ 3835.073008] Stack:
[ 3835.073008] ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
[ 3835.073008] ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
[ 3835.073008] ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
[ 3835.073008] Call Trace:
[ 3835.073008] [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
[ 3835.073008] [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
[ 3835.073008] [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
[ 3835.073008] [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
[ 3835.073008] [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
[ 3835.073008] [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
[ 3835.073008] [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
[ 3835.073008] [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
[ 3835.073008] [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
[ 3835.073008] [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
[ 3835.073008] [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
....

modprobe ip_vti
ip link del ip_vti0 type vti
ip link add ip_vti0 type vti
rmmod ip_vti

do that one or more times, kernel will panic.

fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
which we skip the unregister of fb tunnel device. do the same on ip6_vti.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02875878 05-Oct-2014 Eric Dumazet <edumazet@google.com>

net: better IFF_XMIT_DST_RELEASE support

Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1990e4f8 09-May-2014 Mathias Krause <minipli@googlemail.com>

vti: Simplify error handling in module init and exit

The error handling in the module init and exit functions can be
shortened to safe us some code.

1/ Remove the code duplications in the init function, jump straight to
the existing cleanup code by adding some labels. Also give the error
message some more value by telling the reason why loading the module has
failed. Furthermore fix the "IPSec" typo -- it should be "IPsec" instead.

2/ Remove the error handling in the exit function as the only legitimate
reason xfrm4_protocol_deregister() might fail is inet_del_protocol()
returning -1. That, in turn, means some other protocol handler had been
registered for this very protocol in the meantime. But that essentially
means we haven't been handling that protocol any more, anyway. What it
definitely means not is that we "can't deregister tunnel". Therefore
just get rid of that bogus warning. It's plain wrong.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 7c8e6b9c 07-Jun-2014 Dmitry Popov <ixaphire@qrator.net>

ip_vti: Fix 'ip tunnel add' with 'key' parameters

ip tunnel add remote 10.2.2.1 local 10.2.2.2 mode vti ikey 1 okey 2
translates to p->iflags = VTI_ISVTI|GRE_KEY and p->i_key = 1, but GRE_KEY !=
TUNNEL_KEY, so ip_tunnel_ioctl would set i_key to 0 (same story with o_key)
making us unable to create vti tunnels with [io]key via ip tunnel.

We cannot simply translate GRE_KEY to TUNNEL_KEY (as GRE module does) because
vti_tunnels with same local/remote addresses but different ikeys will be treated
as different then. So, imo the best option here is to move p->i_flags & *_KEY
check for vti tunnels from ip_tunnel.c to ip_vti.c and to think about [io]_mark
field for ip_tunnel_parm in the future.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6d004d6c 12-May-2014 Steffen Klassert <steffen.klassert@secunet.com>

vti: Use the tunnel mark for lookup in the error handlers.

We need to use the mark we get from the tunnels o_key to
lookup the right vti state in the error handlers. This patch
ensures that.

Fixes: df3893c1 ("vti: Update the ipv4 side to use it's own receive hook.")
Fixes: fa9ad96d ("vti6: Update the ipv6 side to use its own receive hook.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# a3245236 16-Apr-2014 Steffen Klassert <steffen.klassert@secunet.com>

vti4: Don't count header length twice.

We currently count the size of LL_MAX_HEADER and struct iphdr
twice for vti4 devices, this leads to a wrong device mtu.
The size of LL_MAX_HEADER and struct iphdr is already counted in
ip_tunnel_bind_dev(), so don't do it again in vti_tunnel_init().

Fixes: b9959fd3 ("vti: switch to new ip tunnel code")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 8d89dcdf 11-Apr-2014 Nicolas Dichtel <nicolas.dichtel@6wind.com>

vti: don't allow to add the same tunnel twice

Before the patch, it was possible to add two times the same tunnel:
ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41

It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.

Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code").

CC: Cong Wang <amwang@redhat.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 895de9a3 21-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com>

vti4: Enable namespace changing

vti4 is now fully namespace aware, so allow namespace changing
for vti devices

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 6e2de802 21-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com>

vti4: Check the tunnel endpoints of the xfrm state and the vti interface

The tunnel endpoints of the xfrm_state we got from the xfrm_lookup
must match the tunnel endpoints of the vti interface. This patch
ensures this matching.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 78a010cc 21-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com>

vti4: Support inter address family tunneling.

With this patch we can tunnel ipv6 traffic via a vti4
interface. A vti4 interface can now have an ipv6 address
and ipv6 traffic can be routed via a vti4 interface.
The resulting traffic is xfrm transformed and tunneled
throuhg ipv4 if matching IPsec policies and states are
present.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# a34cd4f3 21-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com>

vti4: Use the on xfrm_lookup returned dst_entry directly

We need to be protocol family indepenent to support
inter addresss family tunneling with vti. So use a
dst_entry instead of the ipv4 rtable in vti_tunnel_xmit.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# df3893c1 21-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com>

vti: Update the ipv4 side to use it's own receive hook.

With this patch, vti uses the IPsec protocol multiplexer to
register it's own receive side hooks for ESP, AH and IPCOMP.

Vti now does the following on receive side:

1. Do an input policy check for the IPsec packet we received.
This is required because this packet could be already
prosecces by IPsec, so an inbuond policy check is needed.

2. Mark the packet with the i_key. The policy and the state
must match this key now. Policy and state belong to the outer
namespace and policy enforcement is done at the further layers.

3. Call the generic xfrm layer to do decryption and decapsulation.

4. Wait for a callback from the xfrm layer to properly clean the
skb to not leak informations on namespace and to update the
device statistics.

On transmit side:

1. Mark the packet with the o_key. The policy and the state
must match this key now.

2. Do a xfrm_lookup on the original packet with the mark applied.

3. Check if we got an IPsec route.

4. Clean the skb to not leak informations on namespace
transitions.

5. Attach the dst_enty we got from the xfrm_lookup to the skb.

6. Call dst_output to do the IPsec processing.

7. Do the device statistics.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 3acfa1e7 18-Jan-2014 Eric Dumazet <edumazet@google.com>

ipv4: be friend with drop monitor

Replace some dev_kfree_skb() with kfree_skb() calls when
we drop one skb, this might help bug tracking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f84985f 03-Jan-2014 Li RongQing <roy.qing.li@gmail.com>

net: unify the pcpu_tstats and br_cpu_netstats as one

They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 236c9f84 19-Nov-2013 fan.du <fan.du@windriver.com>

xfrm: Release dst if this dst is improper for vti tunnel

After searching rt by the vti tunnel dst/src parameter,
if this rt has neither attached to any transformation
nor the transformation is not tunnel oriented, this rt
should be released back to ip layer.

otherwise causing dst memory leakage.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7263a518 08-Oct-2013 Christophe Gouault <christophe.gouault@6wind.com>

vti: get rid of nf mark rule in prerouting

This patch fixes and improves the use of vti interfaces (while
lightly changing the way of configuring them).

Currently:

- it is necessary to identify and mark inbound IPsec
packets destined to each vti interface, via netfilter rules in
the mangle table at prerouting hook.

- the vti module cannot retrieve the right tunnel in input since
commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
is done with flag TUNNEL_NO_KEY, so there no chance to retrieve them.

- the i_key is used by the outbound processing as a mark to lookup
for the right SP and SA bundle.

This patch uses the o_key to store the vti mark (instead of i_key) and
enables:

- to avoid the need for previously marking the inbound skbuffs via a
netfilter rule.
- to properly retrieve the right tunnel in input, only based on the IPsec
packet outer addresses.
- to properly perform an inbound policy check (using the tunnel o_key
as a mark).
- to properly perform an outbound SPD and SAD lookup (using the tunnel
o_key as a mark).
- to keep the current mark of the skbuff. The skbuff mark is neither
used nor changed by the vti interface. Only the vti interface o_key
is used.

SAs have a wildcard mark.
SPs have a mark equal to the vti interface o_key.

The vti interface must be created as follows (i_key = 0, o_key = mark):

ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1

The SPs attached to vti1 must be created as follows (mark = vti1 o_key):

ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
proto esp mode tunnel
ip xfrm policy add dir in mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
proto esp mode tunnel

The SAs are created with the default wildcard mark. There is no
distinction between global vs. vti SAs. Just their addresses will
possibly link them to a vti interface:

ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
enc "cbc(aes)" "azertyuiopqsdfgh"

ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
enc "cbc(aes)" "sqbdhgqsdjqjsdfh"

To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
should no use the default wildcard mark, but explicitly match mark 0.

To avoid a double SPD lookup in input and output (in global and vti SPDs),
the NOPOLICY and NOXFRM options should be set on the vti interfaces:

echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy
echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm

The outgoing traffic is steered to vti1 by a route via the vti interface:

ip route add 192.168.0.0/16 dev vti1

The incoming IPsec traffic is steered to vti1 because its outer addresses
match the vti1 tunnel configuration.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aba82695 28-Aug-2013 Fan Du <fan.du@windriver.com>

{ipv4,xfrm}: Introduce xfrm_tunnel_notifier for xfrm tunnel mode callback

Some thoughts on IPv4 VTI implementation:

The connection between VTI receiving part and xfrm tunnel mode input process
is hardly a "xfrm_tunnel", xfrm_tunnel is used in places where, e.g ipip/sit
and xfrm4_tunnel, acts like a true "tunnel" device.

In addition, IMHO, VTI doesn't need vti_err to do something meaningful, as all
VTI needs is just a notifier to be called whenever xfrm_input ingress a packet
to update statistics.

A IPsec protected packet is first handled by protocol handlers, e.g AH/ESP,
to check packet authentication or encryption rightness. PMTU update is taken
care of in this stage by protocol error handler.

Then the packet is rearranged properly depending on whether it's transport
mode or tunnel mode packed by mode "input" handler. The VTI handler code
takes effects in this stage in tunnel mode only. So it neither need propagate
PMTU, as it has already been done if necessary, nor the VTI handler is
qualified as a xfrm_tunnel.

So this patch introduces xfrm_tunnel_notifier and meanwhile wipe out vti_err
code.

Signed-off-by: Fan Du <fan.du@windriver.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


# 6c742e71 13-Aug-2013 Nicolas Dichtel <nicolas.dichtel@6wind.com>

ipip: add x-netns support

This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.

When one of the two netns is removed, the tunnel is destroyed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b9959fd3 20-Jul-2013 Amerigo Wang <amwang@redhat.com>

vti: switch to new ip tunnel code

GRE tunnel and IPIP tunnel already switched to the new
ip tunnel code, VTI tunnel can use it too.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab6c7a0a 28-Jun-2013 Cong Wang <amwang@redhat.com>

vti: remove duplicated code to fix a memory leak

vti module allocates dev->tstats twice: in vti_fb_tunnel_init()
and in vti_tunnel_init(), this lead to a memory leak of
dev->tstats.

Just remove the duplicated operations in vti_fb_tunnel_init().

(candidate for -stable)

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# baafc77b 10-Jun-2013 Saurabh Mohan <saurabh@vyatta.com>

net/ipv4: ip_vti clear skb cb before tunneling.

If users apply shaper to vti tunnel then it will cause a kernel crash. The
problem seems to be due to the vti_tunnel_xmit function not clearing
skb->opt field before passing the packet to xfrm tunneling code.

Signed-off-by: Saurabh Mohan <saurabh@vyatta.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f61dd388 25-Mar-2013 Pravin B Shelar <pshelar@nicira.com>

Tunneling: use IP Tunnel stats APIs.

Use common function get calculate rtnl_link_stats64 stats.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5441932 25-Mar-2013 Pravin B Shelar <pshelar@nicira.com>

GRE: Refactor GRE tunneling code.

Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
- packet xmit and rcv generic code. xmit flow looks like
(gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
- hash table of all devices.
- lookup for tunnel devices.
- control plane operations like device create, destroy, ioctl, netlink
operations code.
- registration for tunneling modules, like gre, ipip etc.
- define single pcpu_tstats dev->tstats.
- struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 52e804c6 15-Nov-2012 Eric W. Biederman <ebiederm@xmission.com>

net: Allow userns root to control ipv4

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.

Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.

Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
arbitrary ip options.

Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2942004 14-Nov-2012 Saurabh Mohan <saurabh.mohan@vyatta.com>

ipv4/ip_vti.c: VTI fix post-decryption forwarding

With the latest kernel there are two things that must be done post decryption
so that the packet are forwarded.
1. Remove the mark from the packet. This will cause the packet to not match
the ipsec-policy again. However doing this causes the post-decryption check to
fail also and the packet will get dropped. (cat /proc/net/xfrm_stat).
2. Remove the sp association in the skbuff so that no policy check is done on
the packet for VTI tunnels.

Due to #2 above we must now do a security-policy check in the vti rcv path
prior to resetting the mark in the skbuff.

Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reported-by: Ruben Herold <ruben@puettmann.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e086cadc 11-Nov-2012 Amerigo Wang <amwang@redhat.com>

net: unify for_each_ip_tunnel_rcu()

The defitions of for_each_ip_tunnel_rcu() are same,
so unify it. Also, don't hide the parameter 't'.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa0010f8 11-Nov-2012 Amerigo Wang <amwang@redhat.com>

net: convert __IPTUNNEL_XMIT() to an inline function

__IPTUNNEL_XMIT() is an ugly macro, convert it to a static
inline function, so make it more readable.

IPTUNNEL_XMIT() is unused, just remove it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8437e761 10-Oct-2012 stephen hemminger <shemminger@vyatta.com>

vti: fix sparse bit endian warnings

Use be32_to_cpu instead of htonl to keep sparse happy.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b0558ef2 24-Sep-2012 stephen hemminger <shemminger@vyatta.com>

xfrm: remove extranous rcu_read_lock

The handlers for xfrm_tunnel are always invoked with rcu read lock
already.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7d4b18c 23-Jul-2012 Saurabh <saurabh.mohan@vyatta.com>

net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.

With CONFIG_SPARSE_RCU_POINTER=y sparse identified references which did not
specificy __rcu in ip_vti.c

Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1181412c 17-Jul-2012 Saurabh <saurabh.mohan@vyatta.com>

net/ipv4: VTI support new module for ip_vti.

New VTI tunnel kernel module, Kconfig and Makefile changes.

Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>