History log of /linux-master/net/l2tp/l2tp_ip.c
Revision Date Author Comments
# a3522a2e 09-Feb-2024 Guillaume Nault <gnault@redhat.com>

ipv4: Set the routing scope properly in ip_route_output_ports().

Set scope automatically in ip_route_output_ports() (using the socket
SOCK_LOCALROUTE flag). This way, callers don't have to overload the
tos with the RTO_ONLINK flag, like RT_CONN_FLAGS() does.

For callers that don't pass a struct sock, this doesn't change anything
as the scope is still set to RT_SCOPE_UNIVERSE when sk is NULL.

Callers that passed a struct sock and used RT_CONN_FLAGS(sk) or
RT_CONN_FLAGS_TOS(sk, tos) for the tos are modified to use
ip_sock_tos(sk) and RT_TOS(tos) respectively, as overloading tos with
the RTO_ONLINK flag now becomes unnecessary.

In drivers/net/amt.c, all ip_route_output_ports() calls use a 0 tos
parameter, ignoring the SOCK_LOCALROUTE flag of the socket. But the sk
parameter is a kernel socket, which doesn't have any configuration path
for setting SOCK_LOCALROUTE anyway. Therefore, ip_route_output_ports()
will continue to initialise scope with RT_SCOPE_UNIVERSE and amt.c
doesn't need to be modified.

Also, remove RT_CONN_FLAGS() and RT_CONN_FLAGS_TOS() from route.h as
these macros are now unused.

The objective is to eventually remove RTO_ONLINK entirely to allow
converting ->flowi4_tos to dscp_t. This will ensure proper isolation
between the DSCP and ECN bits, thus minimising the risk of introducing
bugs where TOS values interfere with ECN.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/dacfd2ab40685e20959ab7b53c427595ba229e7d.1707496938.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c274af22 16-Aug-2023 Eric Dumazet <edumazet@google.com>

inet: introduce inet->inet_flags

Various inet fields are currently racy.

do_ip_setsockopt() and do_ip_getsockopt() are mostly holding
the socket lock, but some (fast) paths do not.

Use a new inet->inet_flags to hold atomic bits in the series.

Remove inet->cmsg_flags, and use instead 9 bits from inet_flags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dc97391e 23-Jun-2023 David Howells <dhowells@redhat.com>

sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)

Remove ->sendpage() and ->sendpage_locked(). sendmsg() with
MSG_SPLICE_PAGES should be used instead. This allows multiple pages and
multipage folios to be passed through.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
cc: mptcp@lists.linux.dev
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e1d001fa 09-Jun-2023 Breno Leitao <leitao@debian.org>

net: ioctl: Use kernel memory on protocol ioctl callbacks

Most of the ioctls to net protocols operates directly on userspace
argument (arg). Usually doing get_user()/put_user() directly in the
ioctl callback. This is not flexible, because it is hard to reuse these
functions without passing userspace buffers.

Change the "struct proto" ioctls to avoid touching userspace memory and
operate on kernel buffers, i.e., all protocol's ioctl callbacks is
adapted to operate on a kernel memory other than on userspace (so, no
more {put,get}_user() and friends being called in the ioctl callback).

This changes the "struct proto" ioctl format in the following way:

int (*ioctl)(struct sock *sk, int cmd,
- unsigned long arg);
+ int *karg);

(Important to say that this patch does not touch the "struct proto_ops"
protocols)

So, the "karg" argument, which is passed to the ioctl callback, is a
pointer allocated to kernel space memory (inside a function wrapper).
This buffer (karg) may contain input argument (copied from userspace in
a prep function) and it might return a value/buffer, which is copied
back to userspace if necessary. There is not one-size-fits-all format
(that is I am using 'may' above), but basically, there are three type of
ioctls:

1) Do not read from userspace, returns a result to userspace
2) Read an input parameter from userspace, and does not return anything
to userspace
3) Read an input from userspace, and return a buffer to userspace.

The default case (1) (where no input parameter is given, and an "int" is
returned to userspace) encompasses more than 90% of the cases, but there
are two other exceptions. Here is a list of exceptions:

* Protocol RAW:
* cmd = SIOCGETVIFCNT:
* input and output = struct sioc_vif_req
* cmd = SIOCGETSGCNT
* input and output = struct sioc_sg_req
* Explanation: for the SIOCGETVIFCNT case, userspace passes the input
argument, which is struct sioc_vif_req. Then the callback populates
the struct, which is copied back to userspace.

* Protocol RAW6:
* cmd = SIOCGETMIFCNT_IN6
* input and output = struct sioc_mif_req6
* cmd = SIOCGETSGCNT_IN6
* input and output = struct sioc_sg_req6

* Protocol PHONET:
* cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
* input int (4 bytes)
* Nothing is copied back to userspace.

For the exception cases, functions sock_sk_ioctl_inout() will
copy the userspace input, and copy it back to kernel space.

The wrapper that prepare the buffer and put the buffer back to user is
sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now
calls sk_ioctl(), which will handle all cases.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 154e07c1 30-Mar-2023 Andrea Righi <andrea.righi@canonical.com>

l2tp: generate correct module alias strings

Commit 65b32f801bfb ("uapi: move IPPROTO_L2TP to in.h") moved the
definition of IPPROTO_L2TP from a define to an enum, but since
__stringify doesn't work properly with enums, we ended up breaking the
modalias strings for the l2tp modules:

$ modinfo l2tp_ip l2tp_ip6 | grep alias
alias: net-pf-2-proto-IPPROTO_L2TP
alias: net-pf-2-proto-2-type-IPPROTO_L2TP
alias: net-pf-10-proto-IPPROTO_L2TP
alias: net-pf-10-proto-2-type-IPPROTO_L2TP

Use the resolved number directly in MODULE_ALIAS_*() macros (as we
already do with SOCK_DGRAM) to fix the alias strings:

$ modinfo l2tp_ip l2tp_ip6 | grep alias
alias: net-pf-2-proto-115
alias: net-pf-2-proto-115-type-2
alias: net-pf-10-proto-115
alias: net-pf-10-proto-115-type-2

Moreover, fix the ordering of the parameters passed to
MODULE_ALIAS_NET_PF_PROTO_TYPE() by switching proto and type.

Fixes: 65b32f801bfb ("uapi: move IPPROTO_L2TP to in.h")
Link: https://lore.kernel.org/lkml/ZCQt7hmodtUaBlCP@righiandr-XPS-13-7390
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff009403 13-May-2022 Eric Dumazet <edumazet@google.com>

l2tp: use add READ_ONCE() to fetch sk->sk_bound_dev_if

Use READ_ONCE() in paths not holding the socket lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec095263 11-Apr-2022 Oliver Hartkopp <socketcan@hartkopp.net>

net: remove noblock parameter from recvmsg() entities

The internal recvmsg() functions have two parameters 'flags' and 'noblock'
that were merged inside skb_recv_datagram(). As a follow up patch to commit
f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()")
this patch removes the separate 'noblock' parameter for recvmsg().

Analogue to the referenced patch for skb_recv_datagram() the 'flags' and
'noblock' parameters are unnecessarily split up with e.g.

err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);

or in

err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);

instead of simply using only flags all the time and check for MSG_DONTWAIT
where needed (to preserve for the formerly separated no(n)block condition).

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# f4b41f06 04-Apr-2022 Oliver Hartkopp <socketcan@hartkopp.net>

net: remove noblock parameter from skb_recv_datagram()

skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'

As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags'
into 'flags' and 'noblock' with finally obsolete bit operations like this:

skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);

And this is not even done consistently with the 'flags' parameter.

This patch removes the obsolete and costly splitting into two parameters
and only performs bit operations when really needed on the caller side.

One missing conversion thankfully reported by kernel test robot. I missed
to enable kunit tests to build the mctp code.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7f553ff2 07-Jun-2021 Zheng Yongjun <zhengyongjun3@huawei.com>

l2tp: Fix spelling mistakes

Fix some spelling mistakes in comments:
negociated ==> negotiated
dont ==> don't

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5796254e 17-May-2021 Yejune Deng <yejune.deng@gmail.com>

net: Remove the member netns_ok

Every protocol has the 'netns_ok' member and it is euqal to 1. The
'if (!prot->netns_ok)' always false in inet_add_protocol().

Signed-off-by: Yejune Deng <yejunedeng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 45faeff1 03-Sep-2020 Tom Parkin <tparkin@katalix.com>

l2tp: make magic feather checks more useful

The l2tp tunnel and session structures contain a "magic feather" field
which was originally intended to help trace lifetime bugs in the code.

Since the introduction of the shared kernel refcount code in refcount.h,
and l2tp's porting to those APIs, we are covered by the refcount code's
checks and warnings. Duplicating those checks in the l2tp code isn't
useful.

However, magic feather checks are still useful to help to detect bugs
stemming from misuse/trampling of the sk_user_data pointer in struct
sock. The l2tp code makes extensive use of sk_user_data to stash
pointers to the tunnel and session structures, and if another subsystem
overwrites sk_user_data it's important to detect this.

As such, rework l2tp's magic feather checks to focus on validating the
tunnel and session data structures when they're extracted from
sk_user_data.

* Add a new accessor function l2tp_sk_to_tunnel which contains a magic
feather check, and is used by l2tp_core and l2tp_ip[6]
* Comment l2tp_udp_encap_recv which doesn't use this new accessor function
because of the specific nature of the codepath it is called in
* Drop l2tp_session_queue_purge's check on the session magic feather:
it is called from code which is walking the tunnel session list, and
hence doesn't need validation
* Drop l2tp_session_free's check on the tunnel magic feather: the
intention of this check is covered by refcount.h's reference count
sanity checking
* Add session magic validation in pppol2tp_ioctl. On failure return
-EBADF, which mirrors the approach in pppol2tp_[sg]etsockopt.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 12923365 22-Aug-2020 Tom Parkin <tparkin@katalix.com>

l2tp: don't log data frames

l2tp had logging to trace data frame receipt and transmission, including
code to dump packet contents. This was originally intended to aid
debugging of core l2tp packet handling, but is of limited use now that
code is stable.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ca7885db 28-Jul-2020 Tom Parkin <tparkin@katalix.com>

l2tp: tweak exports for l2tp_recv_common and l2tp_ioctl

All of the l2tp subsystem's exported symbols are exported using
EXPORT_SYMBOL_GPL, except for l2tp_recv_common and l2tp_ioctl.

These functions alone are not useful without the rest of the l2tp
infrastructure, so there's no practical benefit to these symbols using a
different export policy.

Change these exports to use EXPORT_SYMBOL_GPL for consistency with the
rest of l2tp.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 95075150 24-Jul-2020 Tom Parkin <tparkin@katalix.com>

l2tp: avoid multiple assignments

checkpatch warns about multiple assignments.

Update l2tp accordingly.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0febc7b3 22-Jul-2020 Tom Parkin <tparkin@katalix.com>

l2tp: cleanup comparisons to NULL

checkpatch warns about comparisons to NULL, e.g.

CHECK: Comparison to NULL could be written "!rt"
#474: FILE: net/l2tp/l2tp_ip.c:474:
+ if (rt == NULL) {

These sort of comparisons are generally clearer and more readable
the way checkpatch suggests, so update l2tp accordingly.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 20dcb110 22-Jul-2020 Tom Parkin <tparkin@katalix.com>

l2tp: cleanup comments

Modify some l2tp comments to better adhere to kernel coding style, as
reported by checkpatch.pl.

Add descriptive comments for the l2tp per-net spinlocks to document
their use.

Fix an incorrect comment in l2tp_recv_common:

RFC2661 section 5.4 states that:

"The LNS controls enabling and disabling of sequence numbers by sending a
data message with or without sequence numbers present at any time during
the life of a session."

l2tp handles this correctly in l2tp_recv_common, but the comment around
the code was incorrect and confusing. Fix up the comment accordingly.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b71a61cc 22-Jul-2020 Tom Parkin <tparkin@katalix.com>

l2tp: cleanup whitespace use

Fix up various whitespace issues as reported by checkpatch.pl:

* remove spaces around operators where appropriate,
* add missing blank lines following declarations,
* remove multiple blank lines, or trailing blank lines at the end of
functions.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b6238c04 17-Jul-2020 Christoph Hellwig <hch@lst.de>

net/ipv4: remove compat_ip_{get,set}sockopt

Handle the few cases that need special treatment in-line using
in_compat_syscall().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c918ffb 17-Jul-2020 Christoph Hellwig <hch@lst.de>

net: remove compat_sock_common_{get,set}sockopt

Add the compat handling to sock_common_{get,set}sockopt instead,
keyed of in_compat_syscall(). This allow to remove the now unused
->compat_{get,set}sockopt methods from struct proto_ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02c71b14 29-May-2020 Eric Dumazet <edumazet@google.com>

l2tp: do not use inet_hash()/inet_unhash()

syzbot recently found a way to crash the kernel [1]

Issue here is that inet_hash() & inet_unhash() are currently
only meant to be used by TCP & DCCP, since only these protocols
provide the needed hashinfo pointer.

L2TP uses a single list (instead of a hash table)

This old bug became an issue after commit 610236587600
("bpf: Add new cgroup attach type to enable sock modifications")
since after this commit, sk_common_release() can be called
while the L2TP socket is still considered 'hashed'.

general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 7063 Comm: syz-executor654 Not tainted 5.7.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00
RSP: 0018:ffffc90001777d30 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242
RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008
RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1
R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0
R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00
FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
sk_common_release+0xba/0x370 net/core/sock.c:3210
inet_create net/ipv4/af_inet.c:390 [inline]
inet_create+0x966/0xe00 net/ipv4/af_inet.c:248
__sock_create+0x3cb/0x730 net/socket.c:1428
sock_create net/socket.c:1479 [inline]
__sys_socket+0xef/0x200 net/socket.c:1521
__do_sys_socket net/socket.c:1530 [inline]
__se_sys_socket net/socket.c:1528 [inline]
__x64_sys_socket+0x6f/0xb0 net/socket.c:1528
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x441e29
Code: e8 fc b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdce184148 EFLAGS: 00000246 ORIG_RAX: 0000000000000029
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441e29
RDX: 0000000000000073 RSI: 0000000000000002 RDI: 0000000000000002
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000402c30 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace 23b6578228ce553e ]---
RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00
RSP: 0018:ffffc90001777d30 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242
RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008
RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1
R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0
R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00
FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Reported-by: syzbot+3610d489778b57cc8031@syzkaller.appspotmail.com


# 895b5c9f 29-Sep-2019 Florian Westphal <fw@strlen.de>

netfilter: drop bridge nf reset from nf_reset

commit 174e23810cd31
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions. The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.

Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.

This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().

In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.

I am submitting this for "net", because we're still early in the release
cycle. The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# c7cbdbf2 17-Apr-2019 Arnd Bergmann <arnd@arndb.de>

net: rework SIOCGSTAMP ioctl handling

The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
socket protocol handlers, and all of those end up calling the same
sock_get_timestamp()/sock_get_timestampns() helper functions, which
results in a lot of duplicate code.

With the introduction of 64-bit time_t on 32-bit architectures, this
gets worse, as we then need four different ioctl commands in each
socket protocol implementation.

To simplify that, let's add a new .gettstamp() operation in
struct proto_ops, and move ioctl implementation into the common
sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
through.

We can reuse the sock_get_timestamp() implementation, but generalize
it so it can deal with both native and compat mode, as well as
timeval and timespec structures.

Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4522a70d 29-Jan-2019 Jacob Wen <jian.w.wen@oracle.com>

l2tp: fix reading optional fields of L2TPv3

Use pskb_may_pull() to make sure the optional fields are in skb linear
parts, so we can safely read them later.

It's easy to reproduce the issue with a net driver that supports paged
skb data. Just create a L2TPv3 over IP tunnel and then generates some
network traffic.
Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.

Changes in v4:
1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
3. Add 'Fixes' in commit messages.

Changes in v3:
1. To keep consistency, move the code out of l2tp_recv_common.
2. Use "net" instead of "net-next", since this is a bug fix.

Changes in v2:
1. Only fix L2TPv3 to make code simple.
To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
It's complicated to do so.
2. Reloading pointers after pskb_may_pull

Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support")
Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01e28b92 10-Aug-2018 Guillaume Nault <g.nault@alphalink.fr>

l2tp: split l2tp_session_get()

l2tp_session_get() is used for two different purposes. If 'tunnel' is
NULL, the session is searched globally in the supplied network
namespace. Otherwise it is searched exclusively in the tunnel context.

Callers always know the context in which they need to search the
session. But some of them do provide both a namespace and a tunnel,
making the semantic of the call unclear.

This patch defines l2tp_tunnel_get_session() for lookups done in a
tunnel and restricts l2tp_session_get() to namespace searches.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b139e6b 25-Jul-2018 Guillaume Nault <g.nault@alphalink.fr>

l2tp: remove ->recv_payload_hook

The tunnel reception hook is only used by l2tp_ppp for skipping PPP
framing bytes. This is a session specific operation, but once a PPP
session sets ->recv_payload_hook on its tunnel, all frames received by
the tunnel will enter pppol2tp_recv_payload_hook(), including those
targeted at Ethernet sessions (an L2TPv3 tunnel can multiplex PPP and
Ethernet sessions).

So this mechanism is wrong, and uselessly complex. Let's just move this
functionality to the pppol2tp rx handler and drop ->recv_payload_hook.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a11e1d43 28-Jun-2018 Linus Torvalds <torvalds@linux-foundation.org>

Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL

The poll() changes were not well thought out, and completely
unexplained. They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead. That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case. The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
individually because that would just be unnecessarily messy - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# db5051ea 09-Apr-2018 Christoph Hellwig <hch@lst.de>

net: convert datagram_poll users tp ->poll_mask

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d00fa9ad 23-Feb-2018 James Chapman <jchapman@katalix.com>

l2tp: fix races with tunnel socket close

The tunnel socket tunnel->sock (struct sock) is accessed when
preparing a new ppp session on a tunnel at pppol2tp_session_init. If
the socket is closed by a thread while another is creating a new
session, the threads race. In pppol2tp_connect, the tunnel object may
be created if the pppol2tp socket is associated with the special
session_id 0 and the tunnel socket is looked up using the provided
fd. When handling this, pppol2tp_connect cannot sock_hold the tunnel
socket to prevent it being destroyed during pppol2tp_connect since
this may itself may race with the socket being destroyed. Doing
sockfd_lookup in pppol2tp_connect isn't sufficient to prevent
tunnel->sock going away either because a given tunnel socket fd may be
reused between calls to pppol2tp_connect. Instead, have
l2tp_tunnel_create sock_hold the tunnel socket before it does
sockfd_put. This ensures that the tunnel's socket is always extant
while the tunnel object exists. Hold a ref on the socket until the
tunnel is destroyed and ensure that all tunnel destroy paths go
through a common function (l2tp_tunnel_delete) since this will do the
final sock_put to release the tunnel socket.

Since the tunnel's socket is now guaranteed to exist if the tunnel
exists, we no longer need to use sockfd_lookup via l2tp_sock_to_tunnel
to derive the tunnel from the socket since this is always
sk_user_data.

Also, sessions no longer sock_hold the tunnel socket since sessions
already hold a tunnel ref and the tunnel sock will not be freed until
the tunnel is freed. Removing these sock_holds in
l2tp_session_register avoids a possible sock leak in the
pppol2tp_connect error path if l2tp_session_register succeeds but
attaching a ppp channel fails. The pppol2tp_connect error path could
have been fixed instead and have the sock ref dropped when the session
is freed, but doing a sock_put of the tunnel socket when the session
is freed would require a new session_free callback. It is simpler to
just remove the sock_hold of the tunnel socket in
l2tp_session_register, now that the tunnel socket lifetime is
guaranteed.

Finally, some init code in l2tp_tunnel_create is reordered to ensure
that the new tunnel object's refcount is set and the tunnel socket ref
is taken before the tunnel socket destructor callbacks are set.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 4360 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #34
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:pppol2tp_session_init+0x1d6/0x500
RSP: 0018:ffff88001377fb40 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88001636a940 RCX: ffffffff84836c1d
RDX: 0000000000000045 RSI: 0000000055976744 RDI: 0000000000000228
RBP: ffff88001377fb60 R08: ffffffff84836bc8 R09: 0000000000000002
R10: ffff88001377fab8 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88001636aac8 R14: ffff8800160f81c0 R15: 1ffff100026eff76
FS: 00007ffb3ea66700(0000) GS:ffff88001a400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020e77000 CR3: 0000000016261000 CR4: 00000000000006f0
Call Trace:
pppol2tp_connect+0xd18/0x13c0
? pppol2tp_session_create+0x170/0x170
? __might_fault+0x115/0x1d0
? lock_downgrade+0x860/0x860
? __might_fault+0xe5/0x1d0
? security_socket_connect+0x8e/0xc0
SYSC_connect+0x1b6/0x310
? SYSC_bind+0x280/0x280
? __do_page_fault+0x5d1/0xca0
? up_read+0x1f/0x40
? __do_page_fault+0x3c8/0xca0
SyS_connect+0x29/0x30
? SyS_accept+0x40/0x40
do_syscall_64+0x1e0/0x730
? trace_hardirqs_off_thunk+0x1a/0x1c
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7ffb3e376259
RSP: 002b:00007ffeda4f6508 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000020e77012 RCX: 00007ffb3e376259
RDX: 000000000000002e RSI: 0000000020e77000 RDI: 0000000000000004
RBP: 00007ffeda4f6540 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
R13: 00007ffeda4f6660 R14: 0000000000000000 R15: 0000000000000000
Code: 80 3d b0 ff 06 02 00 0f 84 07 02 00 00 e8 13 d6 db fc 49 8d bc 24 28 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f
a 48 c1 ea 03 <80> 3c 02 00 0f 85 ed 02 00 00 4d 8b a4 24 28 02 00 00 e8 13 16

Fixes: 80d84ef3ff1dd ("l2tp: prevent l2tp_tunnel_delete racing with userspace close")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9b2c45d4 12-Feb-2018 Denys Vlasenko <dvlasenk@redhat.com>

net: make getname() functions return length rather than use int* parameter

Changes since v1:
Added changes in these files:
drivers/infiniband/hw/usnic/usnic_transport.c
drivers/staging/lustre/lnet/lnet/lib-socket.c
drivers/target/iscsi/iscsi_target_login.c
drivers/vhost/net.c
fs/dlm/lowcomms.c
fs/ocfs2/cluster/tcp.c
security/tomoyo/network.c

Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

text data bss dec hex filename
30108430 2633624 873672 33615726 200ef6e vmlinux.before.o
30108109 2633612 873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-decnet-user@lists.sourceforge.net
CC: linux-wireless@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-sctp@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f7dc9ae 03-Nov-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: don't use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6

Using l2tp_tunnel_find() in l2tp_ip_recv() is wrong for two reasons:

* It doesn't take a reference on the returned tunnel, which makes the
call racy wrt. concurrent tunnel deletion.

* The lookup is only based on the tunnel identifier, so it can return
a tunnel that doesn't match the packet's addresses or protocol.

For example, a packet sent to an L2TPv3 over IPv6 tunnel can be
delivered to an L2TPv2 over UDPv4 tunnel. This is worse than a simple
cross-talk: when delivering the packet to an L2TP over UDP tunnel, the
corresponding socket is UDP, where ->sk_backlog_rcv() is NULL. Calling
sk_receive_skb() will then crash the kernel by trying to execute this
callback.

And l2tp_tunnel_find() isn't even needed here. __l2tp_ip_bind_lookup()
properly checks the socket binding and connection settings. It was used
as a fallback mechanism for finding tunnels that didn't have their data
path registered yet. But it's not limited to this case and can be used
to replace l2tp_tunnel_find() in the general case.

Fix l2tp_ip6 in the same way.

Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4346210 31-Oct-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: remove ->ref() and ->deref()

The ->ref() and ->deref() callbacks are unused since PPP stopped using
them in ee40fb2e1eb5 ("l2tp: protect sock pointer of struct pppol2tp_session with RCU").

We can thus remove them from struct l2tp_session and drop the do_ref
parameter of l2tp_session_get*().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 61b9a047 31-Mar-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: fix race in l2tp_recv_common()

Taking a reference on sessions in l2tp_recv_common() is racy; this
has to be done by the callers.

To this end, a new function is required (l2tp_session_get()) to
atomically lookup a session and take a reference on it. Callers then
have to manually drop this reference.

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 94d7ee0b 29-Mar-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: hold tunnel socket when handling control frames in l2tp_ip and l2tp_ip6

The code following l2tp_tunnel_find() expects that a new reference is
held on sk. Either sk_receive_skb() or the discard_put error path will
drop a reference from the tunnel's socket.

This issue exists in both l2tp_ip and l2tp_ip6.

Fixes: a3c18422a4b4 ("l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 51fb60eb 26-Feb-2017 Paul Hüber <phueber@kernsp.in>

l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv

l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
The return value is passed up to ip_local_deliver_finish, which treats
negative values as an IP protocol number for resubmission.

Signed-off-by: Paul Hüber <phueber@kernsp.in>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 72fb96e7 09-Feb-2017 Eric Dumazet <edumazet@google.com>

l2tp: do not use udp_ioctl()

udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(

L2TP should use its own handler, because it really does not
look the same.

SIOCINQ for instance should not assume UDP checksum or headers.

Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.

While crashes only happen on recent kernels (after commit
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.

Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5fdae04 06-Jan-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: rework socket comparison in __l2tp_ip*_bind_lookup()

Split conditions, so that each test becomes clearer.

Also, for l2tp_ip, check if "laddr" is 0. This prevents a socket from
binding to the unspecified address when other sockets are already bound
using the same device (if any), connection ID and namespace.

Same thing for l2tp_ip6: add ipv6_addr_any(laddr) and
ipv6_addr_any(raddr) tests to ensure that an IPv6 unspecified address
passed as parameter is properly treated a wildcard.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 986f7cbc 06-Jan-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: remove useless NULL check in __l2tp_ip*_bind_lookup()

If "l2tp" was NULL, that'd mean "sk" is NULL too. This can't happen
since "sk" is returned by sk_for_each_bound().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bb39b0bd 06-Jan-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: make __l2tp_ip*_bind_lookup() parameters 'const'

Add const qualifier wherever possible for __l2tp_ip_bind_lookup() and
__l2tp_ip6_bind_lookup().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8cf2f704 06-Jan-2017 Guillaume Nault <g.nault@alphalink.fr>

l2tp: remove redundant addr_len check in l2tp_ip_bind()

addr_len's value has already been verified at this point.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a9b2dff8 30-Dec-2016 Guillaume Nault <g.nault@alphalink.fr>

l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookups

For connected sockets, __l2tp_ip{,6}_bind_lookup() needs to check the
remote IP when looking for a matching socket. Otherwise a connected
socket can receive traffic not originating from its peer.

Drop l2tp_ip_bind_lookup() and l2tp_ip6_bind_lookup() instead of
updating their prototype, as these functions aren't used.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df90e688 29-Nov-2016 Guillaume Nault <g.nault@alphalink.fr>

l2tp: fix lookup for sockets not bound to a device in l2tp_ip

When looking up an l2tp socket, we must consider a null netdevice id as
wild card. There are currently two problems caused by
__l2tp_ip_bind_lookup() not considering 'dif' as wild card when set to 0:

* A socket bound to a device (i.e. with sk->sk_bound_dev_if != 0)
never receives any packet. Since __l2tp_ip_bind_lookup() is called
with dif == 0 in l2tp_ip_recv(), sk->sk_bound_dev_if is always
different from 'dif' so the socket doesn't match.

* Two sockets, one bound to a device but not the other, can be bound
to the same address. If the first socket binding to the address is
the one that is also bound to a device, the second socket can bind
to the same address without __l2tp_ip_bind_lookup() noticing the
overlap.

To fix this issue, we need to consider that any null device index, be
it 'sk->sk_bound_dev_if' or 'dif', matches with any other value.
We also need to pass the input device index to __l2tp_ip_bind_lookup()
on reception so that sockets bound to a device never receive packets
from other devices.

This patch fixes l2tp_ip6 in the same way.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d5e3a190 29-Nov-2016 Guillaume Nault <g.nault@alphalink.fr>

l2tp: fix racy socket lookup in l2tp_ip and l2tp_ip6 bind()

It's not enough to check for sockets bound to same address at the
beginning of l2tp_ip{,6}_bind(): even if no socket is found at that
time, a socket with the same address could be bound before we take
the l2tp lock again.

This patch moves the lookup right before inserting the new socket, so
that no change can ever happen to the list between address lookup and
socket insertion.

Care is taken to avoid side effects on the socket in case of failure.
That is, modifications of the socket are done after the lookup, when
binding is guaranteed to succeed, and before releasing the l2tp lock,
so that concurrent lookups will always see fully initialised sockets.

For l2tp_ip, 'ret' is set to -EINVAL before checking the SOCK_ZAPPED
bit. Error code was mistakenly set to -EADDRINUSE on error by commit
32c231164b76 ("l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()").
Using -EINVAL restores original behaviour.

For l2tp_ip6, the lookup is now always done with the correct bound
device. Before this patch, when binding to a link-local address, the
lookup was done with the original sk->sk_bound_dev_if, which was later
overwritten with addr->l2tp_scope_id. Lookup is now performed with the
final sk->sk_bound_dev_if value.

Finally, the (addr_len >= sizeof(struct sockaddr_in6)) check has been
dropped: addr is a sockaddr_l2tpip6 not sockaddr_in6 and addr_len has
already been checked at this point (this part of the code seems to have
been copy-pasted from net/ipv6/raw.c).

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3c18422 29-Nov-2016 Guillaume Nault <g.nault@alphalink.fr>

l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()

Socket must be held while under the protection of the l2tp lock; there
is no guarantee that sk remains valid after the read_unlock_bh() call.

Same issue for l2tp_ip and l2tp_ip6.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0382a25a 29-Nov-2016 Guillaume Nault <g.nault@alphalink.fr>

l2tp: lock socket before checking flags in connect()

Socket flags aren't updated atomically, so the socket must be locked
while reading the SOCK_ZAPPED flag.

This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch
also brings error handling for __ip6_datagram_connect() failures.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 32c23116 18-Nov-2016 Guillaume Nault <g.nault@alphalink.fr>

l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()

Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
Without lock, a concurrent call could modify the socket flags between
the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
would then leave a stale pointer there, generating use-after-free
errors when walking through the list or modifying adjacent entries.

BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
Write of size 8 by task syz-executor/10987
CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
Call Trace:
[<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
[<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
[< inline >] print_address_description mm/kasan/report.c:194
[<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
[< inline >] kasan_report mm/kasan/report.c:303
[<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
[< inline >] __write_once_size ./include/linux/compiler.h:249
[< inline >] __hlist_del ./include/linux/list.h:622
[< inline >] hlist_del_init ./include/linux/list.h:637
[<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
[<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
[<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
[<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
[<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
[<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
[<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
[<ffffffff813774f9>] task_work_run+0xf9/0x170
[<ffffffff81324aae>] do_exit+0x85e/0x2a00
[<ffffffff81326dc8>] do_group_exit+0x108/0x330
[<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
[<ffffffff811b49af>] do_signal+0x7f/0x18f0
[<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
[<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
Allocated:
PID = 10987
[ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
[ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
[ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
[ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
[ 1116.897025] [< inline >] slab_post_alloc_hook mm/slab.h:417
[ 1116.897025] [< inline >] slab_alloc_node mm/slub.c:2708
[ 1116.897025] [< inline >] slab_alloc mm/slub.c:2716
[ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
[ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
[ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
[ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
[ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
[ 1116.897025] [< inline >] sock_create net/socket.c:1193
[ 1116.897025] [< inline >] SYSC_socket net/socket.c:1223
[ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
[ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
Freed:
PID = 10987
[ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
[ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
[ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
[ 1116.897025] [< inline >] slab_free_hook mm/slub.c:1352
[ 1116.897025] [< inline >] slab_free_freelist_hook mm/slub.c:1374
[ 1116.897025] [< inline >] slab_free mm/slub.c:2951
[ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
[ 1116.897025] [< inline >] sk_prot_free net/core/sock.c:1369
[ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
[ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
[ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
[ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
[ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
[ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
[ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
[ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
[ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
[ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
[ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
[ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
[ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
[ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
[ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
[ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
[ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
[ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
[ 1116.897025] [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
[ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Memory state around the buggy address:
ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^
ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

==================================================================

The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.

Fixes: c51ce49735c1 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 286c72de 20-Oct-2016 Eric Dumazet <edumazet@google.com>

udp: must lock the socket in udp_disconnect()

Baozeng Ding reported KASAN traces showing uses after free in
udp_lib_get_port() and other related UDP functions.

A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.

I could write a reproducer with two threads doing :

static int sock_fd;
static void *thr1(void *arg)
{
for (;;) {
connect(sock_fd, (const struct sockaddr *)arg,
sizeof(struct sockaddr_in));
}
}

static void *thr2(void *arg)
{
struct sockaddr_in unspec;

for (;;) {
memset(&unspec, 0, sizeof(unspec));
connect(sock_fd, (const struct sockaddr *)&unspec,
sizeof(unspec));
}
}

Problem is that udp_disconnect() could run without holding socket lock,
and this was causing list corruptions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5745b823 03-Apr-2016 Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

ipv4: l2tp: fix a potential issue in l2tp_ip_recv

pskb_may_pull() can change skb->data, so we have to load ptr/optr at the
right place.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 163c2e25 23-Sep-2015 stephen hemminger <stephen@networkplumber.org>

l2tp: auto load IP modules

When creating a IP encapsulated tunnel the necessary l2tp module
should be loaded. It already works for UDP encapsulation, it just
doesn't work for direct IP encap.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1b784140 02-Mar-2015 Ying Xue <ying.xue@windriver.com>

net: Remove iocb argument from sendmsg and recvmsg

After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ce8e9ce 06-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

new helper: memcpy_from_msg()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 51f3d02b 05-Nov-2014 David S. Miller <davem@davemloft.net>

net: Add and use skb_copy_datagram_msg() helper.

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>


# b26ba202 23-May-2014 Tom Herbert <therbert@google.com>

net: Eliminate no_check from protosw

It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b0270e91 14-Apr-2014 Eric Dumazet <edumazet@google.com>

ipv4: add a sock pointer to ip_queue_xmit()

ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d5956fc ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.

One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.

Fixes: 31c70d5956fc ("l2tp: keep original skb ownership")
Reported-by: Zhan Jianyu <nasa4836@gmail.com>
Tested-by: Zhan Jianyu <nasa4836@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 342dfc30 17-Jan-2014 Steffen Hurrle <steffen@hurrle.net>

net: add build-time checks for msg->msg_name size

This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bceaa902 17-Nov-2013 Hannes Frederic Sowa <hannes@stressinduktion.org>

inet: prevent leakage of uninitialized memory to user in recv syscalls

Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.

If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.

Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93606317 19-Mar-2013 Tom Parkin <tparkin@katalix.com>

l2tp: close sessions in ip socket destroy callback

l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
socket being closed from userspace. We need to do the same thing for
IP-encapsulation sockets.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b67bfe0d 27-Feb-2013 Sasha Levin <sasha.levin@oracle.com>

hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9d6ddb19 05-Feb-2013 David S. Miller <davem@davemloft.net>

l2tp: Make ipv4 protocol handler namespace aware.

The infrastructure is already pretty much entirely there
to allow this conversion.

The tunnel and session lookups have per-namespace tables,
and the ipv4 bind lookup includes the namespace in the
lookup key.

Set netns_ok in l2tp_ip_protocol.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 4399a4df 08-Jun-2012 Eric Dumazet <edumazet@google.com>

l2tp: fix a race in l2tp_ip_sendmsg()

Commit 081b1b1bb27f (l2tp: fix l2tp_ip_sendmsg() route handling) added
a race, in case IP route cache is disabled.

In this case, we should not do the dst_release(&rt->dst), since it'll
free the dst immediately, instead of waiting a RCU grace period.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Cc: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c51ce497 28-May-2012 James Chapman <jchapman@katalix.com>

l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case

An application may call connect() to disconnect a socket using an
address with family AF_UNSPEC. The L2TP IP sockets were not handling
this case when the socket is not bound and an attempt to connect()
using AF_UNSPEC in such cases would result in an oops. This patch
addresses the problem by protecting the sk_prot->disconnect() call
against trying to unhash the socket before it is bound.

The L2TP IPv4 and IPv6 sockets have the same problem. Both are fixed
by this patch.

The patch also adds more checks that the sockaddr supplied to bind()
and connect() calls is valid.

RIP: 0010:[<ffffffff82e133b0>] [<ffffffff82e133b0>] inet_unhash+0x50/0xd0
RSP: 0018:ffff88001989be28 EFLAGS: 00010293
Stack:
ffff8800407a8000 0000000000000000 ffff88001989be78 ffffffff82e3a249
ffffffff82e3a050 ffff88001989bec8 ffff88001989be88 ffff8800407a8000
0000000000000010 ffff88001989bec8 ffff88001989bea8 ffffffff82e42639
Call Trace:
[<ffffffff82e3a249>] udp_disconnect+0x1f9/0x290
[<ffffffff82e42639>] inet_dgram_connect+0x29/0x80
[<ffffffff82d012fc>] sys_connect+0x9c/0x100

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4ca44fa 16-May-2012 Joe Perches <joe@perches.com>

net: l2tp: Standardize logging styles

Use more current logging styles.

Add pr_fmt to prefix output appropriately.
Convert printks to pr_<level>.
Convert PRINTK macros to new l2tp_<level> macros.
Neaten some <foo>_refcount debugging macros.
Use print_hex_dump_bytes instead of hand-coded loops.
Coalesce formats and align arguments.

Some KERN_DEBUG output is not now emitted unless
dynamic_debugging is enabled.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 84768edb 01-May-2012 Sasha Levin <levinsasha928@gmail.com>

net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg

l2tp_ip_sendmsg could return without releasing socket lock, making it all the
way to userspace, and generating the following warning:

[ 130.891594] ================================================
[ 130.894569] [ BUG: lock held when returning to user space! ]
[ 130.897257] 3.4.0-rc5-next-20120501-sasha #104 Tainted: G W
[ 130.900336] ------------------------------------------------
[ 130.902996] trinity/8384 is leaving the kernel with locks still held!
[ 130.906106] 1 lock held by trinity/8384:
[ 130.907924] #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff82b9503f>] l2tp_ip_sendmsg+0x2f/0x550

Introduced by commit 2f16270 ("l2tp: Fix locking in l2tp_ip.c").

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c8657fd5 29-Apr-2012 James Chapman <jchapman@katalix.com>

l2tp: remove unused stats from l2tp_ip socket

The l2tp_ip socket currently maintains packet/byte stats in its
private socket structure. But these counters aren't exposed to
userspace and so serve no purpose. The counters were also
smp-unsafe. So this patch just gets rid of the stats.

While here, change a couple of internal __u32 variables to u32.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de3c7a18 29-Apr-2012 James Chapman <jchapman@katalix.com>

l2tp: Use ip4_datagram_connect() in l2tp_ip_connect()

Cleanup the l2tp_ip code to make use of an existing ipv4 support function.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9be48dc 09-Apr-2012 James Chapman <jchapman@katalix.com>

l2tp: don't overwrite source address in l2tp_ip_bind()

Applications using L2TP/IP sockets want to be able to bind() an L2TP/IP
socket to set the local tunnel id while leaving the auto-assigned source
address alone. So if no source address is supplied, don't overwrite
the address already stored in the socket.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d1f224ae 09-Apr-2012 James Chapman <jchapman@katalix.com>

l2tp: fix refcount leak in l2tp_ip sockets

The l2tp_ip socket close handler does not update the module refcount
correctly which prevents module unload after the first bind() call on
an L2TPv3 IP encapulation socket.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 68315801 24-Jan-2012 James Chapman <jchapman@katalix.com>

l2tp: l2tp_ip - fix possible oops on packet receive

When a packet is received on an L2TP IP socket (L2TPv3 IP link
encapsulation), the l2tpip socket's backlog_rcv function calls
xfrm4_policy_check(). This is not necessary, since it was called
before the skb was added to the backlog. With CONFIG_NET_NS enabled,
xfrm4_policy_check() will oops if skb->dev is null, so this trivial
patch removes the call.

This bug has always been present, but only when CONFIG_NET_NS is
enabled does it cause problems. Most users are probably using UDP
encapsulation for L2TP, hence the problem has only recently
surfaced.

EIP: 0060:[<c12bb62b>] EFLAGS: 00210246 CPU: 0
EIP is at l2tp_ip_recvmsg+0xd4/0x2a7
EAX: 00000001 EBX: d77b5180 ECX: 00000000 EDX: 00200246
ESI: 00000000 EDI: d63cbd30 EBP: d63cbd18 ESP: d63cbcf4
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Call Trace:
[<c1218568>] sock_common_recvmsg+0x31/0x46
[<c1215c92>] __sock_recvmsg_nosec+0x45/0x4d
[<c12163a1>] __sock_recvmsg+0x31/0x3b
[<c1216828>] sock_recvmsg+0x96/0xab
[<c10b2693>] ? might_fault+0x47/0x81
[<c10b2693>] ? might_fault+0x47/0x81
[<c1167fd0>] ? _copy_from_user+0x31/0x115
[<c121e8c8>] ? copy_from_user+0x8/0xa
[<c121ebd6>] ? verify_iovec+0x3e/0x78
[<c1216604>] __sys_recvmsg+0x10a/0x1aa
[<c1216792>] ? sock_recvmsg+0x0/0xab
[<c105a99b>] ? __lock_acquire+0xbdf/0xbee
[<c12d5a99>] ? do_page_fault+0x193/0x375
[<c10d1200>] ? fcheck_files+0x9b/0xca
[<c10d1259>] ? fget_light+0x2a/0x9c
[<c1216bbb>] sys_recvmsg+0x2b/0x43
[<c1218145>] sys_socketcall+0x16d/0x1a5
[<c11679f0>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c100305f>] sysenter_do_call+0x12/0x38
Code: c6 05 8c ea a8 c1 01 e8 0c d4 d9 ff 85 f6 74 07 3e ff 86 80 00 00 00 b9 17 b6 2b c1 ba 01 00 00 00 b8 78 ed 48 c1 e8 23 f6 d9 ff <ff> 76 0c 68 28 e3 30 c1 68 2d 44 41 c1 e8 89 57 01 00 83 c4 0c

Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 081b1b1b 11-Jun-2011 Eric Dumazet <eric.dumazet@gmail.com>

l2tp: fix l2tp_ip_sendmsg() route handling

l2tp_ip_sendmsg() in non connected mode incorrectly calls
sk_setup_caps(). Subsequent send() calls send data to wrong destination.

We can also avoid changing dst refcount in connected mode, using
appropriate rcu locking. Once output route lookups can also be done
under rcu, sendto() calls wont change dst refcounts too.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>


# d9d8da80 06-May-2011 David S. Miller <davem@davemloft.net>

inet: Pass flowi to ->queue_xmit().

This allows us to acquire the exact route keying information from the
protocol, however that might be managed.

It handles all of the possibilities, from the simplest case of storing
the key in inet->cork.fl to the more complex setup SCTP has where
individual transports determine the flow.

Signed-off-by: David S. Miller <davem@davemloft.net>


# fdbb0f07 08-May-2011 David S. Miller <davem@davemloft.net>

l2tp: Use cork flow in l2tp_ip_connect() and l2tp_ip_sendmsg()

Now that the socket is consistently locked in these two routines,
this transformation is legal.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f16270f 08-May-2011 David S. Miller <davem@davemloft.net>

l2tp: Fix locking in l2tp_ip.c

Both l2tp_ip_connect() and l2tp_ip_sendmsg() must take the socket
lock. They both modify socket state non-atomically, and in particular
l2tp_ip_sendmsg() increments socket private counters without using
atomic operations.
Signed-off-by: David S. Miller <davem@davemloft.net>


# 31e4543d 03-May-2011 David S. Miller <davem@davemloft.net>

ipv4: Make caller provide on-stack flow key to ip_route_output_ports().

Signed-off-by: David S. Miller <davem@davemloft.net>


# ad227425 29-Apr-2011 David S. Miller <davem@davemloft.net>

ipv4: Get route daddr from flow key in l2tp_ip_connect().

Now that output route lookups update the flow with
destination address selection, we can fetch it from
fl4->daddr instead of rt->rt_dst

Signed-off-by: David S. Miller <davem@davemloft.net>


# 44901666 29-Apr-2011 David S. Miller <davem@davemloft.net>

ipv4: Fetch route saddr from flow key in l2tp_ip_connect().

Now that output route lookups update the flow with
source address selection, we can fetch it from
fl4->saddr instead of rt->rt_src

Signed-off-by: David S. Miller <davem@davemloft.net>


# 778865a5 28-Apr-2011 David S. Miller <davem@davemloft.net>

l2tp: Fix inet_opt conversion.

We don't actually hold the socket lock at this point, so the
rcu_dereference_protected() isn't' correct. Thanks to Eric
Dumazet for pointing this out.

Thankfully, we're only interested in fetching the faddr value
if srr is enabled, so we can simply make this an RCU sequence
and use plain rcu_dereference().

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f6d8bd05 21-Apr-2011 Eric Dumazet <eric.dumazet@gmail.com>

inet: add RCU protection to inet->opt

We lack proper synchronization to manipulate inet->opt ip_options

Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We cant insert an rcu_head in struct ip_options since its included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2d7192d6 26-Apr-2011 David S. Miller <davem@davemloft.net>

ipv4: Sanitize and simplify ip_route_{connect,newports}()

These functions are used together as a unit for route resolution
during connect(). They address the chicken-and-egg problem that
exists when ports need to be allocated during connect() processing,
yet such port allocations require addressing information from the
routing code.

It's currently more heavy handed than it needs to be, and in
particular we allocate and initialize a flow object twice.

Let the callers provide the on-stack flow object. That way we only
need to initialize it once in the ip_route_connect() call.

Later, if ip_route_newports() needs to do anything, it re-uses that
flow object as-is except for the ports which it updates before the
route re-lookup.

Also, describe why this set of facilities are needed and how it works
in a big comment.

Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>


# e9c54999 27-Apr-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Revert wrong fixes for common misspellings

These changes were incorrectly fixed by codespell. They were now
manually corrected.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 78fbfd8a 11-Mar-2011 David S. Miller <davem@davemloft.net>

ipv4: Create and use route lookup helpers.

The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>


# b23dd4fe 02-Mar-2011 David S. Miller <davem@davemloft.net>

ipv4: Make output route lookup return rtable directly.

Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 273447b3 01-Mar-2011 David S. Miller <davem@davemloft.net>

ipv4: Kill can_sleep arg to ip_route_output_flow()

This boolean state is now available in the flow flags.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 420d44da 01-Mar-2011 David S. Miller <davem@davemloft.net>

ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"

Since that is what the current vague "flags" argument means.

Signed-off-by: David S. Miller <davem@davemloft.net>


# abdf7e72 01-Mar-2011 David S. Miller <davem@davemloft.net>

ipv4: Can final ip_route_connect() arg to boolean "can_sleep".

Since that's what the current vague "flags" thing means.

Signed-off-by: David S. Miller <davem@davemloft.net>


# e8d34a88 05-Dec-2010 Michal Marek <mmarek@suse.cz>

l2tp: Fix modalias of l2tp_ip

Using the SOCK_DGRAM enum results in
"net-pf-2-proto-SOCK_DGRAM-type-115", so use the numeric value like it
is done in net/dccp.

Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5811662b 12-Nov-2010 Changli Gao <xiaosuo@gmail.com>

net: use the macros defined for the members of flowi

Use the macros defined for the members of flowi to clean the code up.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fc130840 21-Oct-2010 stephen hemminger <shemminger@vyatta.com>

l2tp: make local function static

Also moved the refcound inlines from l2tp_core.h to l2tp_core.c
since only used in that one file.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e83726b0 21-Oct-2010 Eric Dumazet <eric.dumazet@gmail.com>

l2tp: small cleanup

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d8d1f30b 11-Jun-2010 Changli Gao <xiaosuo@gmail.com>

net-next: remove useless union keyword

remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e15ed4d 15-Apr-2010 Shan Wei <shanwei@cn.fujitsu.com>

net: replace ipfragok with skb->local_df

As Herbert Xu said: we should be able to simply replace ipfragok
with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function)
has droped ipfragok and set local_df value properly.

The patch kills the ipfragok parameter of .queue_xmit().

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d76751f 02-Apr-2010 James Chapman <jchapman@katalix.com>

l2tp: Add L2TPv3 IP encapsulation (no UDP) support

This patch adds a new L2TPIP socket family and modifies the core to
handle the case where there is no UDP header in the L2TP
packet. L2TP/IP uses IP protocol 115. Since L2TP/UDP and L2TP/IP
packets differ in layout, the datapath packet handling code needs
changes too. Userspace uses an L2TPIP socket instead of a UDP socket
when IP encapsulation is required.

We can't use raw sockets for this because the semantics of raw sockets
don't lend themselves to the socket-per-tunnel model - we need to

Signed-off-by: David S. Miller <davem@davemloft.net>