History log of /linux-master/drivers/net/tap.c
Revision Date Author Comments
# 55c90047 27-Oct-2023 Jakub Kicinski <kuba@kernel.org>

net: fill in MODULE_DESCRIPTION()s under drivers/net/

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().

Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 92272ec4 02-Aug-2023 Jakub Kicinski <kuba@kernel.org>

eth: add missing xdp.h includes in drivers

Handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case
so add explicit includes.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>


# 37dfe5b8 01-Aug-2023 Eric Dumazet <edumazet@google.com>

net: tap: change tap_alloc_skb() to allow bigger paged allocations

tap_alloc_skb() is currently calling sock_alloc_send_pskb()
forcing order-0 page allocations.

Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x.

Also add logic to increase the linear part if needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tahsin Erdogan <trdgn@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230801205254.400094-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5c9241f3 31-Jul-2023 Laszlo Ersek <lersek@redhat.com>

net: tap_open(): set sk_uid from current_fsuid()

Commit 66b2c338adce initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/tapX" device node's owner UID. Per original
commit 86741ec25462 ("net: core: Add a UID field to struct sock.",
2016-11-04), that's wrong: the idea is to cache the UID of the userspace
process that creates the socket. Commit 86741ec25462 mentions socket() and
accept(); with "tap", the action that creates the socket is
open("/dev/tapX").

Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/tapX" will be owned by root, so in practice, commit 66b2c338adce has
no observable effect:

- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),

- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.

What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".

Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 66b2c338adce ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d457a0e3 08-Jun-2023 Eric Dumazet <edumazet@google.com>

net: move gso declarations and functions to their own files

Move declarations into include/net/gso.h and code into net/core/gso.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4063384e 09-May-2023 Eric Dumazet <edumazet@google.com>

net: add vlan_get_protocol_and_depth() helper

Before blamed commit, pskb_may_pull() was used instead
of skb_header_pointer() in __vlan_get_protocol() and friends.

Few callers depended on skb->head being populated with MAC header,
syzbot caught one of them (skb_mac_gso_segment())

Add vlan_get_protocol_and_depth() to make the intent clearer
and use it where sensible.

This is a more generic fix than commit e9d3f80935b6
("net/af_packet: make sure to pull mac header") which was
dealing with a similar issue.

kernel BUG at include/linux/skbuff.h:2655 !
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 1441 Comm: syz-executor199 Not tainted 6.1.24-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
RIP: 0010:__skb_pull include/linux/skbuff.h:2655 [inline]
RIP: 0010:skb_mac_gso_segment+0x68f/0x6a0 net/core/gro.c:136
Code: fd 48 8b 5c 24 10 44 89 6b 70 48 c7 c7 c0 ae 0d 86 44 89 e6 e8 a1 91 d0 00 48 c7 c7 00 af 0d 86 48 89 de 31 d2 e8 d1 4a e9 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
RSP: 0018:ffffc90001bd7520 EFLAGS: 00010286
RAX: ffffffff8469736a RBX: ffff88810f31dac0 RCX: ffff888115a18b00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90001bd75e8 R08: ffffffff84697183 R09: fffff5200037adf9
R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000012
R13: 000000000000fee5 R14: 0000000000005865 R15: 000000000000fed7
FS: 000055555633f300(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 0000000116fea000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
[<ffffffff847018dd>] __skb_gso_segment+0x32d/0x4c0 net/core/dev.c:3419
[<ffffffff8470398a>] skb_gso_segment include/linux/netdevice.h:4819 [inline]
[<ffffffff8470398a>] validate_xmit_skb+0x3aa/0xee0 net/core/dev.c:3725
[<ffffffff84707042>] __dev_queue_xmit+0x1332/0x3300 net/core/dev.c:4313
[<ffffffff851a9ec7>] dev_queue_xmit+0x17/0x20 include/linux/netdevice.h:3029
[<ffffffff851b4a82>] packet_snd net/packet/af_packet.c:3111 [inline]
[<ffffffff851b4a82>] packet_sendmsg+0x49d2/0x6470 net/packet/af_packet.c:3142
[<ffffffff84669a12>] sock_sendmsg_nosec net/socket.c:716 [inline]
[<ffffffff84669a12>] sock_sendmsg net/socket.c:736 [inline]
[<ffffffff84669a12>] __sys_sendto+0x472/0x5f0 net/socket.c:2139
[<ffffffff84669c75>] __do_sys_sendto net/socket.c:2151 [inline]
[<ffffffff84669c75>] __se_sys_sendto net/socket.c:2147 [inline]
[<ffffffff84669c75>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147
[<ffffffff8551d40f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff8551d40f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80
[<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 469aceddfa3e ("vlan: consolidate VLAN parsing code and limit max parsing depth")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f758bfec 07-Mar-2023 Jens Axboe <axboe@kernel.dk>

tap: add support for IOCB_NOWAIT

The tap driver already supports passing in nonblocking state based
on O_NONBLOCK, add support for checking IOCB_NOWAIT as well. With that
done, we can flag it with FMODE_NOWAIT as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/8f859870-e6e2-09ca-9c0f-a2aa7c984fb2@kernel.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 66b2c338 04-Feb-2023 Pietro Borrello <borrello@diag.uniroma1.it>

tap: tap_open(): correctly initialize socket uid

sock_init_data() assumes that the `struct socket` passed in input is
contained in a `struct socket_alloc` allocated with sock_alloc().
However, tap_open() passes a `struct socket` embedded in a `struct
tap_queue` allocated with sk_alloc().
This causes a type confusion when issuing a container_of() with
SOCK_INODE() in sock_init_data() which results in assigning a wrong
sk_uid to the `struct sock` in input.
On default configuration, the type confused field overlaps with
padding bytes between `int vnet_hdr_sz` and `struct tap_dev __rcu
*tap` in `struct tap_queue`, which makes the uid of all tap sockets 0,
i.e., the root one.
Fix the assignment by using sock_init_data_uid().

Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 399e0827 07-Dec-2022 Andrew Melnychenko <andrew@daynix.com>

driver/net/tun: Added features for USO.

Added support for USO4 and USO6.
For now, to "enable" USO, it's required to set both USO4 and USO6 simultaneously.
USO enables NETIF_F_GSO_UDP_L4.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f61f133 03-Aug-2022 Cezar Bulinaru <cbulinaru@gmail.com>

net: tap: NULL pointer derefence in dev_parse_header_protocol when skb->dev is null

Fixes a NULL pointer derefence bug triggered from tap driver.
When tap_get_user calls virtio_net_hdr_to_skb the skb->dev is null
(in tap.c skb->dev is set after the call to virtio_net_hdr_to_skb)
virtio_net_hdr_to_skb calls dev_parse_header_protocol which
needs skb->dev field to be valid.

The line that trigers the bug is in dev_parse_header_protocol
(dev is at offset 0x10 from skb and is stored in RAX register)
if (!dev->header_ops || !dev->header_ops->parse_protocol)
22e1: mov 0x10(%rbx),%rax
22e5: mov 0x230(%rax),%rax

Setting skb->dev before the call in tap.c fixes the issue.

BUG: kernel NULL pointer dereference, address: 0000000000000230
RIP: 0010:virtio_net_hdr_to_skb.constprop.0+0x335/0x410 [tap]
Code: c0 0f 85 b7 fd ff ff eb d4 41 39 c6 77 cf 29 c6 48 89 df 44 01 f6 e8 7a 79 83 c1 48 85 c0 0f 85 d9 fd ff ff eb b7 48 8b 43 10 <48> 8b 80 30 02 00 00 48 85 c0 74 55 48 8b 40 28 48 85 c0 74 4c 48
RSP: 0018:ffffc90005c27c38 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888298f25300 RCX: 0000000000000010
RDX: 0000000000000005 RSI: ffffc90005c27cb6 RDI: ffff888298f25300
RBP: ffffc90005c27c80 R08: 00000000ffffffea R09: 00000000000007e8
R10: ffff88858ec77458 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000014 R14: ffffc90005c27e08 R15: ffffc90005c27cb6
FS: 0000000000000000(0000) GS:ffff88858ec40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000230 CR3: 0000000281408006 CR4: 00000000003706e0
Call Trace:
tap_get_user+0x3f1/0x540 [tap]
tap_sendmsg+0x56/0x362 [tap]
? get_tx_bufs+0xc2/0x1e0 [vhost_net]
handle_tx_copy+0x114/0x670 [vhost_net]
handle_tx+0xb0/0xe0 [vhost_net]
handle_tx_kick+0x15/0x20 [vhost_net]
vhost_worker+0x7b/0xc0 [vhost]
? vhost_vring_call_reset+0x40/0x40 [vhost]
kthread+0xfa/0x120
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30

Fixes: 924a9bc362a5 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
Signed-off-by: Cezar Bulinaru <cbulinaru@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 736f16de 04-Mar-2022 Dongli Zhang <dongli.zhang@oracle.com>

net: tap: track dropped skb via kfree_skb_reason()

The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
the interface to forward the skb from TAP to vhost-net/virtio-net.

However, there are many "goto drop" in the TAP driver. Therefore, the
kfree_skb_reason() is involved at each "goto drop" to help userspace
ftrace/ebpf to track the reason for the loss of packets.

The below reasons are introduced:

- SKB_DROP_REASON_SKB_CSUM
- SKB_DROP_REASON_SKB_GSO_SEG
- SKB_DROP_REASON_SKB_UCOPY_FAULT
- SKB_DROP_REASON_DEV_HDR
- SKB_DROP_REASON_FULL_RING

Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 74a335a0 02-Mar-2022 Harold Huang <baymaxhuang@gmail.com>

tuntap: add sanity checks about msg_controllen in sendmsg

In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to
tun_sendmsg. Although we donot use msg_controllen in this path, we should
check msg_controllen to make sure the caller pass a valid msg_ctl.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45bb7556246c6b76277b1ba4296c91c2505

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3b23a32a 11-Feb-2021 Cong Wang <cong.wang@bytedance.com>

net: fix dev_ifsioc_locked() race condition

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1 Thread 2
// eth_commit_mac_addr_change()
memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
// dev_ifsioc_locked()
memcpy(ifr->ifr_hwaddr.sa_data,
dev->dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b69df260 14-Jan-2021 Menglong Dong <dong.menglong@zte.com.cn>

net: tap: check vlan with eth_type_vlan() method

Replace some checks for ETH_P_8021Q and ETH_P_8021AD in
drivers/net/tap.c with eth_type_vlan.

Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20210115023238.4681-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9ee5e5ad 06-Jan-2021 Jonathan Lemon <jonathan.lemon@gmail.com>

tap/tun: add skb_zcopy_init() helper for initialization.

Replace direct assignments with skb_zcopy_init() for zerocopy
cases where a new skb is initialized, without changing the
reference counts.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 06b4feb3 06-Jan-2021 Jonathan Lemon <jonathan.lemon@gmail.com>

net: group skb_shinfo zerocopy related bits together.

In preparation for expanded zerocopy (TX and RX), move
the zerocopy related bits out of tx_flags into their own
flag word.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 36177832 06-Jan-2021 Jonathan Lemon <jonathan.lemon@gmail.com>

skbuff: Add skb parameter to the ubuf zerocopy callback

Add an optional skb parameter to the zerocopy callback parameter,
which is passed down from skb_zcopy_clear(). This gives access
to the original skb, which is needed for upcoming RX zero-copy
error handling.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5643a552 08-Jan-2020 Jason A. Donenfeld <Jason@zx2c4.com>

net: tap: use skb_list_walk_safe helper for gso segments

This is a straight-forward conversion case for the new function, and
while we're at it, we can remove a null write to skb->next by replacing
it with skb_mark_not_on_list.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 407e9ef7 11-Sep-2018 Arnd Bergmann <arnd@arndb.de>

compat_ioctl: move drivers to compat_ptr_ioctl

Each of these drivers has a copy of the same trivial helper function to
convert the pointer argument and then call the native ioctl handler.

We now have a generic implementation of that, so use it.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# faeacb6d 27-Sep-2019 Colin Ian King <colin.king@canonical.com>

net: tap: clean up an indentation issue

There is a statement that is indented too deeply, remove
the extraneous tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 333f7909 05-Jul-2019 Al Viro <viro@zeniv.linux.org.uk>

coallocate socket_wq with socket itself

socket->wq is assign-once, set when we are initializing both
struct socket it's in and struct socket_wq it points to. As the
matter of fact, the only reason for separate allocation was the
ability to RCU-delay freeing of socket_wq. RCU-delaying the
freeing of socket itself gets rid of that need, so we can just
fold struct socket_wq into the end of struct socket and simplify
the life both for sock_alloc_inode() (one allocation instead of
two) and for tun/tap oddballs, where we used to embed struct socket
and struct socket_wq into the same structure (now - embedding just
the struct socket).

Note that reference to struct socket_wq in struct sock does remain
a reference - that's unchanged.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09c434b8 19-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Add SPDX license identifier for more missed files

Add SPDX license identifiers to all files which:

- Have no license information of any form

- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d2aa125d 20-Feb-2019 Maxim Mikityanskiy <maximmi@mellanox.com>

net: Don't set transport offset to invalid value

If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
skb->protocol will be unset, __skb_flow_dissect() will fail, and
skb_probe_transport_header() will fall back to the offset_hint, making
the resulting skb_transport_offset incorrect.

If, however, there is no transport header in the packet,
transport_header shouldn't be set to an arbitrary value.

Fix it by leaving the transport offset unset if it couldn't be found, to
be explicit rather than to fill it with some wrong value. It changes the
behavior, but if some code relied on the old behavior, it would be
broken anyway, as the old one is incorrect.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c76e77f 30-Dec-2018 Willem de Bruijn <willemb@google.com>

tap: call skb_probe_transport_header after setting skb->dev

The BPF flow dissector expects either skb->sk or skb->dev set on
all skbs. Delay flow dissection until after skb->dev is set.

This requires calling from within an rcu read-side critical section.
That is fine, see also the call from tun_xdp_one.

Fixes: d0e13a1488ad ("flow_dissector: lookup netns by skb->sk if skb->dev is NULL")
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a37a963 13-Dec-2018 Petr Machata <petrm@mellanox.com>

net: dev: Add extack argument to dev_set_mac_address()

A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
allows vetoing of MAC address changes. One prominent path to that
notification is through dev_set_mac_address(). Therefore give this
function an extack argument, so that it can be packed together with the
notification. Thus a textual reason for rejection (or a warning) can be
communicated back to the user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 144a6adf 20-Sep-2018 zhong jiang <zhongjiang@huawei.com>

net: tap: remove redundant null pointer check before kfree_skb

kfree_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before kfree_skb.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0efac277 11-Sep-2018 Jason Wang <jasowang@redhat.com>

tap: accept an array of XDP buffs through sendmsg()

This patch implement TUN_MSG_PTR msg_control type. This type allows
the caller to pass an array of XDP buffs to tuntap through ptr field
of the tun_msg_control. Tap will build skb through those XDP buffers.

This will avoid lots of indirect calls thus improves the icache
utilization and allows to do XDP batched flushing when doing XDP
redirection.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fe8dd45b 11-Sep-2018 Jason Wang <jasowang@redhat.com>

tun: switch to new type of msg_control

This patch introduces to a new tun/tap specific msg_control:

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR 2
struct tun_msg_ctl {
int type;
void *ptr;
};

This allows us to pass different kinds of msg_control through
sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will
be used by the existed vhost_net zerocopy code. The second is XDP
buff, which allows vhost_net to pass XDP buff to TUN. This could be
used to implement accepting an array of XDP buffs from vhost_net in
the following patches.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd3a8862 06-Jun-2018 Willem de Bruijn <willemb@google.com>

net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan

Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
to communicate packet metadata to userspace.

For skbuffs with vlan, the first two return the packet as it may have
existed on the wire, inserting the VLAN tag in the user buffer. Then
virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.

Commit f09e2249c4f5 ("macvtap: restore vlan header on user read")
added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix
csum_start when VLAN tags are present") then fixed up csum_start.

Virtio, packet and uml do not insert the vlan header in the user
buffer.

When introducing virtio_net_hdr_from_skb to deduplicate filling in
the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
applied uniformly, breaking csum offset for packets with vlan on
virtio and packet.

Make insertion of VLAN_HLEN optional. Convert the callers to pass it
when needed.

Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a9a08845 11-Feb-2018 Linus Torvalds <torvalds@linux-foundation.org>

vfs: do bulk POLL* -> EPOLL* replacement

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 88fae873 25-Jan-2018 Michael S. Tsirkin <mst@redhat.com>

tap: fix use-after-free

Lockless access to __ptr_ring_full is only legal if ring is
never resized, otherwise it might cause use-after free errors.
Simply drop the lockless test, we'll drop the packet
a bit later when produce fails.

Fixes: 362899b8 ("macvtap: switch to use skb array")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5990a305 03-Jan-2018 Jason Wang <jasowang@redhat.com>

tun/tap: use ptr_ring instead of skb_array

This patch switches to use ptr_ring instead of skb_array. This will be
used to enqueue different types of pointers by encoding type into
lower bits.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 61d78537 01-Dec-2017 Wei Xu <wexu@redhat.com>

tap: free skb if flags error

tap_recvmsg() supports accepting skb by msg_control after
commit 3b4ba04acca8 ("tap: support receiving skb from msg_control"),
the skb if presented should be freed within the function, otherwise
it would be leaked.

Signed-off-by: Wei Xu <wexu@redhat.com>
Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# afc9a42b 03-Jul-2017 Al Viro <viro@zeniv.linux.org.uk>

the rest of drivers/*: annotate ->poll() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 0c19f846 21-Nov-2017 Willem de Bruijn <willemb@google.com>

net: accept UFO datagrams from tuntap and packet

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
Booted a v4.13 guest kernel with QEMU. On a host kernel before this
patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
enabled, same as on a v4.13 host kernel.

A UFO packet sent from the guest appears on the tap device:
host:
nc -l -p -u 8000 &
tcpdump -n -i tap0

guest:
dd if=/dev/zero of=payload.txt bs=1 count=2000
nc -u 192.16.1.1 8000 < payload.txt

Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
packets arriving fragmented:

./with_tap_pair.sh ./tap_send_ufo tap0 tap1
(from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
v1 -> v2
- simplified set_offload change (review comment)
- documented test procedure

Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93161922 30-Oct-2017 Craig Gallek <kraig@google.com>

tun/tap: sanitize TUNSETSNDBUF input

Syzkaller found several variants of the lockup below by setting negative
values with the TUNSETSNDBUF ioctl. This patch adds a sanity check
to both the tun and tap versions of this ioctl.

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389]
Modules linked in:
irq event stamp: 329692056
hardirqs last enabled at (329692055): [<ffffffff824b8381>] _raw_spin_unlock_irqrestore+0x31/0x75
hardirqs last disabled at (329692056): [<ffffffff824b9e58>] apic_timer_interrupt+0x98/0xb0
softirqs last enabled at (35659740): [<ffffffff824bc958>] __do_softirq+0x328/0x48c
softirqs last disabled at (35659731): [<ffffffff811c796c>] irq_exit+0xbc/0xd0
CPU: 0 PID: 2389 Comm: repro Not tainted 4.14.0-rc7 #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880009452140 task.stack: ffff880006a20000
RIP: 0010:_raw_spin_lock_irqsave+0x11/0x80
RSP: 0018:ffff880006a27c50 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
RAX: ffff880009ac68d0 RBX: ffff880006a27ce0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff880006a27ce0 RDI: ffff880009ac6900
RBP: ffff880006a27c60 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 000000000063ff00 R12: ffff880009ac6900
R13: ffff880006a27cf8 R14: 0000000000000001 R15: ffff880006a27cf8
FS: 00007f4be4838700(0000) GS:ffff88000cc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020101000 CR3: 0000000009616000 CR4: 00000000000006f0
Call Trace:
prepare_to_wait+0x26/0xc0
sock_alloc_send_pskb+0x14e/0x270
? remove_wait_queue+0x60/0x60
tun_get_user+0x2cc/0x19d0
? __tun_get+0x60/0x1b0
tun_chr_write_iter+0x57/0x86
__vfs_write+0x156/0x1e0
vfs_write+0xf7/0x230
SyS_write+0x57/0xd0
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x7f4be4356df9
RSP: 002b:00007ffc18101c08 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4be4356df9
RDX: 0000000000000046 RSI: 0000000020101000 RDI: 0000000000000005
RBP: 00007ffc18101c40 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000293 R12: 0000559c75f64780
R13: 00007ffc18101d30 R14: 0000000000000000 R15: 0000000000000000

Fixes: 33dccbb050bb ("tun: Limit amount of queued packets per device")
Fixes: 20d29d7a916a ("net: macvtap driver")
Signed-off-by: Craig Gallek <kraig@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dea6e19f 27-Oct-2017 Girish Moodalbail <girish.moodalbail@oracle.com>

tap: reference to KVA of an unloaded module causes kernel panic

The commit 9a393b5d5988 ("tap: tap as an independent module") created a
separate tap module that implements tap functionality and exports
interfaces that will be used by macvtap and ipvtap modules to create
create respective tap devices.

However, that patch introduced a regression wherein the modules macvtap
and ipvtap can be removed (through modprobe -r) while there are
applications using the respective /dev/tapX devices. These applications
cause kernel to hold reference to /dev/tapX through 'struct cdev
macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap
modules respectively. So, when the application is later closed the
kernel panics because we are referencing KVA that is present in the
unloaded modules.

----------8<------- Example ----------8<----------
$ sudo ip li add name mv0 link enp7s0 type macvtap
$ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
14:mv0@enp7s0:
$ cat /dev/tap14 &
$ lsmod |egrep -i 'tap|vlan'
macvtap 16384 0
macvlan 24576 1 macvtap
tap 24576 3 macvtap
$ sudo modprobe -r macvtap
$ fg
cat /dev/tap14
^C

<...system panics...>
BUG: unable to handle kernel paging request at ffffffffa038c500
IP: cdev_put+0xf/0x30
----------8<-----------------8<----------

The fix is to set cdev.owner to the module that creates the tap device
(either macvtap or ipvtap). With this set, the operations (in
fs/char_dev.c) on char device holds and releases the module through
cdev_get() and cdev_put() and will not allow the module to unload
prematurely.

Fixes: 9a393b5d5988ea4e (tap: tap as an independent module)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 78e0ea67 25-Oct-2017 Girish Moodalbail <girish.moodalbail@oracle.com>

tap: double-free in error path in tap_open()

Double free of skb_array in tap module is causing kernel panic. When
tap_set_queue() fails we free skb_array right away by calling
skb_array_cleanup(). However, later on skb_array_cleanup() is called
again by tap_sock_destruct through sock_put(). This patch fixes that
issue.

Fixes: 362899b8725b35e3 (macvtap: switch to use skb array)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6aa7de05 23-Oct-2017 Mark Rutland <mark.rutland@arm.com>

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 12039046 15-Aug-2017 stephen hemminger <stephen@networkplumber.org>

tun/tap: use paren's with sizeof

Although sizeof is an operator in C. The kernel coding style convention
is to always use it like a function and add parenthesis.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d17eb73b 12-Aug-2017 Colin Ian King <colin.king@canonical.com>

tap: make struct tap_fops static

The structure tap_fops is local to the source and does not need to
be in global scope, so make it static.

Cleans up sparse warning:
symbol 'tap_fops' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d591a1f3 03-Jul-2017 David S. Miller <davem@davemloft.net>

tun/tap: Remove references to NETIF_F_UFO.

It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>


# ffa423fb 10-Jul-2017 WANG Cong <xiyou.wangcong@gmail.com>

tap: convert a mutex to a spinlock

We are not allowed to block on the RCU reader side, so can't
just hold the mutex as before. As a quick fix, convert it to
a spinlock.

Fixes: d9f1f61c0801 ("tap: Extending tap device create/destroy APIs")
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b4ba04a 16-May-2017 Jason Wang <jasowang@redhat.com>

tap: support receiving skb from msg_control

This patch makes tap_recvmsg() can receive from skb from its caller
through msg_control. Vhost_net will be the first user.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 49f96fd0 16-May-2017 Jason Wang <jasowang@redhat.com>

tap: export skb_array

This patch exports skb_array through tap_get_skb_array(). Caller can
then manipulate skb array directly.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c3edc401 02-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h>

task_struct::signal and task_struct::sighand are pointers, which would normally make it
straightforward to not define those types in sched.h.

That is not so, because the types are accompanied by a myriad of APIs (macros and inline
functions) that dereference them.

Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.

With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore,
trying to put accessors into sched.h as a test fails the following way:

./include/linux/sched.h: In function ‘test_signal_types’:
./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
^

This reduces the size and complexity of sched.h significantly.

Update all headers and .c code that relied on getting the signal handling
functionality from <linux/sched.h> to include <linux/sched/signal.h>.

The list of affected files in the preparatory patch was partly generated by
grepping for the APIs, and partly by doing coverage build testing, both
all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
cross-architecture builds.

Nevertheless some (trivial) build breakage is still expected related to rare
Kconfig combinations and in-flight patches to various kernel code, but most
of it should be handled by this patch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9a393b5d 10-Feb-2017 Sainath Grandhi <sainath.grandhi@intel.com>

tap: tap as an independent module

This patch makes tap a separate module for other types of virtual interfaces, for example,
ipvlan to use.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d9f1f61c 10-Feb-2017 Sainath Grandhi <sainath.grandhi@intel.com>

tap: Extending tap device create/destroy APIs

Extending tap APIs get/free_minor and create/destroy_cdev to handle more than one
type of virtual interface.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6fe3faf8 10-Feb-2017 Sainath Grandhi <sainath.grandhi@intel.com>

tap: Abstract type of virtual interface from tap implementation

macvlan object is re-structured to hold tap related elements in a separate
entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with
idr and fetched again on tap_open. Few of the tap functions are modified to
accepted tap_dev as argument. tap_dev object includes callbacks to be used by
underlying virtual interface to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ebc05ba7 10-Feb-2017 Sainath Grandhi <sainath.grandhi@intel.com>

tap: Tap character device creation/destroy API

This patch provides tap device create/destroy APIs in tap.c.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 635b8c8e 10-Feb-2017 Sainath Grandhi <sainath.grandhi@intel.com>

tap: Renaming tap related APIs, data structures, macros

Renaming tap related APIs, data structures and macros in tap.c from macvtap_.* to tap_.*

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8e04698 10-Feb-2017 Sainath Grandhi <sainath.grandhi@intel.com>

tap: Refactoring macvtap.c

macvtap module has code for tap/queue management and link management. This patch splits
the code into macvtap_main.c for link management and tap.c for tap/queue management.
Functionality in tap.c can be re-used for implementing tap on other virtual interfaces.

Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>