#
17af4205 |
|
28-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
erspan: make sure erspan_base_hdr is present in skb->head syzbot reported a problem in ip6erspan_rcv() [1] Issue is that ip6erspan_rcv() (and erspan_rcv()) no longer make sure erspan_base_hdr is present in skb linear part (skb->head) before getting @ver field from it. Add the missing pskb_may_pull() calls. v2: Reload iph pointer in erspan_rcv() after pskb_may_pull() because skb->head might have changed. [1] BUG: KMSAN: uninit-value in pskb_may_pull_reason include/linux/skbuff.h:2742 [inline] BUG: KMSAN: uninit-value in pskb_may_pull include/linux/skbuff.h:2756 [inline] BUG: KMSAN: uninit-value in ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline] BUG: KMSAN: uninit-value in gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610 pskb_may_pull_reason include/linux/skbuff.h:2742 [inline] pskb_may_pull include/linux/skbuff.h:2756 [inline] ip6erspan_rcv net/ipv6/ip6_gre.c:541 [inline] gre_rcv+0x11f8/0x1930 net/ipv6/ip6_gre.c:610 ip6_protocol_deliver_rcu+0x1d4c/0x2ca0 net/ipv6/ip6_input.c:438 ip6_input_finish net/ipv6/ip6_input.c:483 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip6_input+0x15d/0x430 net/ipv6/ip6_input.c:492 ip6_mc_input+0xa7e/0xc80 net/ipv6/ip6_input.c:586 dst_input include/net/dst.h:460 [inline] ip6_rcv_finish+0x955/0x970 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:314 [inline] ipv6_rcv+0xde/0x390 net/ipv6/ip6_input.c:310 __netif_receive_skb_one_core net/core/dev.c:5538 [inline] __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5652 netif_receive_skb_internal net/core/dev.c:5738 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5798 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1549 tun_get_user+0x5566/0x69e0 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2108 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xb63/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xe0 fs/read_write.c:652 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795 tun_alloc_skb drivers/net/tun.c:1525 [inline] tun_get_user+0x209a/0x69e0 drivers/net/tun.c:1846 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2108 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xb63/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xe0 fs/read_write.c:652 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 CPU: 1 PID: 5045 Comm: syz-executor114 Not tainted 6.9.0-rc1-syzkaller-00021-g962490525cff #0 Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup") Reported-by: syzbot+1c1cf138518bf0c53d68@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/000000000000772f2c0614b66ef7@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/20240328112248.1101491-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9b5b3637 |
|
06-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ip_tunnel: use exit_batch_rtnl() method exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list. This saves one rtnl_lock()/rtnl_unlock() pair and one unregister_netdevice_many() call. This patch takes care of ipip, ip_vti, and ip_gre tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20240206144313.2050392-15-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b058a5d2 |
|
08-Feb-2024 |
Breno Leitao <leitao@debian.org> |
net: fill in MODULE_DESCRIPTION()s for ipv4 modules W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the IPv4 modules. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208164244.3818498-7-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
80d875cf |
|
02-Dec-2023 |
Shigeru Yoshida <syoshida@redhat.com> |
ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit() In ipgre_xmit(), skb_pull() may fail even if pskb_inet_may_pull() returns true. For example, applications can use PF_PACKET to create a malformed packet with no IP header. This type of packet causes a problem such as uninit-value access. This patch ensures that skb_pull() can pull the required size by checking the skb with pskb_network_may_pull() before skb_pull(). Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Suman Ghosh <sumang@marvell.com> Link: https://lore.kernel.org/r/20231202161441.221135-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
aa7cb378 |
|
17-Jul-2023 |
Yuanjun Gong <ruc_gongyuanjun@163.com> |
ipv4: ip_gre: fix return value check in erspan_xmit() goto free_skb if an unexpected result is returned by pskb_tirm() in erspan_xmit(). Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02d84f3e |
|
17-Jul-2023 |
Yuanjun Gong <ruc_gongyuanjun@163.com> |
ipv4: ip_gre: fix return value check in erspan_fb_xmit() goto err_free_skb if an unexpected result is returned by pskb_tirm() in erspan_fb_xmit(). Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a92fb5c0 |
|
01-Jun-2023 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
ip_gre: clean up some inconsistent indenting No functional modification involved. net/ipv4/ip_gre.c:192 ipgre_err() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5375 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e50ed77 |
|
20-Mar-2023 |
Eric Dumazet <edumazet@google.com> |
erspan: do not use skb_mac_header() in ndo_start_xmit() Drivers should not assume skb_mac_header(skb) == skb->data in their ndo_start_xmit(). Use skb_network_offset() and skb_transport_offset() which better describe what is needed in erspan_fb_xmit() and ip6erspan_tunnel_xmit() syzbot reported: WARNING: CPU: 0 PID: 5083 at include/linux/skbuff.h:2873 skb_mac_header include/linux/skbuff.h:2873 [inline] WARNING: CPU: 0 PID: 5083 at include/linux/skbuff.h:2873 ip6erspan_tunnel_xmit+0x1d9c/0x2d90 net/ipv6/ip6_gre.c:962 Modules linked in: CPU: 0 PID: 5083 Comm: syz-executor406 Not tainted 6.3.0-rc2-syzkaller-00866-gd4671cb96fa3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 RIP: 0010:skb_mac_header include/linux/skbuff.h:2873 [inline] RIP: 0010:ip6erspan_tunnel_xmit+0x1d9c/0x2d90 net/ipv6/ip6_gre.c:962 Code: 04 02 41 01 de 84 c0 74 08 3c 03 0f 8e 1c 0a 00 00 45 89 b4 24 c8 00 00 00 c6 85 77 fe ff ff 01 e9 33 e7 ff ff e8 b4 27 a1 f8 <0f> 0b e9 b6 e7 ff ff e8 a8 27 a1 f8 49 8d bf f0 0c 00 00 48 b8 00 RSP: 0018:ffffc90003b2f830 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 000000000000ffff RCX: 0000000000000000 RDX: ffff888021273a80 RSI: ffffffff88e1bd4c RDI: 0000000000000003 RBP: ffffc90003b2f9d8 R08: 0000000000000003 R09: 000000000000ffff R10: 000000000000ffff R11: 0000000000000000 R12: ffff88802b28da00 R13: 00000000000000d0 R14: ffff88807e25b6d0 R15: ffff888023408000 FS: 0000555556a61300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e5b11eb6e8 CR3: 0000000027c1b000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __netdev_start_xmit include/linux/netdevice.h:4900 [inline] netdev_start_xmit include/linux/netdevice.h:4914 [inline] __dev_direct_xmit+0x504/0x730 net/core/dev.c:4300 dev_direct_xmit include/linux/netdevice.h:3088 [inline] packet_xmit+0x20a/0x390 net/packet/af_packet.c:285 packet_snd net/packet/af_packet.c:3075 [inline] packet_sendmsg+0x31a0/0x5150 net/packet/af_packet.c:3107 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 __sys_sendto+0x23a/0x340 net/socket.c:2142 __do_sys_sendto net/socket.c:2154 [inline] __se_sys_sendto net/socket.c:2150 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2150 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f123aaa1039 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc15d12058 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f123aaa1039 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000020000040 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f123aa648c0 R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 Fixes: 1baf5ebf8954 ("erspan: auto detect truncated packets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230320163427.8096-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c4794d22 |
|
15-Nov-2022 |
Eric Dumazet <edumazet@google.com> |
ipv4: tunnels: use DEV_STATS_INC() Most of code paths in tunnels are lockless (eg NETIF_F_LLTX in tx). Adopt SMP safe DEV_STATS_INC() to update dev->stats fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d997f10 |
|
28-Oct-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
rtnetlink: pass netlink message header and portid to rtnl_configure_link() This patch pass netlink message header and portid to rtnl_configure_link() All the functions in this call chain need to add the parameters so we can use them in the last call rtnl_notify(), and notify the userspace about the new link info if NLM_F_ECHO flag is set. - rtnl_configure_link() - __dev_notify_flags() - rtmsg_ifinfo() - rtmsg_ifinfo_event() - rtmsg_ifinfo_build_skb() - rtmsg_ifinfo_send() - rtnl_notify() Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub suggested. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ee496694 |
|
02-Dec-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
ip_gre: do not report erspan version on GRE interface Although the type I ERSPAN is based on the barebones IP + GRE encapsulation and no extra ERSPAN header. Report erspan version on GRE interface looks unreasonable. Fix this by separating the erspan and gre fill info. IPv6 GRE does not have this info as IPv6 only supports erspan version 1 and 2. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: f989d546a2d5 ("erspan: Add type I version 0 support.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Link: https://lore.kernel.org/r/20221203032858.3130339-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
7ec9fce4 |
|
18-Aug-2022 |
Eyal Birger <eyal.birger@gmail.com> |
ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to ip_tunnel_key") added a "flow_flags" member to struct ip_tunnel_key which was later used by the commit in the fixes tag to avoid dropping packets with sources that aren't locally configured when set in bpf_set_tunnel_key(). VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE were not. This commit fixes this omission by making ip_tunnel_init_flow() receive the flow flags from the tunnel key in the relevant collect_md paths. Fixes: b8fff748521c ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Paul Chaignon <paul@isovalent.com> Link: https://lore.kernel.org/bpf/20220818074118.726639-1-eyal.birger@gmail.com
|
#
301bd140 |
|
20-Jun-2022 |
Eric Dumazet <edumazet@google.com> |
erspan: do not assume transport header is always set Rewrite tests in ip6erspan_tunnel_xmit() and erspan_fb_xmit() to not assume transport header is set. syzbot reported: WARNING: CPU: 0 PID: 1350 at include/linux/skbuff.h:2911 skb_transport_header include/linux/skbuff.h:2911 [inline] WARNING: CPU: 0 PID: 1350 at include/linux/skbuff.h:2911 ip6erspan_tunnel_xmit+0x15af/0x2eb0 net/ipv6/ip6_gre.c:963 Modules linked in: CPU: 0 PID: 1350 Comm: aoe_tx0 Not tainted 5.19.0-rc2-syzkaller-00160-g274295c6e53f #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:skb_transport_header include/linux/skbuff.h:2911 [inline] RIP: 0010:ip6erspan_tunnel_xmit+0x15af/0x2eb0 net/ipv6/ip6_gre.c:963 Code: 0f 47 f0 40 88 b5 7f fe ff ff e8 8c 16 4b f9 89 de bf ff ff ff ff e8 a0 12 4b f9 66 83 fb ff 0f 85 1d f1 ff ff e8 71 16 4b f9 <0f> 0b e9 43 f0 ff ff e8 65 16 4b f9 48 8d 85 30 ff ff ff ba 60 00 RSP: 0018:ffffc90005daf910 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 000000000000ffff RCX: 0000000000000000 RDX: ffff88801f032100 RSI: ffffffff882e8d3f RDI: 0000000000000003 RBP: ffffc90005dafab8 R08: 0000000000000003 R09: 000000000000ffff R10: 000000000000ffff R11: 0000000000000000 R12: ffff888024f21d40 R13: 000000000000a288 R14: 00000000000000b0 R15: ffff888025a2e000 FS: 0000000000000000(0000) GS:ffff88802c800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2e425000 CR3: 000000006d099000 CR4: 0000000000152ef0 Call Trace: <TASK> __netdev_start_xmit include/linux/netdevice.h:4805 [inline] netdev_start_xmit include/linux/netdevice.h:4819 [inline] xmit_one net/core/dev.c:3588 [inline] dev_hard_start_xmit+0x188/0x880 net/core/dev.c:3604 sch_direct_xmit+0x19f/0xbe0 net/sched/sch_generic.c:342 __dev_xmit_skb net/core/dev.c:3815 [inline] __dev_queue_xmit+0x14a1/0x3900 net/core/dev.c:4219 dev_queue_xmit include/linux/netdevice.h:2994 [inline] tx+0x6a/0xc0 drivers/block/aoe/aoenet.c:63 kthread+0x1e7/0x3b0 drivers/block/aoe/aoecmd.c:1229 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302 </TASK> Fixes: d5db21a3e697 ("erspan: auto detect truncated ipv6 packets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d21e996 |
|
06-Jun-2022 |
Willem de Bruijn <willemb@google.com> |
ip_gre: test csum_start instead of transport header GRE with TUNNEL_CSUM will apply local checksum offload on CHECKSUM_PARTIAL packets. ipgre_xmit must validate csum_start after an optional skb_pull, else lco_csum may trigger an overflow. The original check was if (csum && skb_checksum_start(skb) < skb->data) return -EINVAL; This had false positives when skb_checksum_start is undefined: when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement was straightforward if (csum && skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start(skb) < skb->data) return -EINVAL; But was eventually revised more thoroughly: - restrict the check to the only branch where needed, in an uncommon GRE path that uses header_ops and calls skb_pull. - test skb_transport_header, which is set along with csum_start in skb_partial_csum_set in the normal header_ops datapath. Turns out skbs can arrive in this branch without the transport header set, e.g., through BPF redirection. Revise the check back to check csum_start directly, and only if CHECKSUM_PARTIAL. Do leave the check in the updated location. Check field regardless of whether TUNNEL_CSUM is configured. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u Fixes: 8a0ed250f911 ("ip_gre: validate csum_start only on pull") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
020e8f60 |
|
28-Apr-2022 |
Peilin Ye <peilin.ye@bytedance.com> |
ip_gre: Make GRE and GRETAP devices always NETIF_F_LLTX Recently we made o_seqno atomic_t. Stop special-casing TUNNEL_SEQ, and always mark GRE[TAP] devices as NETIF_F_LLTX, since we no longer need the TX lock (&txq->_xmit_lock). Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
31c417c9 |
|
21-Apr-2022 |
Peilin Ye <peilin.ye@bytedance.com> |
ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in collect_md mode is racy for [IP6]GRE[TAP] devices. Consider the following sequence of events: 1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link add ... external". "ip" ignores "[o]seq" if "external" is specified, so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e. it uses lockless TX); 2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g. bpf_skb_set_tunnel_key() in an eBPF program attached to this device; 3. gre_fb_xmit() or __gre6_xmit() processes these skb's: gre_build_header(skb, tun_hlen, flags, protocol, tunnel_id_to_key32(tun_info->key.tun_id), (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0); ^^^^^^^^^^^^^^^^^ Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may try to do this tunnel->o_seqno++ in parallel, which is racy. Fix it by making o_seqno atomic_t. As mentioned by Eric Dumazet in commit b790e01aee74 ("ip_gre: lockless xmit"), making o_seqno atomic_t increases "chance for packets being out of order at receiver" when NETIF_F_LLTX is on. Maybe a better fix would be: 1. Do not ignore "oseq" in external mode. Users MUST specify "oseq" if they want the kernel to allow sequencing of outgoing packets; 2. Reject all outgoing TUNNEL_SEQ packets if the device was not created with "oseq". Unfortunately, that would break userspace. We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us do it in separate patches to keep this fix minimal. Suggested-by: Jakub Kicinski <kuba@kernel.org> Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ff827beb |
|
21-Apr-2022 |
Peilin Ye <peilin.ye@bytedance.com> |
ip_gre: Make o_seqno start from 0 in native mode For GRE and GRETAP devices, currently o_seqno starts from 1 in native mode. According to RFC 2890 2.2., "The first datagram is sent with a sequence number of 0." Fix it. It is worth mentioning that o_seqno already starts from 0 in collect_md mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to gre_build_header() before getting incremented. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db53cd3d |
|
13-Apr-2022 |
David Ahern <dsahern@kernel.org> |
net: Handle l3mdev in ip_tunnel_init_flow Ido reported that the commit referenced in the Fixes tag broke a gre use case with dummy devices. Add a check to ip_tunnel_init_flow to see if the oif is an l3mdev port and if so set the oif to 0 to avoid the oif comparison in fib_lookup_good_nhc. Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f7716b31 |
|
10-Jan-2022 |
Guillaume Nault <gnault@redhat.com> |
gre: Don't accidentally set RTO_ONLINK in gre_fill_metadata_dst() Mask the ECN bits before initialising ->flowi4_tos. The tunnel key may have the last ECN bit set, which will interfere with the route lookup process as ip_route_output_key_hash() interpretes this bit specially (to restrict the route scope). Found by code inspection, compile tested only. Fixes: 962924fa2b7a ("ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
5a1b7e1a |
|
12-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
ip: use dev_addr_set() in tunnels Use dev_addr_set() instead of writing to netdev->dev_addr directly in ip tunnels drivers. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8a0ed250 |
|
05-Sep-2021 |
Willem de Bruijn <willemb@google.com> |
ip_gre: validate csum_start only on pull The GRE tunnel device can pull existing outer headers in ipge_xmit. This is a rare path, apparently unique to this device. The below commit ensured that pulling does not move skb->data beyond csum_start. But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and thus csum_start is irrelevant. Refine to exclude this. At the same time simplify and strengthen the test. Simplify, by moving the check next to the offending pull, making it more self documenting and removing an unnecessary branch from other code paths. Strengthen, by also ensuring that the transport header is correct and therefore the inner headers will be after skb_reset_inner_headers. The transport header is set to csum_start in skb_partial_csum_set. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Fixes: 1d011c4803c7 ("ip_gre: add validation for csum_start") Reported-by: Ido Schimmel <idosch@idosch.org> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e7a1c7c |
|
27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
ip_tunnel: use ndo_siocdevprivate The various ipv4 and ipv6 tunnel drivers each implement a set of 12 SIOCDEVPRIVATE commands for managing tunnels. These all work correctly in compat mode. Move them over to the new .ndo_siocdevprivate operation. Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d011c48 |
|
20-Aug-2021 |
Shreyansh Chouhan <chouhan.shreyansh630@gmail.com> |
ip_gre: add validation for csum_start Validate csum_start in gre_handle_offloads before we call _gre_xmit so that we do not crash later when the csum_start value is used in the lco_csum function call. This patch deals with ipv4 code. Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aab1e898 |
|
25-Jun-2021 |
Guillaume Nault <gnault@redhat.com> |
gre: let mac_header point to outer header only when necessary Commit e271c7b4420d ("gre: do not keep the GRE header around in collect medata mode") did reset the mac_header for the collect_md case. Let's extend this behaviour to classical gre devices as well. ipgre_header_parse() seems to be the only case that requires mac_header to point to the outer header. We can detect this case accurately by checking ->header_ops. For all other cases, we can reset mac_header. This allows to push an Ethernet header to ipgre packets and redirect them to an Ethernet device: $ tc filter add dev gre0 ingress matchall \ action vlan push_eth dst_mac 00:00:5e:00:53:01 \ src_mac 00:00:5e:00:53:00 \ action mirred egress redirect dev eth0 Before this patch, this worked only for collect_md gre devices. Now this works for regular gre devices as well. Only the special case of gre devices that use ipgre_header_ops isn't supported. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
98d7fc46 |
|
07-Nov-2020 |
Heiner Kallweit <hkallweit1@gmail.com> |
ipv4/ipv6: switch to dev_get_tstats64 Replace ip_tunnel_get_stats64() with the new identical core function dev_get_tstats64(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
fdafed45 |
|
12-Oct-2020 |
Cong Wang <xiyou.wangcong@gmail.com> |
ip_gre: set dev->hard_header_len and dev->needed_headroom properly GRE tunnel has its own header_ops, ipgre_header_ops, and sets it conditionally. When it is set, it assumes the outer IP header is already created before ipgre_xmit(). This is not true when we send packets through a raw packet socket, where L2 headers are supposed to be constructed by user. Packet socket calls dev_validate_header() to validate the header. But GRE tunnel does not set dev->hard_header_len, so that check can be simply bypassed, therefore uninit memory could be passed down to ipgre_xmit(). Similar for dev->needed_headroom. dev->hard_header_len is supposed to be the length of the header created by dev->header_ops->create(), so it should be used whenever header_ops is set, and dev->needed_headroom should be used when it is not set. Reported-and-tested-by: syzbot+4a2c52677a8a1aa283cb@syzkaller.appspotmail.com Cc: William Tu <u9012063@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
607259a6 |
|
19-May-2020 |
Christoph Hellwig <hch@lst.de> |
net: add a new ndo_tunnel_ioctl method This method is used to properly allow kernel callers of the IPv4 route management ioctls. The exsting ip_tunnel_ioctl helper is renamed to ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls touching user memory, and is used for the guts of ndo_tunnel_ctl implementations. A new ip_tunnel_ioctl helper is added that can be wired up directly to the ndo_do_ioctl method and takes care of the copy to and from userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51fa960d |
|
12-May-2020 |
William Tu <u9012063@gmail.com> |
erspan: Check IFLA_GRE_ERSPAN_VER is set. Add a check to make sure the IFLA_GRE_ERSPAN_VER is provided by users. Fixes: f989d546a2d5 ("erspan: Add type I version 0 support.") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f989d546 |
|
05-May-2020 |
William Tu <u9012063@gmail.com> |
erspan: Add type I version 0 support. The Type I ERSPAN frame format is based on the barebones IP + GRE(4-byte) encapsulation on top of the raw mirrored frame. Both type I and II use 0x88BE as protocol type. Unlike type II and III, no sequence number or key is required. To creat a type I erspan tunnel device: $ ip link add dev erspan11 type erspan \ local 172.16.1.100 remote 172.16.1.200 \ erspan_ver 0 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32ca98fe |
|
16-Mar-2020 |
Petr Machata <petrm@mellanox.com> |
net: ip_gre: Accept IFLA_INFO_DATA-less configuration The fix referenced below causes a crash when an ERSPAN tunnel is created without passing IFLA_INFO_DATA. Fix by validating passed-in data in the same way as ipgre does. Fixes: e1f8f78ffe98 ("net: ip_gre: Separate ERSPAN newlink / changelink callbacks") Reported-by: syzbot+1b4ebf4dae4e510dd219@syzkaller.appspotmail.com Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e1f8f78f |
|
13-Mar-2020 |
Petr Machata <petrm@mellanox.com> |
net: ip_gre: Separate ERSPAN newlink / changelink callbacks ERSPAN shares most of the code path with GRE and gretap code. While that helps keep the code compact, it is also error prone. Currently a broken userspace can turn a gretap tunnel into a de facto ERSPAN one by passing IFLA_GRE_ERSPAN_VER. There has been a similar issue in ip6gretap in the past. To prevent these problems in future, split the newlink and changelink code paths. Split the ERSPAN code out of ipgre_netlink_parms() into a new function erspan_netlink_parms(). Extract a piece of common logic from ipgre_newlink() and ipgre_changelink() into ipgre_newlink_encap_setup(). Add erspan_newlink() and erspan_changelink(). Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c593642c |
|
09-Dec-2019 |
Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> |
treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
|
#
c0d59da7 |
|
19-Nov-2019 |
wenxu <wenxu@ucloud.cn> |
ip_gre: Make none-tun-dst gre tunnel store tunnel info as metadat_dst in recv Currently collect_md gre tunnel will store the tunnel info(metadata_dst) to skb_dst. And now the non-tun-dst gre tunnel already can add tunnel header through lwtunnel. When received a arp_request on the non-tun-dst gre tunnel. The packet of arp response will send through the non-tun-dst tunnel without tunnel info which will lead the arp response packet to be dropped. If the non-tun-dst gre tunnel also store the tunnel info as metadata_dst, The arp response packet will set the releted tunnel info in the iptunnel_metadata_reply. The following is the test script: ip netns add cl ip l add dev vethc type veth peer name eth0 netns cl ifconfig vethc 172.168.0.7/24 up ip l add dev tun1000 type gretap key 1000 ip link add user1000 type vrf table 1 ip l set user1000 up ip l set dev tun1000 master user1000 ifconfig tun1000 10.0.1.1/24 up ip netns exec cl ifconfig eth0 172.168.0.17/24 up ip netns exec cl ip l add dev tun type gretap local 172.168.0.17 remote 172.168.0.7 key 1000 ip netns exec cl ifconfig tun 10.0.1.7/24 up ip r r 10.0.1.7 encap ip id 1000 dst 172.168.0.17 key dev tun1000 table 1 With this patch ip netns exec cl ping 10.0.1.1 can success Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2eb8d6d2 |
|
28-Oct-2019 |
Xin Long <lucien.xin@gmail.com> |
erspan: fix the tun_info options_len check for erspan The check for !md doens't really work for ip_tunnel_info_opts(info) which only does info + 1. Also to avoid out-of-bounds access on info, it should ensure options_len is not less than erspan_metadata in both erspan_xmit() and ip6erspan_tunnel_xmit(). Fixes: 1a66a836da ("gre: add collect_md mode to ERSPAN tunnel") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e141f75 |
|
27-Sep-2019 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
erspan: remove the incorrect mtu limit for erspan erspan driver calls ether_setup(), after commit 61e84623ace3 ("net: centralize net_device min/max MTU checking"), the range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default. It causes the dev mtu of the erspan device to not be greater than 1500, this limit value is not correct for ipgre tap device. Tested: Before patch: # ip link set erspan0 mtu 1600 Error: mtu greater than device maximum. After patch: # ip link set erspan0 mtu 1600 # ip -d link show erspan0 21: erspan0@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 0 Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
492b67e2 |
|
06-Apr-2019 |
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> |
net: ip_gre: fix possible use-after-free in erspan_rcv erspan tunnels run __iptunnel_pull_header on received skbs to remove gre and erspan headers. This can determine a possible use-after-free accessing pkt_md pointer in erspan_rcv since the packet will be 'uncloned' running pskb_expand_head if it is a cloned gso skb (e.g if the packet has been sent though a veth device). Fix it resetting pkt_md pointer after __iptunnel_pull_header Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
24ba1440 |
|
23-Feb-2019 |
wenxu <wenxu@ucloud.cn> |
route: Add multipath_hash in flowi_common to make user-define hash Current fib_multipath_hash_policy can make hash based on the L3 or L4. But it only work on the outer IP. So a specific tunnel always has the same hash value. But a specific tunnel may contain so many inner connections. This patch provide a generic multipath_hash in floi_common. It can make a user-define hash which can mix with L3 or L4 hash. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2bdf700e |
|
19-Feb-2019 |
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> |
net: ip_gre: do not report erspan_ver for gre or gretap Report erspan version field to userspace in ipgre_fill_info just for erspan tunnels. The issue can be triggered with the following reproducer: $ip link add name gre1 type gre local 192.168.0.1 remote 192.168.1.1 $ip link set dev gre1 up $ip -d link sh gre1 13: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/gre 192.168.0.1 peer 192.168.1.1 promiscuity 0 minmtu 0 maxmtu 0 gre remote 192.168.1.1 local 192.168.0.1 ttl inherit erspan_ver 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
feaf5c79 |
|
28-Jan-2019 |
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> |
net: ip_gre: always reports o_key to userspace Erspan protocol (version 1 and 2) relies on o_key to configure session id header field. However TUNNEL_KEY bit is cleared in erspan_xmit since ERSPAN protocol does not set the key field of the external GRE header and so the configured o_key is not reported to userspace. The issue can be triggered with the following reproducer: $ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \ key 1 seq erspan_ver 1 $ip link set erspan1 up $ip -d link sh erspan1 erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0 Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in ipgre_fill_info Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
962924fa |
|
22-Jan-2019 |
wenxu <wenxu@ucloud.cn> |
ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit Refactor collect metatdata mode tunnel xmit to the generic xmit function ip_md_tunnel_xmit. It makes codes more generic and support more feture such as pmtu_update through ip_md_tunnel_xmit Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb73ee40 |
|
17-Jan-2019 |
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> |
net: ip_gre: use erspan key field for tunnel lookup Use ERSPAN key header field as tunnel key in gre_parse_header routine since ERSPAN protocol sets the key field of the external GRE header to 0 resulting in a tunnel lookup fail in ip6gre_err. In addition remove key field parsing and pskb_may_pull check in erspan_rcv and ip6erspan_rcv Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
20704bd1 |
|
14-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
erspan: build the header with the right proto according to erspan_ver As said in draft-foschiano-erspan-03#section4: Different frame variants known as "ERSPAN Types" can be distinguished based on the GRE "Protocol Type" field value: Type I and II's value is 0x88BE while Type III's is 0x22EB [ETYPES]. So set it properly in erspan_xmit() according to erspan_ver. While at it, also remove the unused parameter 'proto' in erspan_fb_xmit(). Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb9f1b78 |
|
30-Dec-2018 |
Willem de Bruijn <willemb@google.com> |
ip: validate header length on virtual device xmit KMSAN detected read beyond end of buffer in vti and sit devices when passing truncated packets with PF_PACKET. The issue affects additional ip tunnel devices. Extend commit 76c0ddd8c3a6 ("ip6_tunnel: be careful when accessing the inner header") and commit ccfec9e5cb2d ("ip_tunnel: be careful when accessing the inner header"). Move the check to a separate helper and call at the start of each ndo_start_xmit function in net/ipv4 and net/ipv6. Minor changes: - convert dev_kfree_skb to kfree_skb on error path, as dev_kfree_skb calls consume_skb which is not for error paths. - use pskb_network_may_pull even though that is pedantic here, as the same as pskb_may_pull for devices without llheaders. - do not cache ipv6 hdrs if used only once (unsafe across pskb_may_pull, was more relevant to earlier patch) Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0621e6fc |
|
20-Nov-2018 |
Oz Shlomo <ozsh@mellanox.com> |
net: Add netif_is_gretap()/netif_is_ip6gretap() Changed the is_gretap_dev and is_ip6gretap_dev logic from structure comparison to string comparison of the rtnl_link_ops kind field. This approach aligns with the current identification methods and function names of vxlan and geneve network devices. Convert mlxsw to use these helpers and use them in downstream mlx5 patch. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
32bbd879 |
|
07-Nov-2018 |
Stefano Brivio <sbrivio@redhat.com> |
net: Convert protocol error handlers from void to int We'll need this to handle ICMP errors for tunnels without a sending socket (i.e. FoU and GUE). There, we might have to look up different types of IP tunnels, registered as network protocols, before we get a match, so we want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both inet_protos and inet6_protos. These error codes will be used in the next patch. For consistency, return sensible error codes in protocol error handlers whenever handlers can't handle errors because, even if valid, they don't match a protocol or any of its states. This has no effect on existing error handling paths. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0522f1c |
|
06-Nov-2018 |
David Ahern <dsahern@gmail.com> |
net: Add extack argument to rtnl_create_link Add extack arg to rtnl_create_link and add messages for invalid number of Tx or Rx queues. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1042caa7 |
|
25-Sep-2018 |
Maciej Żenczykowski <maze@google.com> |
net-ipv4: remove 2 always zero parameters from ipv4_redirect() (the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d888f396 |
|
25-Sep-2018 |
Maciej Żenczykowski <maze@google.com> |
net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu() (the parameters in question are mark and flow_flags) Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0350d51 |
|
13-Sep-2018 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
ip_gre: fix parsing gre header in ipgre_err gre_parse_header stops parsing when csum_err is encountered, which means tpi->key is undefined and ip_tunnel_lookup will return NULL improperly. This patch introduce a NULL pointer as csum_err parameter. Even when csum_err is encountered, it won't return error and continue parsing gre header as expected. Fixes: 9f57c67c379d ("gre: Remove support for sharing GRE protocol hook.") Reported-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51dc63e3 |
|
10-Sep-2018 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
erspan: fix error handling for erspan tunnel When processing icmp unreachable message for erspan tunnel, tunnel id should be erspan_net_id instead of ipgre_net_id. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a64506b |
|
10-Sep-2018 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
erspan: return PACKET_REJECT when the appropriate tunnel is not found If erspan tunnel hasn't been established, we'd better send icmp port unreachable message after receive erspan packets. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
84581bda |
|
27-Aug-2018 |
Xin Long <lucien.xin@gmail.com> |
erspan: set erspan_ver to 1 by default when adding an erspan dev After erspan_ver is introudced, if erspan_ver is not set in iproute, its value will be left 0 by default. Since Commit 02f99df1875c ("erspan: fix invalid erspan version."), it has broken the traffic due to the version check in erspan_xmit if users are not aware of 'erspan_ver' param, like using an old version of iproute. To fix this compatibility problem, it sets erspan_ver to 1 by default when adding an erspan dev in erspan_setup. Note that we can't do it in ipgre_netlink_parms, as this function is also used by ipgre_changelink. Fixes: 02f99df1875c ("erspan: fix invalid erspan version.") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1296ee8f |
|
31-Jul-2018 |
YueHaibing <yuehaibing@huawei.com> |
ip_gre: remove redundant variables t_hlen After commit ffc2b6ee4174 ("ip_gre: fix IFLA_MTU ignored on NEWLINK") variable t_hlen is assigned values that are never read, hence they are redundant and can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
256c87c1 |
|
26-Jun-2018 |
Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> |
net: check tunnel option type in tunnel flags Check the tunnel option type stored in tunnel flags when creating options for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel options on interfaces that are not associated with them. Make sure all users of the infrastructure set correct flags, for the BPF helper we have to set all bits to keep backward compatibility. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02f99df1 |
|
16-May-2018 |
William Tu <u9012063@gmail.com> |
erspan: fix invalid erspan version. ERSPAN only support version 1 and 2. When packets send to an erspan device which does not have proper version number set, drop the packet. In real case, we observe multicast packets sent to the erspan pernet device, erspan0, which does not have erspan version configured. Reported-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5db21a3 |
|
11-May-2018 |
William Tu <u9012063@gmail.com> |
erspan: auto detect truncated ipv6 packets. Currently the truncated bit is set only when 1) the mirrored packet is larger than mtu and 2) the ipv4 packet tot_len is larger than the actual skb->len. This patch adds another case for detecting whether ipv6 packet is truncated or not, by checking the ipv6 header payload_len and the skb->len. Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1baf5ebf |
|
27-Apr-2018 |
William Tu <u9012063@gmail.com> |
erspan: auto detect truncated packets. Currently the truncated bit is set only when the mirrored packet is larger than mtu. For certain cases, the packet might already been truncated before sending to the erspan tunnel. In this case, the patch detect whether the IP header's total length is larger than the actual skb->len. If true, this indicated that the mirrored packet is truncated and set the erspan truncate bit. I tested the patch using bpf_skb_change_tail helper function to shrink the packet size and send to erspan tunnel. Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1cc5954f |
|
09-Apr-2018 |
Sabrina Dubroca <sd@queasysnail.net> |
ip_gre: clear feature flags when incompatible o_flags are set Commit dd9d598c6657 ("ip_gre: add the support for i/o_flags update via netlink") added the ability to change o_flags, but missed that the GSO/LLTX features are disabled by default, and only enabled some gre features are unused. Thus we also need to disable the GSO/LLTX features on the device when the TUNNEL_SEQ or TUNNEL_CSUM flags are set. These two examples should result in the same features being set: ip link add gre_none type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 0 ip link set gre_none type gre seq ip link add gre_seq type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 1 seq Fixes: dd9d598c6657 ("ip_gre: add the support for i/o_flags update via netlink") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f635cee |
|
27-Mar-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Drop pernet_operations::async Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
15746394 |
|
21-Mar-2018 |
Colin Ian King <colin.king@canonical.com> |
gre: fix TUNNEL_SEQ bit check on sequence numbering The current logic of flags | TUNNEL_SEQ is always non-zero and hence sequence numbers are always incremented no matter the setting of the TUNNEL_SEQ bit. Fix this by using & instead of |. Detected by CoverityScan, CID#1466039 ("Operands don't affect result") Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77a5196a |
|
01-Mar-2018 |
William Tu <u9012063@gmail.com> |
gre: add sequence number for collect md mode. Currently GRE sequence number can only be used in native tunnel mode. This patch adds sequence number support for gre collect metadata mode. RFC2890 defines GRE sequence number to be specific to the traffic flow identified by the key. However, this patch does not implement per-key seqno. The sequence number is shared in the same tunnel device. That is, different tunnel keys using the same collect_md tunnel share single sequence number. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1b2a6c4 |
|
27-Feb-2018 |
Petr Machata <petrm@mellanox.com> |
net: GRE: Add is_gretap_dev, is_ip6gretap_dev Determining whether a device is a GRE device is easily done by inspecting struct net_device.type. However, for the tap variants, the type is just ARPHRD_ETHER. Therefore introduce two predicate functions that use netdev_ops to tell the tap devices. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ffc2b6ee |
|
27-Feb-2018 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: fix IFLA_MTU ignored on NEWLINK It's safe to remove the setting of dev's needed_headroom and mtu in __gre_tunnel_init, as discussed in [1], ip_tunnel_newlink can do it properly. Now Eric noticed that it could cover the mtu value set in do_setlink when creating a ip_gre dev. It makes IFLA_MTU param not take effect. So this patch is to remove them to make IFLA_MTU work, as in other ipv4 tunnels. [1]: https://patchwork.ozlabs.org/patch/823504/ Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Reported-by: Eric Garver <e@erig.me> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31502104 |
|
26-Feb-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Convert ipgre_net_ops, ipgre_tap_net_ops, erspan_net_ops, vti_net_ops and ipip_net_ops These pernet_operations are similar to bond_net_ops. Exit methods unregisters all net ipgre/ipgre_tap/erspan/vti/ipip devices, and it looks like another pernet_operations are not interested in foreign net ipgre/ipgre_tap/erspan/vti/ipip list. Init method also does not intersect with something pernet-specific. So, it's possible to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39f57f67 |
|
05-Feb-2018 |
William Tu <u9012063@gmail.com> |
net: erspan: fix erspan config overwrite When an erspan tunnel device receives an erpsan packet with different tunnel metadata (ex: version, index, hwid, direction), existing code overwrites the tunnel device's erspan configuration with the received packet's metadata. The patch fixes it. Fixes: 1a66a836da63 ("gre: add collect_md mode to ERSPAN tunnel") Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode") Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3df19283 |
|
05-Feb-2018 |
William Tu <u9012063@gmail.com> |
net: erspan: fix metadata extraction Commit d350a823020e ("net: erspan: create erspan metadata uapi header") moves the erspan 'version' in front of the 'struct erspan_md2' for later extensibility reason. This breaks the existing erspan metadata extraction code because the erspan_md2 then has a 4-byte offset to between the erspan_metadata and erspan_base_hdr. This patch fixes it. Fixes: 1a66a836da63 ("gre: add collect_md mode to ERSPAN tunnel") Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode") Fixes: 1d7e2ed22f8d ("net: erspan: refactor existing erspan code") Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c69de58b |
|
25-Jan-2018 |
William Tu <u9012063@gmail.com> |
net: erspan: use bitfield instead of mask and offset Originally the erspan fields are defined as a group into a __be16 field, and use mask and offset to access each field. This is more costly due to calling ntohs/htons. The patch changes it to use bitfields. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
214bb1c7 |
|
21-Dec-2017 |
William Tu <u9012063@gmail.com> |
net: erspan: remove md NULL check The 'md' is allocated from 'tun_dst = ip_tun_rx_dst' and since we've checked 'tun_dst', 'md' will never be NULL. The patch removes it at both ipv4 and ipv6 erspan. Fixes: afb4c97d90e6 ("ip6_gre: fix potential memory leak in ip6erspan_rcv") Fixes: 50670b6ee9bc ("ip_gre: fix potential memory leak in erspan_rcv") Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50670b6e |
|
19-Dec-2017 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
ip_gre: fix potential memory leak in erspan_rcv If md is NULL, tun_dst must be freed, otherwise it will cause memory leak. Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd8d5b8c |
|
19-Dec-2017 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
ip_gre: fix error path when erspan_rcv failed When erspan_rcv call return PACKET_REJECT, we shoudn't call ipgre_rcv to process packets again, instead send icmp unreachable message in error path. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Acked-by: William Tu <u9012063@gmail.com> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfddd4c3 |
|
17-Dec-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: remove the incorrect mtu limit for ipgre tap ipgre tap driver calls ether_setup(), after commit 61e84623ace3 ("net: centralize net_device min/max MTU checking"), the range of mtu is [min_mtu, max_mtu], which is [68, 1500] by default. It causes the dev mtu of the ipgre tap device to not be greater than 1500, this limit value is not correct for ipgre tap device. Besides, it's .change_mtu already does the right check. So this patch is just to set max_mtu as 0, and leave the check to it's .change_mtu. Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d91e8db5 |
|
15-Dec-2017 |
William Tu <u9012063@gmail.com> |
net: erspan: reload pointer after pskb_may_pull pskb_may_pull() can change skb->data, so we need to re-load pkt_md and ershdr at the right place. Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Signed-off-by: William Tu <u9012063@gmail.com> Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae3e1337 |
|
15-Dec-2017 |
William Tu <u9012063@gmail.com> |
net: erspan: fix wrong return value If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM. Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Fixes: f551c91de262 ("net: erspan: introduce erspan v2 for ip_gre") Signed-off-by: William Tu <u9012063@gmail.com> Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c05fad57 |
|
14-Dec-2017 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
ip_gre: fix wrong return value of erspan_rcv If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f551c91d |
|
13-Dec-2017 |
William Tu <u9012063@gmail.com> |
net: erspan: introduce erspan v2 for ip_gre The patch adds support for erspan version 2. Not all features are supported in this patch. The SGT (security group tag), GRA (timestamp granularity), FT (frame type) are set to fixed value. Only hardware ID and direction are configurable. Optional subheader is also not supported. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d7e2ed2 |
|
13-Dec-2017 |
William Tu <u9012063@gmail.com> |
net: erspan: refactor existing erspan code The patch refactors the existing erspan implementation in order to support erspan version 2, which has additional metadata. So, in stead of having one 'struct erspanhdr' holding erspan version 1, breaks it into 'struct erspan_base_hdr' and 'struct erspan_metadata'. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a3222dc9 |
|
30-Nov-2017 |
William Tu <u9012063@gmail.com> |
ip_gre: Refector the erpsan tunnel code. Move two erspan functions to header file, erspan.h, so ipv6 erspan implementation can use it. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0efab67 |
|
07-Nov-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: add the support for i/o_flags update via ioctl As patch 'ip_gre: add the support for i/o_flags update via netlink' did for netlink, we also need to do the same job for these update via ioctl. This patch is to update i/o_flags and call ipgre_link_update to recalculate these gre properties after ip_tunnel_ioctl does the common update. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd9d598c |
|
07-Nov-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: add the support for i/o_flags update via netlink Now ip_gre is using ip_tunnel_changelink to update it's properties, but ip_tunnel_changelink in ip_tunnel doesn't update i/o_flags as a common function. o_flags updates would cause that tunnel->tun_hlen / hlen and dev->mtu / needed_headroom need to be recalculated, and dev->(hw_)features need to be updated as well. Therefore, we can't just add the update into ip_tunnel_update called in ip_tunnel_changelink, and it's also better not to touch ip_tunnel codes. This patch updates i/o_flags and calls ipgre_link_update to recalculate these gre properties after ip_tunnel_changelink does the common update. Note that since ipgre_link_update doesn't know the lower dev, it will update gre->hlen, dev->mtu and dev->needed_headroom with the value of 'new tun_hlen - old tun_hlen'. In this way, we can avoid many redundant codes, unlike ip6_gre. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f192970d |
|
05-Oct-2017 |
William Tu <u9012063@gmail.com> |
ip_gre: check packet length and mtu correctly in erspan tx Similarly to early patch for erspan_xmit(), the ARPHDR_ETHER device is the length of the whole ether packet. So skb->len should subtract the dev->hard_header_len. Fixes: 1a66a836da63 ("gre: add collect_md mode to ERSPAN tunnel") Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: William Tu <u9012063@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: David Laight <David.Laight@aculab.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c84bed44 |
|
01-Oct-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: erspan device should keep dst The patch 'ip_gre: ipgre_tap device should keep dst' fixed the issue ipgre_tap dev mtu couldn't be updated in tx path. The same fix is needed for erspan as well. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c122fda2 |
|
01-Oct-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: set tunnel hlen properly in erspan_tunnel_init According to __gre_tunnel_init, tunnel->hlen should be set as the headers' length between inner packet and outer iphdr. It would be used especially to calculate a proper mtu when updating mtu in tnl_update_pmtu. Now without setting it, a bigger mtu value than expected would be updated, which hurts performance a lot. This patch is to fix it by setting tunnel->hlen with: tunnel->tun_hlen + tunnel->encap_hlen + sizeof(struct erspanhdr) Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5513d08d |
|
01-Oct-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: check packet length and mtu correctly in erspan_xmit As a ARPHRD_ETHER device, skb->len in erspan_xmit is the length of the whole ether packet. So before checking if a packet size exceeds the mtu, skb->len should subtract dev->hard_header_len. Otherwise, all packets with max size according to mtu would be trimmed to be truncated packet. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
935a9749 |
|
01-Oct-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: get key from session_id correctly in erspan_rcv erspan only uses the first 10 bits of session_id as the key to look up the tunnel. But in erspan_rcv, it missed 'session_id & ID_MASK' when getting the key from session_id. If any other flag is also set in session_id in a packet, it would fail to find the tunnel due to incorrect key in erspan_rcv. This patch is to add 'session_id & ID_MASK' there and also remove the unnecessary variable session_id. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d51711c0 |
|
27-Sep-2017 |
Xin Long <lucien.xin@gmail.com> |
ip_gre: ipgre_tap device should keep dst Without keeping dst, the tunnel will not update any mtu/pmtu info, since it does not have a dst on the skb. Reproducer: client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server After reducing eth1's mtu on client, then perforamnce became 0. This patch is to netif_keep_dst in gre_tap_init, as ipgre does. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64bc1781 |
|
19-Sep-2017 |
Eric Dumazet <edumazet@google.com> |
ipv4: speedup ipv6 tunnels dismantle Implement exit_batch() method to dismantle more devices per round. (rtnl_lock() ... unregister_netdevice_many() ... rtnl_unlock()) Tested: $ cat add_del_unshare.sh for i in `seq 1 40` do (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & done wait ; grep net_namespace /proc/slabinfo Before patch : $ time ./add_del_unshare.sh net_namespace 126 282 5504 1 2 : tunables 8 4 0 : slabdata 126 282 0 real 1m38.965s user 0m0.688s sys 0m37.017s After patch: $ time ./add_del_unshare.sh net_namespace 135 291 5504 1 2 : tunables 8 4 0 : slabdata 135 291 0 real 0m22.117s user 0m0.728s sys 0m35.328s Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a66a836 |
|
25-Aug-2017 |
William Tu <u9012063@gmail.com> |
gre: add collect_md mode to ERSPAN tunnel Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away. OVS can use it as well in the future. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
862a03c3 |
|
25-Aug-2017 |
William Tu <u9012063@gmail.com> |
gre: refactor the gre_fb_xmit The patch refactors the gre_fb_xmit function, by creating prepare_fb_xmit function for later ERSPAN collect_md mode patch. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60890e04 |
|
22-Aug-2017 |
Colin Ian King <colin.king@canonical.com> |
gre: remove duplicated assignment of iph iph is being assigned the same value twice; remove the redundant first assignment. (Thanks to Nikolay Aleksandrov for pointing out that the first asssignment should be removed and not the second) Fixes warning: net/ipv4/ip_gre.c:265:2: warning: Value stored to 'iph' is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3d0328c |
|
22-Aug-2017 |
William Tu <u9012063@gmail.com> |
gre: fix goto statement typo Fix typo: pnet_tap_faied. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
84e54fe0 |
|
22-Aug-2017 |
William Tu <u9012063@gmail.com> |
gre: introduce native tunnel support for ERSPAN The patch adds ERSPAN type II tunnel support. The implementation is based on the draft at [1]. One of the purposes is for Linux box to be able to receive ERSPAN monitoring traffic sent from the Cisco switch, by creating a ERSPAN tunnel device. In addition, the patch also adds ERSPAN TX, so Linux virtual switch can redirect monitored traffic to the ERSPAN tunnel device. The traffic will be encapsulated into ERSPAN and sent out. The implementation reuses tunnel key as ERSPAN session ID, and field 'erspan' as ERSPAN Index fields: ./ip link add dev ers11 type erspan seq key 100 erspan 123 \ local 172.16.1.200 remote 172.16.1.100 To use the above device as ERSPAN receiver, configure Nexus 5000 switch as below: monitor session 100 type erspan-source erspan-id 123 vrf default destination ip 172.16.1.200 source interface Ethernet1/11 both source interface Ethernet1/12 both no shut monitor erspan origin ip-address 172.16.1.100 global [1] https://tools.ietf.org/html/draft-foschiano-erspan-01 [2] iproute2 patch: http://marc.info/?l=linux-netdev&m=150306086924951&w=2 [3] test script: http://marc.info/?l=linux-netdev&m=150231021807304&w=2 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Meenakshi Vohra <mvohra@vmware.com> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8b8a889 |
|
25-Jun-2017 |
Matthias Schiffer <mschiffer@universe-factory.net> |
net: add netlink_ext_ack argument to rtnl_link_ops.validate Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad744b22 |
|
25-Jun-2017 |
Matthias Schiffer <mschiffer@universe-factory.net> |
net: add netlink_ext_ack argument to rtnl_link_ops.changelink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a3f4a18 |
|
25-Jun-2017 |
Matthias Schiffer <mschiffer@universe-factory.net> |
net: add netlink_ext_ack argument to rtnl_link_ops.newlink Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d58ff351 |
|
16-Jun-2017 |
Johannes Berg <johannes.berg@intel.com> |
networking: make skb_push & __skb_push return void pointers It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) @@ expression SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - fn(SKB, LEN)[0] + *(u8 *)fn(SKB, LEN) Note that the last part there converts from push(...)[0] to the more idiomatic *(u8 *)push(...). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9830ad4c |
|
18-Apr-2017 |
Craig Gallek <kraig@google.com> |
ip_tunnel: Allow policy-based routing through tunnels This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. There is no concept of per-packet routing decisions through IPv4 tunnels, so this implementation does not need to work with per-packet route lookups as the v6 implementation may (with IP6_TNL_F_USE_ORIG_FWMARK). Further, since the v4 tunnel ioctls share datastructures (which can not be trivially modified) with the kernel's internal tunnel configuration structures, the mark attribute must be stored in the tunnel structure itself and passed as a parameter when creating or changing tunnel attributes. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c7d03a00 |
|
16-Nov-2016 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns: make struct pernet_operations::id unsigned int Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d817f432 |
|
08-Sep-2016 |
Amir Vadai <amir@vadai.me> |
net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() Add utility functions to convert a 32 bits key into a 64 bits tunnel and vice versa. These functions will be used instead of cloning code in GRE and VXLAN, and in tc act_iptunnel which will be introduced in a following patch in this patchset. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d7b3320 |
|
15-Aug-2016 |
Simon Horman <simon.horman@netronome.com> |
gre: set inner_protocol on xmit Ensure that the inner_protocol is set on transmit so that GSO segmentation, which relies on that field, works correctly. This is achieved by setting the inner_protocol in gre_build_header rather than each caller of that function. It ensures that the inner_protocol is set when gre_fb_xmit() is used to transmit GRE which was not previously the case. I have observed this is not the case when OvS transmits GRE using lwtunnel metadata (which it always does). Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol") Cc: Pravin Shelar <pshelar@ovn.org> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
20e1954f |
|
18-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
ipv6: RFC 4884 partial support for SIT/GRE tunnels When receiving an ICMPv4 message containing extensions as defined in RFC 4884, and translating it to ICMPv6 at SIT or GRE tunnel, we need some extra manipulation in order to properly forward the extensions. This patch only takes care of Time Exceeded messages as they are the ones that typically carry information from various routers in a fabric during a traceroute session. It also avoids complex skb logic if the data_len is not a multiple of 8. RFC states : The "original datagram" field MUST contain at least 128 octets. If the original datagram did not contain 128 octets, the "original datagram" field MUST be zero padded to 128 octets. In practice routers use 128 bytes of original datagram, not more. Initial translation was added in commit ca15a078bd90 ("sit: generate icmpv6 error when receiving icmpv4 error") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Oussama Ghorbel <ghorbel@pivasoftware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b8c6d7b |
|
18-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
gre: better support for ICMP messages for gre+ipv6 ipgre_err() can call ip6_err_gen_icmpv6_unreach() for proper support of ipv4+gre+icmp+ipv6+... frames, used for example by traceroute/mtr. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e582615ad |
|
15-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
gre: fix error handler 1) gre_parse_header() can be called from gre_err() At this point transport header points to ICMP header, not the inner header. 2) We can not really change transport header as ipgre_err() will later assume transport header still points to ICMP header (using icmp_hdr()) 3) pskb_may_pull() logic in gre_parse_header() really works if we are interested at zone pointed by skb->data 4) As Jiri explained in commit b7f8fe251e46 ("gre: do not pull header in ICMP error processing") we should not pull headers in error handler. So this fix : A) changes gre_parse_header() to use skb->data instead of skb_transport_header() B) Adds a nhs parameter to gre_parse_header() so that we can skip the not pulled IP header from error path. This offset is 0 for normal receive path. C) remove obsolete IPV6 includes Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22a59be8 |
|
14-Jun-2016 |
Philip Prindeville <philipp@redfish-solutions.com> |
net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads In the presence of firewalls which improperly block ICMP Unreachable (including Fragmentation Required) messages, Path MTU Discovery is prevented from working. A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as is done for other payloads like AppleTalk--and doing transparent fragmentation and reassembly. Redux includes the enforcement of mutual exclusion between this feature and Path MTU Discovery as suggested by Alexander Duyck. Cc: Alexander Duyck <alexander.duyck@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da6f1da8 |
|
13-Jun-2016 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ovs/gre: fix rtnl notifications on iface deletion The function gretap_fb_dev_create() (only used by ovs) never calls rtnl_configure_link(). The consequence is that dev->rtnl_link_state is never set to RTNL_LINK_INITIALIZED. During the deletion phase, the function rollback_registered_many() sends a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED. Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport") CC: Thomas Graf <tgraf@suug.ch> CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
106da663 |
|
13-Jun-2016 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ovs/gre,geneve: fix error path when creating an iface After ipgre_newlink()/geneve_configure() call, the netdev is registered. Fixes: 7e059158d57b ("vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices") CC: David Wragg <david@weave.works> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da73b4e9 |
|
11-May-2016 |
Haishuang Yan <yanhaishuang@cmss.chinamobile.com> |
gre: Fix wrong tpi->proto in WCCP When dealing with WCCP in gre6 tunnel, it sets the wrong tpi->protocol, that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e271c7b4 |
|
11-May-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: do not keep the GRE header around in collect medata mode For ipgre interface in collect metadata mode, it doesn't make sense for the interface to be of ARPHRD_IPGRE type. The outer header of received packets is not needed, as all the information from it is present in metadata_dst. We already don't set ipgre_header_ops for collect metadata interfaces, which is the only consumer of mac_header pointing to the outer IP header. Just set the interface type to ARPHRD_NONE in collect metadata mode for ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset mac_header. Fixes: a64b04d86d14 ("gre: do not assign header_ops in collect metadata mode") Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
125372fa |
|
03-May-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: receive also TEB packets for lwtunnels For ipgre interfaces in collect metadata mode, receive also traffic with encapsulated Ethernet headers. The lwtunnel users are supposed to sort this out correctly. This allows to have mixed Ethernet + L3-only traffic on the same lwtunnel interface. This is the same way as VXLAN-GPE behaves. To keep backwards compatibility and prevent any surprises, gretap interfaces have priority in receiving packets with Ethernet headers. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
244a797b |
|
03-May-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: move iptunnel_pull_header down to ipgre_rcv This will allow to make the pull dependent on the tunnel type. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f132ae7c |
|
03-May-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: change gre_parse_header to return the header length It's easier for gre_parse_header to return the header length instead of filing it into a parameter. That way, the callers that don't care about the header length can just check whether the returned value is lower than zero. In gre_err, the tunnel header must not be pulled. See commit b7f8fe251e46 ("gre: do not pull header in ICMP error processing") for details. This patch reduces the conflict between the mentioned commit and commit 95f5c64c3c13 ("gre: Move utility functions to common headers"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
182a352d |
|
29-Apr-2016 |
Tom Herbert <tom@herbertland.com> |
gre: Create common functions for transmit Create common functions for both IPv4 and IPv6 GRE in transmit. These are put into gre.h. Common functions are for: - GRE checksum calculation. Move gre_checksum to gre.h. - Building a GRE header. Move GRE build_header and rename gre_build_header. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95f5c64c |
|
29-Apr-2016 |
Tom Herbert <tom@herbertland.com> |
gre: Move utility functions to common headers Several of the GRE functions defined in net/ipv4/ip_gre.c are usable for IPv6 GRE implementation (that is they are protocol agnostic). These include: - GRE flag handling functions are move to gre.h - GRE build_header is moved to gre.h and renamed gre_build_header - parse_gre_header is moved to gre_demux.c and renamed gre_parse_header - iptunnel_pull_header is taken out of gre_parse_header. This is now done by caller. The header length is returned from gre_parse_header in an int* argument. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7f8fe25 |
|
29-Apr-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: do not pull header in ICMP error processing iptunnel_pull_header expects that IP header was already pulled; with this expectation, it pulls the tunnel header. This is not true in gre_err. Furthermore, ipv4_update_pmtu and ipv4_redirect expect that skb->data points to the IP header. We cannot pull the tunnel header in this path. It's just a matter of not calling iptunnel_pull_header - we don't need any of its effects. Fixes: bda7bb463436 ("gre: Allow multiple protocol listener for gre protocol.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
946b636f |
|
27-Apr-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: reject GUE and FOU in collect metadata mode The collect metadata mode does not support GUE nor FOU. This might be implemented later; until then, we should reject such config. I think this is okay to be changed. It's unlikely anyone has such configuration (as it doesn't work anyway) and we may need a way to distinguish whether it's supported or not by the kernel later. For backwards compatibility with iproute2, it's not possible to just check the attribute presence (iproute2 always includes the attribute), the actual value has to be checked, too. Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2090714e |
|
27-Apr-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: build header correctly for collect metadata tunnels In ipgre (i.e. not gretap) + collect metadata mode, the skb was assumed to contain Ethernet header and was encapsulated as ETH_P_TEB. This is not the case, the interface is ARPHRD_IPGRE and the protocol to be used for encapsulation is skb->protocol. Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a64b04d8 |
|
27-Apr-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: do not assign header_ops in collect metadata mode In ipgre mode (i.e. not gretap) with collect metadata flag set, the tunnel is incorrectly assumed to be mGRE in NBMA mode (see commit 6a5f44d7a048c). This is not the case, we're controlling the encapsulation addresses by lwtunnel metadata. And anyway, assigning dev->header_ops in collect metadata mode does not make sense. Although it would be more user firendly to reject requests that specify both the collect metadata flag and a remote/local IP address, this would break current users of gretap or introduce ugly code and differences in handling ipgre and gretap configuration. Keep the current behavior of remote/local IP address being ignored in such case. v3: Back to v1, added explanation paragraph. v2: Reject configuration specifying both remote/local address and collect metadata flag. Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aed069df |
|
14-Apr-2016 |
Alexander Duyck <aduyck@mirantis.com> |
ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb This patch updates the IP tunnel core function iptunnel_handle_offloads so that we return an int and do not free the skb inside the function. This actually allows us to clean up several paths in several tunnels so that we can free the skb at one point in the path without having to have a secondary path if we are supporting tunnel offloads. In addition it should resolve some double-free issues I have found in the tunnels paths as I believe it is possible for us to end up triggering such an event in the case of fou or gue. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0ca153f |
|
05-Apr-2016 |
Alexander Duyck <aduyck@mirantis.com> |
GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU This patch fixes an issue I found in which we were dropping frames if we had enabled checksums on GRE headers that were encapsulated by either FOU or GUE. Without this patch I was barely able to get 1 Gb/s of throughput. With this patch applied I am now at least getting around 6 Gb/s. The issue is due to the fact that with FOU or GUE applied we do not provide a transport offset pointing to the GRE header, nor do we offload it in software as the GRE header is completely skipped by GSO and treated like a VXLAN or GENEVE type header. As such we need to prevent the stack from generating it and also prevent GRE from generating it via any interface we create. Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE") Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db3c6139 |
|
04-Mar-2016 |
Daniel Borkmann <daniel@iogearbox.net> |
bpf, vxlan, geneve, gre: fix usage of dst_cache on xmit The assumptions from commit 0c1d70af924b ("net: use dst_cache for vxlan device"), 468dfffcd762 ("geneve: add dst caching support") and 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage when ip_tunnel_info is used is unfortunately not always valid as assumed. While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre with different remote dsts, tos, etc, therefore they cannot be assumed as packet independent. Right now vxlan, geneve, gre would cache the dst for eBPF and every packet would reuse the same entry that was first created on the initial route lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have a different one. Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test in vxlan needs to be handeled differently in this context as it is currently inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable() helper is added for the three tunnel cases, which checks if we can use dst cache. Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device") Fixes: 468dfffcd762 ("geneve: add dst caching support") Fixes: 3c1cb4d2604c ("net/ipv4: add dst cache support for gre lwtunnels") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d13b161c |
|
17-Feb-2016 |
Jiri Benc <jbenc@redhat.com> |
gre: clear IFF_TX_SKB_SHARING ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre as it modifies the skb on xmit. Also, clean up whitespace in ipgre_tap_setup when we're already touching it. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f290c94 |
|
18-Feb-2016 |
Jiri Benc <jbenc@redhat.com> |
iptunnel: scrub packet in iptunnel_pull_header Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call skb_scrub_packet directly instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c1cb4d2 |
|
12-Feb-2016 |
Paolo Abeni <pabeni@redhat.com> |
net/ipv4: add dst cache support for gre lwtunnels In case of UDP traffic with datagram length below MTU this gives about 4% performance increase Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fa79666 |
|
11-Feb-2016 |
Edward Cree <ecree@solarflare.com> |
net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads All users now pass false, so we can remove it, and remove the code that was conditional upon it. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
53936107 |
|
11-Feb-2016 |
Edward Cree <ecree@solarflare.com> |
net: gre: Implement LCO for GRE over IPv4 Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e059158 |
|
09-Feb-2016 |
David Wragg <david@weave.works> |
vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
039f5062 |
|
24-Dec-2015 |
Pravin B Shelar <pshelar@nicira.com> |
ip_tunnel: Move stats update to iptunnel_xmit() By moving stats update into iptunnel_xmit(), we can simplify iptunnel_xmit() usage. With this change there is no need to call another function (iptunnel_xmit_stats()) to update stats in tunnel xmit code path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfc3b0e8 |
|
26-Nov-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: remove unnecessary mroute.h includes It looks like many files are including mroute.h unnecessarily, so remove the include. Most importantly remove it from ipv6. CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc4099f1 |
|
22-Oct-2015 |
Pravin B Shelar <pshelar@nicira.com> |
openvswitch: Fix egress tunnel info. While transitioning to netdev based vport we broke OVS feature which allows user to retrieve tunnel packet egress information for lwtunnel devices. Following patch fixes it by introducing ndo operation to get the tunnel egress info. Same ndo operation can be used for lwtunnel devices and compat ovs-tnl-vport devices. So after adding such device operation we can remove similar operation from ovs-vport. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f9562a1 |
|
28-Aug-2015 |
Jiri Benc <jbenc@redhat.com> |
ip_tunnels: record IP version in tunnel info There's currently nothing preventing directing packets with IPv6 encapsulation data to IPv4 tunnels (and vice versa). If this happens, IPv6 addresses are incorrectly interpreted as IPv4 ones. Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this in ip_tunnel_info. Reject packets at appropriate places if they are supposed to be encapsulated into an incompatible protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46fa062a |
|
28-Aug-2015 |
Jiri Benc <jbenc@redhat.com> |
ip_tunnels: convert the mode field of ip_tunnel_info to flags The mode field holds a single bit of information only (whether the ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags. This allows more mode flags to be added. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c29a70d2 |
|
27-Aug-2015 |
Pravin B Shelar <pshelar@nicira.com> |
tunnel: introduce udp_tun_rx_dst() Introduce function udp_tun_rx_dst() to initialize tunnel dst on receive path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Reviewed-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61adedf3 |
|
20-Aug-2015 |
Jiri Benc <jbenc@redhat.com> |
route: move lwtunnel state to dst_entry Currently, the lwtunnel state resides in per-protocol data. This is a problem if we encapsulate ipv6 traffic in an ipv4 tunnel (or vice versa). The xmit function of the tunnel does not know whether the packet has been routed to it by ipv4 or ipv6, yet it needs the lwtstate data. Moving the lwtstate data to dst_entry makes such inter-protocol tunneling possible. As a bonus, this brings a nice diffstat. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c383fb2 |
|
20-Aug-2015 |
Jiri Benc <jbenc@redhat.com> |
ip_tunnels: use tos and ttl fields also for IPv6 Rename the ipv4_tos and ipv4_ttl fields to just 'tos' and 'ttl', as they'll be used with IPv6 tunnels, too. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1ea5d67 |
|
20-Aug-2015 |
Jiri Benc <jbenc@redhat.com> |
ip_tunnels: add IPv6 addresses to ip_tunnel_key Add the IPv6 addresses as an union with IPv4 ones. When using IPv4, the newly introduced padding after the IPv4 addresses needs to be zeroed out. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f57c67c |
|
08-Aug-2015 |
Pravin B Shelar <pshelar@nicira.com> |
gre: Remove support for sharing GRE protocol hook. Support for sharing GREPROTO_CISCO port was added so that OVS gre port and kernel GRE devices can co-exist. After flow-based tunneling patches OVS GRE protocol processing is completely moved to ip_gre module. so there is no need for GRE protocol hook. Following patch consolidates GRE protocol related functions into ip_gre module. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2acd1dc |
|
08-Aug-2015 |
Pravin B Shelar <pshelar@nicira.com> |
openvswitch: Use regular GRE net_device instead of vport Using GRE tunnel meta data collection feature, we can implement OVS GRE vport. This patch removes all of the OVS specific GRE code and make OVS use a ip_gre net_device. Minimal GRE vport is kept to handle compatibility with current userspace application. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e15ea39 |
|
08-Aug-2015 |
Pravin B Shelar <pshelar@nicira.com> |
ip_gre: Add support to collect tunnel metadata. Following patch create new tunnel flag which enable tunnel metadata collection on given device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51456b29 |
|
03-Apr-2015 |
Ian Morris <ipm@chirality.org.uk> |
ipv4: coding style: comparison for equality with NULL The ipv4 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e99584b |
|
02-Apr-2015 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipip,gre,vti,sit: implement ndo_get_iflink Don't use dev->iflink anymore. CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67b61f6c |
|
29-Mar-2015 |
Jiri Benc <jbenc@redhat.com> |
netlink: implement nla_get_in_addr and nla_get_in6_addr Those are counterparts to nla_put_in_addr and nla_put_in6_addr. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
930345ea |
|
29-Mar-2015 |
Jiri Benc <jbenc@redhat.com> |
netlink: implement nla_put_in_addr and nla_put_in6_addr IP addresses are often stored in netlink attributes. Add generic functions to do that. For nla_put_in_addr, it would be nicer to pass struct in_addr but this is not used universally throughout the kernel, in way too many places __be32 is used to store IPv4 address. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e97fa70 |
|
06-Feb-2015 |
Sabrina Dubroca <sd@queasysnail.net> |
gre/ipip: use be16 variants of netlink functions encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead of nla_{get,put}_u16. Fixes the sparse warnings: warning: incorrect type in assignment (different base types) expected restricted __be32 [addressable] [usertype] o_key got restricted __be16 [addressable] [usertype] i_flags warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] sport got unsigned short warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] dport got unsigned short warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] sport warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] dport Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1728d4fa |
|
15-Jan-2015 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
tunnels: advertise link netns via netlink Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is added to rtnetlink messages. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bec94d430 |
|
27-Dec-2014 |
stephen hemminger <shemming@brocade.com> |
gre: allow live address change The GRE tap device supports Ethernet over GRE, but doesn't care about the source address of the tunnel, therefore it can be changed without bring device down. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a0033a9 |
|
15-Dec-2014 |
Timo Teräs <timo.teras@iki.fi> |
gre: fix the inner mac header in nbma tunnel xmit path The NBMA GRE tunnels temporarily push GRE header that contain the per-packet NBMA destination on the skb via header ops early in xmit path. It is the later pulled before the real GRE header is constructed. The inner mac was thus set differently in nbma case: the GRE header has been pushed by neighbor layer, and mac header points to beginning of the temporary gre header (set by dev_queue_xmit). Now that the offloads expect mac header to point to the gre payload, fix the xmit patch to: - pull first the temporary gre header away - and reset mac header to point to gre payload This fixes tso to work again with nbma tunnels. Fixes: 14051f0452a2 ("gre: Use inner mac length when computing tunnel length") Signed-off-by: Timo Teräs <timo.teras@iki.fi> Cc: Tom Herbert <therbert@google.com> Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e1b2cb65 |
|
05-Nov-2014 |
Tom Herbert <therbert@google.com> |
fou: Fix typo in returning flags in netlink When filling netlink info, dport is being returned as flags. Fix instances to return correct value. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02875878 |
|
05-Oct-2014 |
Eric Dumazet <edumazet@google.com> |
net: better IFF_XMIT_DST_RELEASE support Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
54bc9bac |
|
29-Sep-2014 |
Tom Herbert <therbert@google.com> |
gre: Set inner protocol in v4 and v6 GRE transmit Call skb_set_inner_protocol to set inner Ethernet protocol to protocol being encapsulation by GRE before tunnel_xmit. This is needed for GSO if UDP encapsulation (fou) is being done. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4565e991 |
|
17-Sep-2014 |
Tom Herbert <therbert@google.com> |
gre: Setup and TX path for gre/UDP foo-over-udp encapsulation Added netlink attrs to configure FOU encapsulation for GRE, netlink handling of these flags, and properly adjust MTU for encapsulation. ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU encapsulation. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8c1b7ce |
|
06-Jun-2014 |
stephen hemminger <stephen@networkplumber.org> |
gre: allow changing mac address when device is up There is no need to require forcing device down on a Ethernet GRE (gretap) tunnel to change the MAC address. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b57708ad |
|
22-Apr-2014 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
gre: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a455275 |
|
11-Apr-2014 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
gre: don't allow to add the same tunnel twice Before the patch, it was possible to add two times the same tunnel: ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249 ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249 It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the argument dev->type, which was set only later (when calling ndo_init handler in register_netdevice()). Let's set this type in the setup handler, which is called before newlink handler. Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code."). CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0c0c50f |
|
27-Jan-2014 |
Duan Jiong <duanj.fnst@cn.fujitsu.com> |
net: gre: use icmp_hdr() to get inner ip header When dealing with icmp messages, the skb->data points the ip header that triggered the sending of the icmp message. In gre_cisco_err(), the parse_gre_header() is called, and the iptunnel_pull_header() is called to pull the skb at the end of the parse_gre_header(), so the skb->data doesn't point the inner ip header. Unfortunately, the ipgre_err still needs those ip addresses in inner ip header to look up tunnel by ip_tunnel_lookup(). So just use icmp_hdr() to get inner ip header instead of skb->data. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3acfa1e7 |
|
18-Jan-2014 |
Eric Dumazet <edumazet@google.com> |
ipv4: be friend with drop monitor Replace some dev_kfree_skb() with kfree_skb() calls when we drop one skb, this might help bug tracking. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e3da5bb |
|
16-Dec-2013 |
Timo Teräs <timo.teras@iki.fi> |
ip_gre: fix msg_name parsing for recvfrom/recvmsg ipgre_header_parse() needs to parse the tunnel's ip header and it uses mac_header to locate the iphdr. This got broken when gre tunneling was refactored as mac_header is no longer updated to point to iphdr. Introduce skb_pop_mac_header() helper to do the mac_header assignment and use it in ipgre_rcv() to fix msg_name parsing. Bug introduced in commit c54419321455 (GRE: Refactor GRE tunneling code.) Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c742e71 |
|
13-Aug-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipip: add x-netns support This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77a482bd |
|
06-Aug-2013 |
Timo Teräs <timo.teras@iki.fi> |
ip_gre: fix ipgre_header to return correct offset Fix ipgre_header() (header_ops->create) to return the correct amount of bytes pushed. Most callers of dev_hard_header() seem to care only if it was success, but af_packet.c uses it as offset to the skb to copy from userspace only once. In practice this fixes packet socket sendto()/sendmsg() to gre tunnels. Regression introduced in c54419321455631079c7d6e60bc732dd0c5914c5 ("GRE: Refactor GRE tunneling code.") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c734fb8 |
|
28-Jun-2013 |
Cong Wang <amwang@redhat.com> |
gre: fix a regression in ioctl When testing GRE tunnel, I got: # ip tunnel show get tunnel gre0 failed: Invalid argument get tunnel gre1 failed: Invalid argument This is a regression introduced by commit c54419321455631079c7d ("GRE: Refactor GRE tunneling code.") because previously we only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL, after that commit, the check is moved for all commands. So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL. After this patch I got: # ip tunnel show gre0: gre/ip remote any local any ttl inherit nopmtudisc gre1: gre/ip remote 192.168.122.101 local 192.168.122.45 ttl inherit Cc: Pravin B Shelar <pshelar@nicira.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45f2e997 |
|
17-Jun-2013 |
Pravin B Shelar <pshelar@nicira.com> |
gre: export gre_handle_offloads() function. This is required for OVS GRE offloading. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
752f36da |
|
17-Jun-2013 |
Pravin B Shelar <pshelar@nicira.com> |
gre: export gre_build_header() function. This is required for ovs gre module. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bda7bb46 |
|
17-Jun-2013 |
Pravin B Shelar <pshelar@nicira.com> |
gre: Allow multiple protocol listener for gre protocol. Currently there is only one user is allowed to register for gre protocol. Following patch adds de-multiplexer. So that multiple modules can listen on gre protocol e.g. kernel gre devices and ovs. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bf3d6a8f |
|
27-May-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
iptunnel: specify protocol outside IP header Before this patch, ip_tunnel_xmit() was using the field protocol from the IP header passed into argument. There is no functional change, this patch prepares the support of IPv4 over IPv4 for module sit. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
96f5a846 |
|
18-May-2013 |
Eric Dumazet <edumazet@google.com> |
ip_gre: fix a possible crash in ipgre_err() Another fix needed in ipgre_err(), as parse_gre_header() might change skb->head. Bug added in commit c54419321455 (GRE: Refactor GRE tunneling code.) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2bac7cb3 |
|
22-Apr-2013 |
Chen Gang <gang.chen@asianux.com> |
net: ipv4: typo issue, remove erroneous semicolon Need remove erroneous semicolon, which is found by EXTRA_CFLAGS=-W, the related commit number: c54419321455631079c7d6e60bc732dd0c5914c5 ("GRE: Refactor GRE tunneling code") Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22251c73 |
|
04-Apr-2013 |
Eric Dumazet <edumazet@google.com> |
ip_gre: fix a possible crash in parse_gre_header() pskb_may_pull() can change skb->head, so we must init iph/greh after calling it. Bug added in commit c54419321455 (GRE: Refactor GRE tunneling code.) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
537aadc3 |
|
30-Mar-2013 |
Antonio Quartulli <ordex@autistici.org> |
ip_gre: don't overwrite iflink during net_dev init iflink is currently set to 0 in __gre_tunnel_init(). This function is invoked in gre_tap_init() and ipgre_tunnel_init() which are both used to initialise the ndo_init field of the respective net_device_ops structs (ipgre.. and gre_tap..) used by GRE interfaces. However, in netdevice_register() iflink is first set to -1, then ndo_init is invoked and then iflink is assigned to a proper value if and only if it still was -1. Assigning 0 to iflink in ndo_init is therefore first preventing netdev_register() to correctly assign it a proper value and then breaking iflink at all since 0 has not correct meaning. Fix this by removing the iflink assignment in __gre_tunnel_init(). Introduced by c54419321455631079c7d6e60bc732dd0c5914c5 ("GRE: Refactor GRE tunneling code.") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c5441932 |
|
25-Mar-2013 |
Pravin B Shelar <pshelar@nicira.com> |
GRE: Refactor GRE tunneling code. Following patch refactors GRE code into ip tunneling code and GRE specific code. Common tunneling code is moved to ip_tunnel module. ip_tunnel module is written as generic library which can be used by different tunneling implementations. ip_tunnel module contains following components: - packet xmit and rcv generic code. xmit flow looks like (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out. - hash table of all devices. - lookup for tunnel devices. - control plane operations like device create, destroy, ioctl, netlink operations code. - registration for tunneling modules, like gre, ipip etc. - define single pcpu_tstats dev->tstats. - struct tnl_ptk_info added to pass parsed tunnel packet parameters. ipip.h header is renamed to ip_tunnel.h Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c6216d7 |
|
12-Mar-2013 |
Timo Teräs <timo.teras@iki.fi> |
Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally" This reverts commit 412ed94744d16806fbec3bd250fd94e71cde5a1f. The commit is wrong as tiph points to the outer IPv4 header which is installed at ipgre_header() and not the inner one which is protocol dependant. This commit broke succesfully opennhrp which use PF_PACKET socket with ETH_P_NHRP protocol. Additionally ssl_addr is set to the link-layer IPv4 address. This address is written by ipgre_header() to the skb earlier, and this is the IPv4 header tiph should point to - regardless of the inner protocol payload. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6aed0c8b |
|
09-Mar-2013 |
Cong Wang <xiyou.wangcong@gmail.com> |
tunnel: use iptunnel_xmit() again With recent patches from Pravin, most tunnels can't use iptunnel_xmit() any more, due to ip_select_ident() and skb->ip_summed. But we can just move these operations out of iptunnel_xmit(), so that tunnels can use it again. This by the way fixes a bug in vxlan (missing nf_reset()) for net-next. Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7992ae6d |
|
24-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
Revert "ip_gre: propogate target device GSO capability to the tunnel device" This reverts commit eb6b9a8cad65e820b145547844b108117cece3a0. Above commit limits GSO capability of gre device to just TSO, but software GRE-GSO is capable of handling all GSO capabilities. This patch also fixes following panic which reverted commit introduced:- BUG: unable to handle kernel NULL pointer dereference at 00000000000000a2 IP: [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre] PGD 42bc19067 PUD 42bca9067 PMD 0 Oops: 0000 [#1] SMP Pid: 2636, comm: ip Tainted: GF 3.8.0+ #83 Dell Inc. PowerEdge R620/0KCKR5 RIP: 0010:[<ffffffffa0680fd1>] [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre] RSP: 0018:ffff88042bfcb708 EFLAGS: 00010246 RAX: 00000000000005b6 RBX: ffff88042d2fa000 RCX: 0000000000000044 RDX: 0000000000000018 RSI: 0000000000000078 RDI: 0000000000000060 RBP: ffff88042bfcb748 R08: 0000000000000018 R09: 000000000000000c R10: 0000000000000020 R11: 000000000101010a R12: ffff88042d2fa800 R13: 0000000000000000 R14: ffff88042d2fa800 R15: ffff88042cd7f650 FS: 00007fa784f55700(0000) GS:ffff88043fd20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a2 CR3: 000000042d8b9000 CR4: 00000000000407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ip (pid: 2636, threadinfo ffff88042bfca000, task ffff88042d142a80) Stack: 0000000100000000 002f000000000000 0a01010100000000 000000000b010101 ffff88042d2fa800 ffff88042d2fa000 ffff88042bfcb858 ffff88042f418c00 ffff88042bfcb798 ffffffffa068199a ffff88042bfcb798 ffff88042d2fa830 Call Trace: [<ffffffffa068199a>] ipgre_newlink+0xca/0x160 [ip_gre] [<ffffffff8143b692>] rtnl_newlink+0x532/0x5f0 [<ffffffff8143b2fc>] ? rtnl_newlink+0x19c/0x5f0 [<ffffffff81438978>] rtnetlink_rcv_msg+0x2c8/0x340 [<ffffffff814386b0>] ? rtnetlink_rcv+0x40/0x40 [<ffffffff814560f9>] netlink_rcv_skb+0xa9/0xd0 [<ffffffff81438695>] rtnetlink_rcv+0x25/0x40 [<ffffffff81455ddc>] netlink_unicast+0x1ac/0x230 [<ffffffff81456a45>] netlink_sendmsg+0x265/0x380 [<ffffffff814138c0>] sock_sendmsg+0xb0/0xe0 [<ffffffff8141141e>] ? move_addr_to_kernel+0x4e/0x90 [<ffffffff81420445>] ? verify_iovec+0x85/0xf0 [<ffffffff81414ffd>] __sys_sendmsg+0x3fd/0x420 [<ffffffff8114b701>] ? handle_mm_fault+0x251/0x3b0 [<ffffffff8114f39f>] ? vma_link+0xcf/0xe0 [<ffffffff81415239>] sys_sendmsg+0x49/0x90 [<ffffffff814ffd19>] system_call_fastpath+0x16/0x1b CC: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f10098f |
|
24-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
IP_GRE: Fix GRE_CSUM case. commit "ip_gre: allow CSUM capable devices to handle packets" aa0e51cdda005cd37e2, broke GRE_CSUM case. GRE_CSUM needs checksum computed for inner packet. Therefore csum-calculation can not be offloaded if tunnel device requires GRE_CSUM. Following patch fixes it by computing inner packet checksum for GRE_CSUM type, for all other type of GRE devices csum is offloaded. CC: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
490ab081 |
|
22-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
IP_GRE: Fix IP-Identification. GRE-GSO generates ip fragments with id 0,2,3,4... for every GSO packet, which is not correct. Following patch fixes it by setting ip-header id unique id of fragments are allowed. As Eric Dumazet suggested it is optimized by using inner ip-header whenever inner packet is ipv4. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4aa896c4 |
|
18-Feb-2013 |
Eric Dumazet <edumazet@google.com> |
ip_gre: remove an extra dst_release() commit 68c331631143 (v4 GRE: Add TCP segmentation offload for GRE) introduced a bug in error path. dst is attached to skb, so will be released when skb is freed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb6b9a8c |
|
18-Feb-2013 |
Dmitry Kravkov <dmitry@broadcom.com> |
ip_gre: propogate target device GSO capability to the tunnel device Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aa0e51cd |
|
18-Feb-2013 |
Dmitry Kravkov <dmitry@broadcom.com> |
ip_gre: allow CSUM capable devices to handle packets If device is not able to handle checksumming it will be handled in dev_xmit Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
68c33163 |
|
14-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
v4 GRE: Add TCP segmentation offload for GRE Following patch adds GRE protocol offload handler so that skb_gso_segment() can segment GRE packets. SKB GSO CB is added to keep track of total header length so that skb_segment can push entire header. e.g. in case of GRE, skb_segment need to push inner and outer headers to every segment. New NETIF_F_GRE_GSO feature is added for devices which support HW GRE TSO offload. Currently none of devices support it therefore GRE GSO always fall backs to software GSO. [ Compute pkt_len before ip_local_out() invocation. -DaveM ] Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
040468a0 |
|
27-Jan-2013 |
David Ward <david.ward@ll.mit.edu> |
ip_gre: When TOS is inherited, use configured TOS value for non-IP packets A GRE tunnel can be configured so that outgoing tunnel packets inherit the value of the TOS field from the inner IP header. In doing so, when a non-IP packet is transmitted through the tunnel, the TOS field will always be set to 0. Instead, the user should be able to configure a different TOS value as the fallback to use for non-IP packets. This is helpful when the non-IP packets are all control packets and should be handled by routers outside the tunnel as having Internet Control precedence. One example of this is the NHRP packets that control a DMVPN-compatible mGRE tunnel; they are encapsulated directly by GRE and do not contain an inner IP header. Under the existing behavior, the IFLA_GRE_TOS parameter must be set to '1' for the TOS value to be inherited. Now, only the least significant bit of this parameter must be set to '1', and when a non-IP packet is sent through the tunnel, the upper 6 bits of this same parameter will be copied into the TOS field. (The ECN bits get masked off as before.) This behavior is backwards-compatible with existing configurations and iproute2 versions. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cef401de |
|
25-Jan-2013 |
Eric Dumazet <edumazet@google.com> |
net: fix possible wrong checksum generation Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5465740a |
|
23-Jan-2013 |
Pravin B Shelar <pshelar@nicira.com> |
IP_GRE: Fix kernel panic in IP_GRE with GRE csum. Due to IP_GRE GSO support, GRE can recieve non linear skb which results in panic in case of GRE_CSUM. Following patch fixes it by using correct csum API. Bug introduced in commit 6b78f16e4bdde3936b (gre: add GSO support) Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
861aa6d5 |
|
24-Dec-2012 |
Isaku Yamahata <yamahata@valinux.co.jp> |
ipv4/ip_gre: set transport header correctly to gre header ipgre_tunnel_xmit() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. So set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7e75ba1 |
|
20-Dec-2012 |
Eric Dumazet <edumazet@google.com> |
ip_gre: fix possible use after free Once skb_realloc_headroom() is called, tiph might point to freed memory. Cache tiph->ttl value before the reallocation, to avoid unexpected behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
412ed947 |
|
20-Dec-2012 |
Isaku Yamahata <yamahata@valinux.co.jp> |
ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally ipgre_tunnel_xmit() parses network header as IP unconditionally. But transmitting packets are not always IP packet. For example such packet can be sent by packet socket with sockaddr_ll.sll_protocol set. So make the function check if skb->protocol is IP. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52e804c6 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Allow userns root to control ipv4 Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow creating raw sockets. Allow the SIOCSARP ioctl to control the arp cache. Allow the SIOCSIFFLAG ioctl to allow setting network device flags. Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address. Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address. Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address. Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask. Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting gre tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipip tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipsec virtual tunnel interfaces. Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC, MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing sockets. Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and arbitrary ip options. Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option. Allow setting the IP_TRANSPARENT ipv4 socket option. Allow setting the TCP_REPAIR socket option. Allow setting the TCP_CONGESTION socket option. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e086cadc |
|
11-Nov-2012 |
Amerigo Wang <amwang@redhat.com> |
net: unify for_each_ip_tunnel_rcu() The defitions of for_each_ip_tunnel_rcu() are same, so unify it. Also, don't hide the parameter 't'. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aa0010f8 |
|
11-Nov-2012 |
Amerigo Wang <amwang@redhat.com> |
net: convert __IPTUNNEL_XMIT() to an inline function __IPTUNNEL_XMIT() is an ugly macro, convert it to a static inline function, so make it more readable. IPTUNNEL_XMIT() is unused, just remove it. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9fbef059 |
|
30-Sep-2012 |
stephen hemminger <shemminger@vyatta.com> |
gre: fix sparse warning Use be16 consistently when looking at flags. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60769a5d |
|
26-Sep-2012 |
Eric Dumazet <edumazet@google.com> |
ipv4: gre: add GRO capability Add GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. Tested using IPv4 and IPv6 TCP traffic inside this tunnel, and checking GRO is building large packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eccc1bb8 |
|
25-Sep-2012 |
stephen hemminger <shemminger@vyatta.com> |
tunnel: drop packet if ECN present with not-ECT Linux tunnels were written before RFC6040 and therefore never implemented the corner case of ECN getting set in the outer header and the inner header not being ready for it. Section 4.2. Default Tunnel Egress Behaviour. o If the inner ECN field is Not-ECT, the decapsulator MUST NOT propagate any other ECN codepoint onwards. This is because the inner Not-ECT marking is set by transports that rely on dropped packets as an indication of congestion and would not understand or respond to any other ECN codepoint [RFC4774]. Specifically: * If the inner ECN field is Not-ECT and the outer ECN field is CE, the decapsulator MUST drop the packet. * If the inner ECN field is Not-ECT and the outer ECN field is Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the outgoing packet with the ECN field cleared to Not-ECT. This patch moves the ECN decap logic out of the individual tunnels into a common place. It also adds logging to allow detecting broken systems that set ECN bits incorrectly when tunneling (or an intermediate router might be changing the header). Overloads rx_frame_error to keep track of ECN related error. Thanks to Chris Wright who caught this while reviewing the new VXLAN tunnel. This code was tested by injecting faulty logic in other end GRE to send incorrectly encapsulated packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c5794a6 |
|
24-Sep-2012 |
stephen hemminger <shemminger@vyatta.com> |
gre: remove unnecessary rcu_read_lock/unlock The gre function pointers for receive and error handling are always called (from gre.c) with rcu_read_lock already held. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d2083287 |
|
24-Sep-2012 |
stephen hemminger <shemminger@vyatta.com> |
gre: fix handling of key 0 GRE driver incorrectly uses zero as a flag value. Zero is a perfectly valid value for key, and the tunnel should match packets with no key only with tunnels created without key, and vice versa. This is a slightly visible change since previously it might be possible to construct a working tunnel that sent key 0 and received only because of the key wildcard of zero. I.e the sender sent key of zero, but tunnel was defined without key. Note: using gre key 0 requires iproute2 utilities v3.2 or later. The original utility code was broken as well. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b78f16e |
|
13-Sep-2012 |
Eric Dumazet <edumazet@google.com> |
gre: add GSO support Add GSO support to GRE tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8126f1d |
|
13-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: Adjust semantics of rt->rt_gateway. In order to allow prefixed routes, we have to adjust how rt_gateway is set and interpreted. The new interpretation is: 1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr 2) rt_gateway != 0, destination requires a nexthop gateway Abstract the fetching of the proper nexthop value using a new inline helper, rt_nexthop(), as suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
|
#
6700c270 |
|
17-Jul-2012 |
David S. Miller <davem@davemloft.net> |
net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55be7a9c |
|
11-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: Add redirect support to all protocol icmp error handlers. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
36393395 |
|
14-Jun-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: Handle PMTU in all ICMP error handlers. With ip_rt_frag_needed() removed, we have to explicitly update PMTU information in every ICMP error handler. Create two helper functions to facilitate this. 1) ipv4_sk_update_pmtu() This updates the PMTU when we have a socket context to work with. 2) ipv4_update_pmtu() Raw version, used when no socket context is available. For this interface, we essentially just pass in explicit arguments for the flow identity information we would have extracted from the socket. And you'll notice that ipv4_sk_update_pmtu() is simply implemented in terms of ipv4_update_pmtu() Note that __ip_route_output_key() is used, rather than something like ip_route_output_flow() or ip_route_output_key(). This is because we absolutely do not want to end up with a route that does IPSEC encapsulation and the like. Instead, we only want the route that would get us to the node described by the outermost IP header. Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e73ea1a |
|
14-Apr-2012 |
Daniel Baluta <dbaluta@ixiacom.com> |
ipv4: fix checkpatch errors Fix checkpatch errors of the following type: * ERROR: "foo * bar" should be "foo *bar" * ERROR: "(foo*)" should be "(foo *)" Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87b6d218 |
|
12-Apr-2012 |
stephen hemminger <shemminger@vyatta.com> |
tunnel: implement 64 bits statistics Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels to use 64 bit statistics. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3756b79 |
|
01-Apr-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: Stop using NLA_PUT*(). These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afd46503 |
|
12-Mar-2012 |
Joe Perches <joe@perches.com> |
net: ipv4: Standardize prefixes for message logging Add #define pr_fmt(fmt) as appropriate. Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files. Standardize on "UDPLite: " for appropriate uses. Some prefixes were previously "UDPLITE: " and "UDP-Lite: ". Add KBUILD_MODNAME ": " to icmp and gre. Remove embedded prefixes as appropriate. Add missing "\n" to pr_info in gre.c. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
058bd4d2 |
|
11-Mar-2012 |
Joe Perches <joe@perches.com> |
net: Convert printks to pr_<level> Use a more current kernel messaging style. Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding function names. Some messages that were prefixed with <foo>_close are now prefixed with <foo>_fini. Some ah4 and esp messages are now not prefixed with "ip ". The intent of this patch is to later add something like #define pr_fmt(fmt) "IPv4: " fmt. to standardize the output messages. Text size is trivially reduced. (x86-32 allyesconfig) $ size net/ipv4/built-in.o* text data bss dec hex filename 887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new 887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bff52857 |
|
24-Feb-2012 |
stephen hemminger <shemminger@vyatta.com> |
gre: fix spelling in comments The original spelling and bad word choice makes these comments hard to read. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2cedb63 |
|
14-Feb-2012 |
Danny Kukawka <danny.kukawka@bisect.de> |
net: replace random_ether_addr() with eth_hw_addr_random() Replace usage of random_ether_addr() with eth_hw_addr_random() to set addr_assign_type correctly to NET_ADDR_RANDOM. Change the trivial cases. v2: adapt to renamed eth_hw_addr_random() Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0ec88662 |
|
27-Jan-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: ip_gre: Convert to dst_neigh_lookup() The conversion is very similar to that made to ipv6's SIT code. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2b3ee9e |
|
26-Jan-2012 |
Willem de Bruijn <willemb@google.com> |
ipv6: Fix ip_gre lockless xmits. Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK. Sit and ipip set this unconditionally in ops->setup, but gre enables it conditionally after parameter passing in ops->newlink. This is not called during tunnel setup as below, however, so GRE tunnels are still taking the lock. modprobe ip_gre ip tunnel add test0 mode gre remote 10.5.1.1 dev lo ip link set test0 up ip addr add 10.6.0.1 dev test0 # cat /sys/class/net/test0/features # $DIR/test_tunnel_xmit 10 10.5.2.1 ip route add 10.5.2.0/24 dev test0 ip tunnel del test0 The newlink callback is only called in rtnl_netlink, and only if the device is new, as it calls register_netdevice internally. Gre tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL, which calls ipgre_tunnel_locate, which calls register_netdev. rtnl_newlink is called at 'ip link set', but skips ops->newlink and the device is up with locking still enabled. The equivalent ipip tunnel works fine, btw (just substitute 'method gre' for 'method ipip'). On kernels before /sys/class/net/*/features was removed [1], the first commented out line returns 0x6000 with method gre, which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip, it reports 0x7000. This test cannot be used on recent kernels where the sysfs file is removed (and ETHTOOL_GFEATURES does not currently work for tunnel devices, because they lack dev->ethtool_ops). The second commented out line calls a simple transmission test [2] that sends on 24 cores at maximum rate. Results of a single run: ipip: 19,372,306 gre before patch: 4,839,753 gre after patch: 19,133,873 This patch replicates the condition check in ipgre_newlink to ipgre_tunnel_locate. It works for me, both with oseq on and off. This is the first time I looked at rtnetlink and iproute2 code, though, so someone more knowledgeable should probably check the patch. Thanks. The tail of both functions is now identical, by the way. To avoid code duplication, I'll be happy to rework this and merge the two. [1] http://patchwork.ozlabs.org/patch/104610/ [2] http://kernel.googlecode.com/files/xmit_udp_parallel.c Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61d57f87 |
|
24-Jan-2012 |
David S. Miller <davem@davemloft.net> |
ip_gre: Fix bug added to ipgre_tunnel_xmit(). We can remove the rt_gateway == 0 check but we shouldn't remove the 'dst' initialization too. Noticed by Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
496053f4 |
|
11-Jan-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: Remove bogus checks of rt_gateway being zero. It can never actually happen. rt_gateway is either the fully resolved flow lookup key's destination address, or the non-zero FIB entry gateway address. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfd56b8b |
|
10-Dec-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: use IS_ENABLED(CONFIG_IPV6) Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27217455 |
|
02-Dec-2011 |
David Miller <davem@davemloft.net> |
net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
|
#
805dc1d6 |
|
17-Nov-2011 |
Herbert Xu <herbert@gondor.apana.org.au> |
ip_gre: Set needed_headroom dynamically again ip_gre: Set needed_headroom dynamically again Now that all needed_headroom users have been fixed up so that we can safely increase needed_headroom, this patch restore the dynamic update of needed_headroom. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8ce120f1 |
|
04-Nov-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: better pcpu data alignment Tunnels can force an alignment of their percpu data to reduce number of cache lines used in fast path, or read in .ndo_get_stats() percpu_alloc() is a very fine grained allocator, so any small hole will be used anyway. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
113ab386 |
|
13-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_gre: dont increase dev->needed_headroom on a live device It seems ip_gre is able to change dev->needed_headroom on the fly. Its is not legal unfortunately and triggers a BUG in raw_sendmsg() skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) < another cpu change dev->needed_headromm (making it bigger) ... skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE() -> we crash later because skb head is exhausted. Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route header_len in max_headroom calculation) Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Timo Teräs <timo.teras@iki.fi> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69cce1d1 |
|
18-Jul-2011 |
David S. Miller <davem@davemloft.net> |
net: Abstract dst->neighbour accesses behind helpers. dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c5cae81 |
|
29-Apr-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: call dev_alloc_name from register_netdevice Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). That allows to remove multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name in advance remains. This also fixes veth creation regresion caused by 84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cbb1e85f |
|
04-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Kill rt->rt_{src, dst} usage in IP GRE tunnels. First, make callers pass on-stack flowi4 to ip_route_output_gre() so they can get at the fully resolved flow key. Next, use that in ipgre_tunnel_xmit() to avoid the need to use rt->rt_{dst,src}. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b71d1d42 |
|
21-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78fbfd8a |
|
11-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Create and use route lookup helpers. The idea here is this minimizes the number of places one has to edit in order to make changes to how flows are defined and used. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8909c9ad |
|
01-Mar-2011 |
Vasiliy Kulikov <segoon@openwall.com> |
net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't allow anybody load any module not related to networking. This patch restricts an ability of autoloading modules to netdev modules with explicit aliases. This fixes CVE-2011-1019. Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior of loading netdev modules by name (without any prefix) for processes with CAP_SYS_MODULE to maintain the compatibility with network scripts that use autoloading netdev modules by aliases like "eth0", "wlan0". Currently there are only three users of the feature in the upstream kernel: ipip, ip_gre and sit. root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) -- root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: fffffff800001000 CapEff: fffffff800001000 CapBnd: fffffff800001000 root@albatros:~# modprobe xfs FATAL: Error inserting xfs (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted root@albatros:~# lsmod | grep xfs root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod | grep xfs root@albatros:~# lsmod | grep sit root@albatros:~# ifconfig sit sit: error fetching interface information: Device not found root@albatros:~# lsmod | grep sit root@albatros:~# ifconfig sit0 sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1 root@albatros:~# lsmod | grep sit sit 10457 0 tunnel4 2957 1 sit For CAP_SYS_MODULE module loading is still relaxed: root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: ffffffffffffffff CapEff: ffffffffffffffff CapBnd: ffffffffffffffff root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod | grep xfs xfs 745319 0 Reference: https://lkml.org/lkml/2011/2/24/203 Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>
|
#
b23dd4fe |
|
02-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Make output route lookup return rtable directly. Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
946bf5ee |
|
11-Feb-2011 |
Steffen Klassert <steffen.klassert@secunet.com> |
ip_gre: Add IPPROTO_GRE to flowi in ipgre_tunnel_xmit Commit 5811662b15db018c740c57d037523683fd3e6123 ("net: use the macros defined for the members of flowi") accidentally removed the setting of IPPROTO_GRE from the struct flowi in ipgre_tunnel_xmit. This patch restores it. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
323e126f |
|
12-Dec-2010 |
David S. Miller <davem@davemloft.net> |
ipv4: Don't pre-seed hoplimit metric. Always go through a new ip4_dst_hoplimit() helper, just like ipv6. This allowed several simplifications: 1) The interim dst_metric_hoplimit() can go as it's no longer userd. 2) The sysctl_ip_default_ttl entry no longer needs to use ipv4_doint_and_flush, since the sysctl is not cached in routing cache metrics any longer. 3) ipv4_doint_and_flush no longer needs to be exported and therefore can be marked static. When ipv4_doint_and_flush_strategy was removed some time ago, the external declaration in ip.h was mistakenly left around so kill that off too. We have to move the sysctl_ip_default_ttl declaration into ipv4's route cache definition header net/route.h, because currently net/ip.h (where the declaration lives now) has a back dependency on net/route.h Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5170ae82 |
|
12-Dec-2010 |
David S. Miller <davem@davemloft.net> |
net: Abstract RTAX_HOPLIMIT metric accesses behind helper. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
defb3519 |
|
08-Dec-2010 |
David S. Miller <davem@davemloft.net> |
net: Abstract away all dst_entry metrics accesses. Use helper functions to hide all direct accesses, especially writes, to dst_entry metrics values. This will allow us to: 1) More easily change how the metrics are stored. 2) Implement COW for metrics. In particular this will help us put metrics into the inetpeer cache if that is what we end up doing. We can make the _metrics member a pointer instead of an array, initially have it point at the read-only metrics in the FIB, and then on the first set grab an inetpeer entry and point the _metrics member there. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
|
#
4da6a738 |
|
29-Nov-2010 |
stephen hemminger <shemminger@vyatta.com> |
gre: add module alias for gre0 tunnel device If gre is built as a module the 'ip tunnel add' command would fail because the ip_gre module was not being autoloaded. Adding an alias for the gre0 device name cause dev_load() to autoload it when needed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
407d6fcb |
|
29-Nov-2010 |
stephen hemminger <shemminger@vyatta.com> |
gre: minor cleanups Use strcpy() rather the sprintf() for the case where name is getting generated. Fix indentation. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5811662b |
|
12-Nov-2010 |
Changli Gao <xiaosuo@gmail.com> |
net: use the macros defined for the members of flowi Use the macros defined for the members of flowi to clean the code up. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc9ff19d |
|
02-Nov-2010 |
Timo Teräs <timo.teras@iki.fi> |
xfrm: use gre key as flow upper protocol info The GRE Key field is intended to be used for identifying an individual traffic flow within a tunnel. It is useful to be able to have XFRM policy selector matches to have different policies for different GRE tunnels. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7537967 |
|
11-Nov-2010 |
David S. Miller <davem@davemloft.net> |
ipv4: Make rt->fl.iif tests lest obscure. When we test rt->fl.iif against zero, we're seeing if it's an output or an input route. Make that explicit with some helper functions. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3285ee3b |
|
30-Oct-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_gre: fix fallback tunnel setup Before making the fallback tunnel visible to lookups, we should make sure it is completely setup, once ipgre_tunnel_init() had been called and tstats per_cpu pointer allocated. move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from ipgre_fb_tunnel_init() to ipgre_init_net() Based on a patch from Pavel Emelyanov Reported-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
74b0b85b |
|
26-Oct-2010 |
Pavel Emelyanov <xemul@parallels.com> |
tunnels: Fix tunnels change rcu protection After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug was introduced into the SIOCCHGTUNNEL code. The tunnel is first unlinked, then addresses change, then it is linked back probably into another bucket. But while changing the parms, the hash table is unlocked to readers and they can lookup the improper tunnel. Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b (gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and 94767632 (ip6tnl: get rid of ip6_tnl_lock). The quick fix is to wait for quiescent state to pass after unlinking, but if it is inappropriate I can invent something better, just let me know. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8723e1b4 |
|
18-Oct-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: RCU changes in inetdev_by_index() Convert inetdev_by_index() to not increment in_dev refcount. Callers hold RCU or RTNL, and should not decrement in_dev refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
caf586e5 |
|
30-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: add a core netdev->rx_dropped counter In various situations, a device provides a packet to our stack and we drop it before it enters protocol stack : - softnet backlog full (accounted in /proc/net/softnet_stat) - bad vlan tag (not accounted) - unknown/unregistered protocol (not accounted) We can handle a per-device counter of such dropped frames at core level, and automatically adds it to the device provided stats (rx_dropped), so that standard tools can be used (ifconfig, ip link, cat /proc/net/dev) This is a generalization of commit 8990f468a (net: rx_dropped accounting), thus reverting it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d0722a2 |
|
30-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_gre: comments change HARD_TX_LOCK no longer protects tunnels from dead loops, but xmit_recursion percpu counter. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b790e01a |
|
27-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_gre: lockless xmit GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX Note: If tunnels are created with the "oseq" option, LLTX is not enabled : Even using an atomic_t o_seq, we would increase chance for packets being out of order at receiver. Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) Before patch : real 3m0.094s user 0m9.365s sys 47m50.103s After patch: real 0m29.756s user 0m11.097s sys 7m33.012s Last problem to solve is the contention on dst : 38660.00 21.4% __ip_route_output_key vmlinux 20786.00 11.5% dst_release vmlinux 14191.00 7.8% __xfrm_lookup vmlinux 12410.00 6.9% ip_finish_output vmlinux 4540.00 2.5% ip_push_pending_frames vmlinux 4427.00 2.4% ip_append_data vmlinux 4265.00 2.4% __alloc_skb vmlinux 4140.00 2.3% __ip_local_out vmlinux 3991.00 2.2% dev_queue_xmit vmlinux Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e985aad7 |
|
26-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_gre: percpu stats accounting Le lundi 27 septembre 2010 à 14:29 +0100, Ben Hutchings a écrit : > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > index 5d6ddcb..de39b22 100644 > > --- a/net/ipv4/ip_gre.c > > +++ b/net/ipv4/ip_gre.c > [...] > > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net, > > if (parms->name[0]) > > strlcpy(name, parms->name, IFNAMSIZ); > > else > > - sprintf(name, "gre%%d"); > > + strcpy(name, "gre%d"); > > > > dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup); > > if (!dev) > [...] > > This is a valid fix, but doesn't belong in this patch! > Sorry ? It was not a fix, but at most a cleanup ;) Anyway I forgot the gretap case... [PATCH 2/4 v2] ip_gre: percpu stats accounting Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets. Other seldom used fields are kept in netdev->stats structure, possibly unsafe. This is a preliminary work to support lockless transmit path, and correct RX stats, that are already unsafe. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a02cec21 |
|
22-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8990f468 |
|
19-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: rx_dropped accounting Under load, netif_rx() can drop incoming packets but administrators dont have a chance to spot which device needs some tuning (RPS activation for example) This patch adds rx_dropped accounting in vlans and tunnels. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
842c74bf |
|
20-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_gre: CONFIG_IPV6_MODULE support ipv6 can be a module, we should test CONFIG_IPV6 and CONFIG_IPV6_MODULE to enable ipv6 bits in ip_gre. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1507850b |
|
15-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
gre: get rid of ipgre_lock As RTNL is held while doing tunnels inserts and deletes, we can remove ipgre_lock spinlock. My initial RCU conversion was conservative and converted the rwlock to spinlock, with no RTNL requirement. Use appropriate rcu annotations and modern lockdep checks as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00959ade |
|
22-Aug-2010 |
Dmitry Kozlov <xeb@mail.ru> |
PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol) PPP: introduce "pptp" module which implements point-to-point tunneling protocol using pppox framework NET: introduce the "gre" module for demultiplexing GRE packets on version criteria (required to pptp and ip_gre may coexists) NET: ip_gre: update to use the "gre" module This patch introduces then pptp support to the linux kernel which dramatically speeds up pptp vpn connections and decreases cpu usage in comparison of existing user-space implementation (poptop/pptpclient). There is accel-pptp project (https://sourceforge.net/projects/accel-pptp/) to utilize this module, it contains plugin for pppd to use pptp in client-mode and modified pptpd (poptop) to build high-performance pptp NAS. There was many changes from initial submitted patch, most important are: 1. using rcu instead of read-write locks 2. using static bitmap instead of dynamically allocated 3. using vmalloc for memory allocation instead of BITS_PER_LONG + __get_free_pages 4. fixed many coding style issues Thanks to Eric Dumazet. Signed-off-by: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd4ba83d |
|
08-Jul-2010 |
Stephen Hemminger <shemminger@vyatta.com> |
gre: propagate ipv6 transport class This patch makes IPV6 over IPv4 GRE tunnel propagate the transport class field from the underlying IPV6 header to the IPV4 Type Of Service field. Without the patch, all IPV6 packets in tunnel look the same to QoS. This assumes that IPV6 transport class is exactly the same as IPv4 TOS. Not sure if that is always the case? Maybe need to mask off some bits. The mask and shift to get tclass is copied from ipv6/datagram.c Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d8d1f30b |
|
11-Jun-2010 |
Changli Gao <xiaosuo@gmail.com> |
net-next: remove useless union keyword remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3fa21e07 |
|
18-May-2010 |
Joe Perches <joe@perches.com> |
net: Remove unnecessary returns from void function()s This patch removes from net/ (but not any netfilter files) all the unnecessary return; statements that precede the last closing brace of void functions. It does not remove the returns that are immediately preceded by a label as gcc doesn't like that. Done via: $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \ xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }' Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d19d56dd |
|
17-May-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: Introduce skb_tunnel_rx() helper skb rxhash should be cleared when a skb is handled by a tunnel before being delivered again, so that correct packet steering can take place. There are other cleanups and accounting that we can factorize in a new helper, skb_tunnel_rx() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
243aad83 |
|
19-Mar-2010 |
Timo Teräs <timo.teras@iki.fi> |
ip_gre: include route header_len in max_headroom calculation Taking route's header_len into account, and updating gre device needed_headroom will give better hints on upper bound of required headroom. This is useful if the gre traffic is xfrm'ed. Signed-off-by: Timo Teras <timo.teras@iki.fi> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d55cb91 |
|
02-Mar-2010 |
Timo Teräs <timo.teras@iki.fi> |
gre: fix hard header destination address checking ipgre_header() can be called with zero daddr when the gre device is configured as multipoint tunnel and still has the NOARP flag set (which is typically cleared by the userspace arp daemon). If the NOARP packets are not dropped, ipgre_tunnel_xmit() will take rt->rt_gateway (= NBMA IP) and use that for route look up (and may lead to bogus xfrm acquires). The multicast address check is removed as sending to multicast group should be ok. In fact, if gre device has a multicast address as destination ipgre_header is always called with multicast address. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ffe533c |
|
18-Feb-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
ipv6: drop unused "dev" arg of icmpv6_send() Dunno, what was the idea, it wasn't used for a long time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2892f02 |
|
16-Feb-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
gre: fix netns vs proto registration ordering GRE protocol receive hook can be called right after protocol addition is done. If netns stuff is not yet initialized, we're going to oops in net_generic(). This is remotely oopsable if ip_gre is compiled as module and packet comes at unfortunate moment of module loading. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c8c1e72 |
|
16-Jan-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfb8fbf2 |
|
29-Nov-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Simplify ip_gre pernet operations. Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f99189b1 |
|
17-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
netns: net_identifiers should be read_mostly Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81adee47 |
|
08-Nov-2009 |
Eric W. Biederman <ebiederm@aristanetworks.com> |
net: Support specifying the network namespace upon device creation. There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
|
#
2e9526b3 |
|
29-Oct-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
gre: Fix dev_addr clobbering for gretap Nathan Neulinger noticed that gretap devices get their MAC address from the local IP address, which results in invalid MAC addresses half of the time. This is because gretap is still using the tunnel netdev ops rather than the correct tap netdev ops struct. This patch also fixes changelink to not clobber the MAC address for the gretap case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Tested-by: Nathan Neulinger <nneul@mst.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eef6dd65 |
|
27-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
gre: Optimize multiple unregistration Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d5b2c08 |
|
23-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
gre: convert hash tables locking to RCU GRE tunnels use one rwlock to protect their hash tables. This locking scheme can be converted to RCU for free, since netdevice already must wait for a RCU grace period at dismantle time. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0bfbedb1 |
|
05-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
tunnels: Optimize tx path We currently dirty a cache line to update tunnel device stats (tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets counters that already are present in cpu cache, in the cache line shared with txq->_xmit_lock This patch extends IPTUNNEL_XMIT() macro to use txq pointer provided by the caller. Also &tunnel->dev->stats can be replaced by &dev->stats Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a43912ab |
|
23-Sep-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
tunnel: eliminate recursion field It seems recursion field from "struct ip_tunnel" is not anymore needed. recursion prevention is done at the upper level (in dev_queue_xmit()), since we use HARD_TX_LOCK protection for tunnels. This avoids a cache line ping pong on "struct ip_tunnel" : This structure should be now mostly read on xmit and receive paths. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32613090 |
|
13-Sep-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: constify struct net_protocol Remove long removed "inet_protocol_base" declaration. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fef4c0c |
|
31-Aug-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
netdev: convert pseudo-devices to netdev_tx_t Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cdb0456 |
|
14-Aug-2009 |
Tom Goff <thomas.goff@boeing.com> |
gre: Fix MTU calculation for bound GRE tunnels The GRE header length should be subtracted when the tunnel MTU is calculated. This just corrects for the associativity change introduced by commit 42aa916265d740d66ac1f17290366e9494c884c2 ("gre: Move MTU setting out of ipgre_tunnel_bind_dev"). Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee686ca9 |
|
14-Jul-2009 |
Andreas Jaggi <aj@open.ch> |
gre: fix ToS/DiffServ inherit bug Fixes two bugs: - ToS/DiffServ inheritance was unintentionally activated when using impair fixed ToS values - ECN bit was lost during ToS/DiffServ inheritance Signed-off-by: Andreas Jaggi <aj@open.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ed10654 |
|
23-Jun-2009 |
Patrick McHardy <kaber@trash.net> |
net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions This patch is the result of an automatic spatch transformation to convert all ndo_start_xmit() return values of 0 to NETDEV_TX_OK. Some occurences are missed by the automatic conversion, those will be handled in a seperate patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
adf30907 |
|
01-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
511c3f92 |
|
01-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: skb->rtable accessor Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb Delete skb->rtable field Setting rtable is not allowed, just set dst instead as rtable is an alias. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
108bfa89 |
|
28-May-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: unset IFF_XMIT_DST_RELEASE in ipgre_tunnel_setup() ipgre_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit() to no release it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da6185d8 |
|
25-Feb-2009 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
gre: used time_before for comparing jiffies The functions time_before is more robust for comparing jiffies against other values. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afcf1242 |
|
26-Jan-2009 |
Timo Teras <timo.teras@iki.fi> |
gre: optimize hash lookup Instead of keeping candidate tunnel device from all categories, keep only one candidate with best score. This optimizes stack usage and speeds up exit code. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
749c10f9 |
|
19-Jan-2009 |
Timo Teras <timo.teras@iki.fi> |
gre: strict physical device binding Check the device on receive path and allow otherwise identical devices as long as the physical device differs. This is useful for NBMA tunnels, where you want to use different gre IP for each public IP available via different physical devices. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be77e593 |
|
23-Nov-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: fix tunnels in netns after ndo_ changes dev_net_set() should be the very first thing after alloc_netdev(). "ndo_" changes turned simple assignment (which is OK to do before netns assignment) into quite non-trivial operation (which is not OK, init_net was used). This leads to incomplete initialisation of tunnel device in netns. BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f *pde = 00000000 Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC last sysfs file: /sys/class/net/lo/operstate Pid: 10, comm: netns Not tainted (2.6.28-rc6 #1) EIP: 0060:[<c02efdb5>] EFLAGS: 00010246 CPU: 0 EIP is at ip6_tnl_exit_net+0x37/0x4f EAX: 00000000 EBX: 00000020 ECX: 00000000 EDX: 00000003 ESI: c5caef30 EDI: c782bbe8 EBP: c7909f50 ESP: c7909f48 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 Process netns (pid: 10, ti=c7908000 task=c7905780 task.ti=c7908000) Stack: c03e75e0 c7390bc8 c7909f60 c0245448 c7390bd8 c7390bf0 c7909fa8 c012577a 00000000 00000002 00000000 c0125736 c782bbe8 c7909f90 c0308fe3 c782bc04 c7390bd4 c0245406 c084b718 c04f0770 c03ad785 c782bbe8 c782bc04 c782bc0c Call Trace: [<c0245448>] ? cleanup_net+0x42/0x82 [<c012577a>] ? run_workqueue+0xd6/0x1ae [<c0125736>] ? run_workqueue+0x92/0x1ae [<c0308fe3>] ? schedule+0x275/0x285 [<c0245406>] ? cleanup_net+0x0/0x82 [<c0125ae1>] ? worker_thread+0x81/0x8d [<c0128344>] ? autoremove_wake_function+0x0/0x33 [<c0125a60>] ? worker_thread+0x0/0x8d [<c012815c>] ? kthread+0x39/0x5e [<c0128123>] ? kthread+0x0/0x5e [<c0103b9f>] ? kernel_thread_helper+0x7/0x10 Code: db e8 05 ff ff ff 89 c6 e8 dc 04 f6 ff eb 08 8b 40 04 e8 38 89 f5 ff 8b 44 9e 04 85 c0 75 f0 43 83 fb 20 75 f2 8b 86 84 00 00 00 <8b> 40 04 e8 1c 89 f5 ff e8 98 04 f6 ff 89 f0 e8 f8 63 e6 ff 5b EIP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f SS:ESP 0068:c7909f48 ---[ end trace 6c2f2328fccd3e0c ]--- Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b8c26a33 |
|
20-Nov-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
ipgre: convert to netdevice_ops Convert ipgre tunnel to netdevice ops. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ed2533e |
|
03-Nov-2008 |
Jianjun Kong <jianjun@zeuux.org> |
net: clean up net/ipv4/fib_frontend.c fib_hash.c ip_gre.c Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7bb82d92 |
|
11-Oct-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
gre: Initialise rtnl_link tunnel parameters properly Brown paper bag error of calling memset with sizeof(p) instead of sizeof(*p). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d74f8ba |
|
10-Oct-2008 |
Patrick McHardy <kaber@trash.net> |
gre: minor cleanups in netlink interface - use typeful helpers for IFLA_GRE_LOCAL/IFLA_GRE_REMOTE - replace magic value by FIELD_SIZEOF - use MODULE_ALIAS_RTNL_LINK macro Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba9e64b1 |
|
10-Oct-2008 |
Patrick McHardy <kaber@trash.net> |
gre: fix copy and paste error The flags are dumped twice, the keys not at all. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64194c31 |
|
09-Oct-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
inet: Make tunnel RX/TX byte counters more consistent This patch makes the RX/TX byte counters for IPIP, GRE and SIT more consistent. Previously we included the external IP headers on the way out but not when the packet is inbound. The new scheme is to count payload only in both directions. For IPIP and SIT this simply means the exclusion of the external IP header. For GRE this means that we exclude the GRE header as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e1a80002 |
|
09-Oct-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
gre: Add Transparent Ethernet Bridging This patch adds support for Ethernet over GRE encapsulation. This is exposed to user-space with a new link type of "gretap" instead of "gre". It will create an ARPHRD_ETHER device in lieu of the usual ARPHRD_IPGRE. Note that to preserver backwards compatibility all Transparent Ethernet Bridging packets are passed to an ARPHRD_IPGRE tunnel if its key matches and there is no ARPHRD_ETHER device whose key matches more closely. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c19e654d |
|
09-Oct-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
gre: Add netlink interface This patch adds a netlink interface that will eventually displace the existing ioctl interface. It utilises the elegant rtnl_link_ops mechanism. This also means that user-space no longer needs to rely on the tunnel interface being of type GRE to identify GRE tunnels. The identification can now occur using rtnl_link_ops. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
42aa9162 |
|
09-Oct-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
gre: Move MTU setting out of ipgre_tunnel_bind_dev This patch moves the dev->mtu setting out of ipgre_tunnel_bind_dev. This is in prepartion of using rtnl_link where we'll need to make the MTU setting conditional on whether the user has supplied an MTU. This also requires the move of the ipgre_tunnel_bind_dev call out of the dev->init function so that we can access the user parameters later. This patch also adds a check to prevent setting the MTU below the minimum of 68. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c95b819a |
|
09-Oct-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
gre: Use needed_headroom Now that we have dev->needed_headroom, we can use it instead of having a bogus dev->hard_header_len. This also allows us to include dev->hard_header_len in the MTU computation so that when we do have a meaningful hard_harder_len in future it is included automatically in figuring out the MTU. Incidentally, this fixes a bug where we ignored the needed_headroom field of the underlying device in calculating our own hard_header_len. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
071f92d0 |
|
21-May-2008 |
Rami Rosen <ramirose@gmail.com> |
net: The world is not perfect patch. Unless there will be any objection here, I suggest consider the following patch which simply removes the code for the -DI_WISH_WORLD_WERE_PERFECT in the three methods which use it. The compilation errors we get when using -DI_WISH_WORLD_WERE_PERFECT show that this code was not built and not used for really a long time. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
addd68eb |
|
21-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
ipgre: Use on-device stats instead of private ones. Just switch from tunnel->stat to tunnel->dev->stats. The ip_tunnel->stat member itself will be removed after I fix its other users (very soon). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f96c148f |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Allow for IPPROTO_GRE protocol in namespaces. This one was also disabled by default for sanity. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b67eceb |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Allow to create IPGRE tunnels in net namespaces. I.e. set the proper net and mark as NETNS_LOCAL. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
96635522 |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Use proper net in routing calls. As for the IPIP tunnel, there are some ip_route_output_key() calls in there that require a proper net so give one to them. And a proper net for the __get_dev_by_index hanging around. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb8ce741 |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Make tunnels hashes per-net. Very similar to what was done for the IPIP code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7daa0004 |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Make the fallback tunnel device per-net. Everything is prepared for this change now. Create on in init callback, use it over the code and destroy on net exit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b4667f3 |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Use proper net in hash-lookup functions. This is the part#2 of the patch #2 - get the proper net for these functions. This change in a separate patch in order not to get lost in a large previous patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f57e7d5a |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Add net/gre_net argument to some functions. The fallback device and hashes are to become per-net, but many code doesn't have anything to get the struct net pointer from. So pass the proper net there with an extra argument. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59a4c759 |
|
16-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[GRE]: Introduce empty ipgre_net structure and net init/exit ops. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c346dca1 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
ee6b9673 |
|
05-Mar-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Anonymous) unions can help us to avoid ugly casts. A common cast it the (struct rtable *)skb->dst one. Defining an union like : union { struct dst_entry *dst; struct rtable *rtable; }; permits to use skb->rtable in place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b37d428b |
|
27-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Don't create tunnels with '%' in name. Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a pre-defined name for a device from the userspace. Since these drivers call the register_netdevice() (rtnl_lock, is held), which does _not_ generate the device's name, this name may contain a '%' character. Not sure how bad is this to have a device with a '%' in its name, but all the other places either use the register_netdev(), which call the dev_alloc_name(), or explicitly call the dev_alloc_name() before registering, i.e. do not allow for such names. This had to be prior to the commit 34cc7b, but I forgot to number the patches and this one got lost, sorry. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34cc7ba6 |
|
23-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly. Use the added dev_alloc_name() call to create tunnel device name, rather than iterate in a hand-made loop with an artificial limit. Thanks Patrick for noticing this. [ The way this works is, when the device is actually registered, the generic code noticed the '%' in the name and invokes dev_alloc_name() to fully resolve the name. -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f206351a |
|
22-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Add namespace parameter to ip_route_output_key. Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7fee0ca2 |
|
21-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Add netns parameter to inetdev_by_index. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f97c1e0c |
|
16-Dec-2007 |
Joe Perches <joe@perches.com> |
[IPV4] net/ipv4: Use ipv4_is_<type> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee34c1eb |
|
13-Dec-2007 |
Michal Schmidt <mschmidt@redhat.com> |
[IP_GRE]: Rebinding of GRE tunnels to other interfaces This is similar to the change already done for IPIP tunnels. Once created, a GRE tunnel can't be bound to another device. To reproduce: # create a tunnel: ip tunnel add tunneltest0 mode gre remote 10.0.0.1 dev eth0 # try to change the bounding device from eth0 to eth1: ip tunnel change tunneltest0 dev eth1 # show the result: ip tunnel show tunneltest0 tunneltest0: gre/ip remote 10.0.0.1 local any dev eth0 ttl inherit Notice the bound device has not changed from eth0 to eth1. This patch fixes it. When changing the binding, it also recalculates the MTU according to the new bound device's MTU. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d069167 |
|
20-Dec-2007 |
Timo Teras <timo.teras@iki.fi> |
[IPV4] ip_gre: set mac_header correctly in receive path mac_header update in ipgre_recv() was incorrectly changed to skb_reset_mac_header() when it was introduced. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2636b4d |
|
23-Oct-2007 |
Chuck Lever <chuck.lever@oracle.com> |
[NET]: Treat the sign of the result of skb_headroom() consistently In some places, the result of skb_headroom() is compared to an unsigned integer, and in others, the result is compared to a signed integer. Make the comparisons consistent and correct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6a5f44d7 |
|
23-Oct-2007 |
Timo Teras <timo.teras@iki.fi> |
[IPV4] ip_gre: sendto/recvfrom NBMA address When GRE tunnel is in NBMA mode, this patch allows an application to use a PF_PACKET socket to: - send a packet to specific NBMA address with sendto() - use recvfrom() to receive packet and check which NBMA address it came from This is required to implement properly NHRP over GRE tunnel. Signed-off-by: Timo Teras <timo.teras@iki.fi> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b04ddde |
|
09-Oct-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10d024c1 |
|
17-Sep-2007 |
Ralf Baechle <ralf@linux-mips.org> |
[NET]: Nuke SET_MODULE_OWNER macro. It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
881d966b |
|
17-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfbba49d |
|
09-Jul-2007 |
Patrick McHardy <kaber@trash.net> |
[NET]: Avoid copying writable clones in tunnel drivers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5056a1ef |
|
24-Apr-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV4] IP_GRE: Unify code path to get hash array index. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
b0e380b1 |
|
10-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: unions of just one member don't get anything done, kill them Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9c70220b |
|
25-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88c7664f |
|
13-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0660e03f |
|
25-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eddc9ec5 |
|
20-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4209fb60 |
|
10-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Use skb_reset_network_header where the return of __pskb_pull was being used It returns skb->data, so we can just use skb_reset_network_header after it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e2d1bca7 |
|
10-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Use skb_reset_network_header in skb_push cases skb_push updates and returns skb->data, so we can just call skb_reset_network_header after the call to skb_push. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1d2bbe1 |
|
10-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
459a98ed |
|
19-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd354f1a |
|
14-Feb-2007 |
Tim Schmielau <tim@physik3.uni-rostock.de> |
[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e905a9ed |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] IPV4: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22f8cde5 |
|
07-Feb-2007 |
Stephen Hemminger <shemminger@osdl.org> |
[NET]: unregister_netdevice as void There was no real useful information from the unregister_netdevice() return code, the only error occurred in a situation that was a driver bug. So change it to a void function. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f92a738 |
|
14-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET]: Annotate callers of the reset of checksum.h stuff. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3bc23e7 |
|
14-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET]: Annotate callers of csum_fold() in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5a0a1e3 |
|
08-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV4]: encapsulation annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
496c98df |
|
10-Oct-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET]: Use hton{l,s}() for non-initializers. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c55e2f49 |
|
19-Sep-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV4]: ipip and ip_gre encapsulation bugs Handling of ipip and ip_gre ICMP error relaying is b0rken; it accesses 8bit field + 3 reserved octets as host-endian 32bit, does comparison, subtraction and stuffs the result back. That breaks on big-endian. Fixed, made endian-clean. [ Note that this effected code is permanently commented out with and ifdef, so this error couldn't actually cause problems for anyone. -DaveM ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
84fa7933 |
|
29-Aug-2006 |
Patrick McHardy <kaber@trash.net> |
[NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d9c5a32 |
|
21-Jul-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Get rid of redundant IPCB->opts initialisation Now that we always zero the IPCB->opts in ip_rcv, it is no longer necessary to do so before calling netif_rx for tunneled packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
45af08be |
|
05-Apr-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[INET]: Use port unreachable instead of proto for tunnels This patch changes GRE and SIT to generate port unreachable instead of protocol unreachable errors when we can't find a matching tunnel for a packet. This removes the ambiguity as to whether the error is caused by no tunnel being found or by the lack of support for the given tunnel type. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
48d5cad8 |
|
15-Feb-2006 |
Patrick McHardy <kaber@trash.net> |
[XFRM]: Fix SNAT-related crash in xfrm4_output_finish When a packet matching an IPsec policy is SNATed so it doesn't match any policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish crash because of a NULL pointer dereference. This patch directs these packets to the original output path instead. Since the packets have already passed the POST_ROUTING hook, but need to start at the beginning of the original output path which includes another POST_ROUTING invocation, a flag is added to the IPCB to indicate that the packet was rerouted and doesn't need to pass the POST_ROUTING hook again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4fc268d2 |
|
11-Jan-2006 |
Randy Dunlap <rdunlap@infradead.org> |
[PATCH] capable/capability.h (net/) net: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
2941a486 |
|
08-Jan-2006 |
Patrick McHardy <kaber@trash.net> |
[NET]: Convert net/{ipv4,ipv6,sched} to netdev_priv Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e3850e9 |
|
07-Jan-2006 |
Patrick McHardy <kaber@trash.net> |
[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder ip_route_me_harder doesn't use the port numbers of the xfrm lookup and uses ip_route_input for non-local addresses which doesn't do a xfrm lookup, ip6_route_me_harder doesn't do a xfrm lookup at all. Use xfrm_decode_session and do the lookup manually, make sure both only do the lookup if the packet hasn't been transformed already. Makeing sure the lookup only happens once needs a new field in the IP6CB, which exceeds the size of skb->cb. The size of skb->cb is increased to 48b. Apparently the IPv6 mobile extensions need some more room anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cdfab8a |
|
07-Jan-2006 |
Patrick McHardy <kaber@trash.net> |
[IPV4]: reset IPCB flags when neccessary Reset IPSKB_XFRM_TUNNEL_SIZE flags in ipip and ip_gre hard_start_xmit function before the packet reenters IP. This is neccessary so the encapsulated packets are checked not to be oversized in xfrm4_output.c again. Reset all flags in sit when a packet changes its address family. Also remove some obsolete IPSKB flags. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46f25dff |
|
05-Jan-2006 |
Kris Katterjohn <kjak@users.sourceforge.net> |
[NET]: Change 1500 to ETH_DATA_LEN in some files These patches add the header linux/if_ether.h and change 1500 to ETH_DATA_LEN in some files. Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1542272a |
|
14-Dec-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[GRE]: Fix hardware checksum modification The skb_postpull_rcsum introduced a bug to the checksum modification. Although the length pulled is offset bytes, the origin of the pulling is the GRE header, not the IP header. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b30b1c6 |
|
29-Nov-2005 |
Adrian Bunk <bunk@stusta.de> |
[IPV4]: make two functions static This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb286bb2 |
|
10-Nov-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Detect hardware rx checksum faults correctly Here is the patch that introduces the generic skb_checksum_complete which also checks for hardware RX checksum faults. If that happens, it'll call netdev_rx_csum_fault which currently prints out a stack trace with the device name. In future it can turn off RX checksum. I've converted every spot under net/ that does RX checksum checks to use skb_checksum_complete or __skb_checksum_complete with the exceptions of: * Those places where checksums are done bit by bit. These will call netdev_rx_csum_fault directly. * The following have not been completely checked/converted: ipmr ip_vs netfilter dccp This patch is based on patches and suggestions from Stephen Hemminger and David S. Miller. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5ed6399 |
|
03-Oct-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db44575f |
|
30-Jul-2005 |
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> |
[NET]: fix oops after tunnel module unload Tunnel modules used to obtain module refcount each time when some tunnel was created, which meaned that tunnel could be unloaded only after all the tunnels are deleted. Since killing old MOD_*_USE_COUNT macros this protection has gone. It is possible to return it back as module_get/put, but it looks more natural and practically useful to force destruction of all the child tunnels on module unload. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|