#
1ac13efd |
|
08-Dec-2023 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around np->ucast_oif np->ucast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_UNICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d2f011a0 |
|
08-Dec-2023 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around np->mcast_oif np->mcast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_MULTICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b1dc628 |
|
04-Oct-2023 |
Florian Westphal <fw@strlen.de> |
xfrm: pass struct net to xfrm_decode_session wrappers Preparation patch, extra arg is not used. No functional changes intended. This is needed to replace the xfrm session decode functions with the flow dissector. skb_flow_dissect() cannot be used as-is, because it attempts to deduce the 'struct net' to use for bpf program fetch from skb->sk or skb->dev, but xfrm code path can see skbs that have neither sk or dev filled in. So either flow dissector needs to try harder, e.g. by also trying skb->dst->dev, or we have to pass the struct net explicitly. Passing the struct net doesn't look too bad to me, most places already have it available or can derive it from the output device. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/netdev/202309271628.27fd2187-oliver.sang@intel.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
#
1086ca7c |
|
12-Sep-2023 |
Eric Dumazet <edumazet@google.com> |
ipv6: lockless IPV6_DONTFRAG implementation Move np->dontfrag flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c899710f |
|
08-Aug-2023 |
Joel Granados <joel.granados@gmail.com> |
networking: Update to register_net_sysctl_sz Move from register_net_sysctl to register_net_sysctl_sz for all the networking related files. Do this while making sure to mirror the NULL assignments with a table_size of zero for the unprivileged users. We need to move to the new function in preparation for when we change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do so would erroneously allow ARRAY_SIZE() to be called on a pointer. We hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all the relevant net sysctl registering functions to register_net_sysctl_sz in subsequent commits. An additional size function was added to the following files in order to calculate the size of an array that is defined in another file: include/net/ipv6.h net/ipv6/icmp.c net/ipv6/route.c net/ipv6/sysctl_net_ipv6.c Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
5bc67a85 |
|
11-Jul-2023 |
Guillaume Nault <gnault@redhat.com> |
ipv6: Constify the sk parameter of several helper functions. icmpv6_flow_init(), ip6_datagram_flow_key_init() and ip6_mc_hdr() don't need to modify their sk argument. Make that explicit using const. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2aaa8a15 |
|
07-Jul-2023 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev(). With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that has the link-local address as src and dst IP and will be forwarded to an external IP in the IPv6 Ext Hdr. For example, the script below generates a packet whose src IP is the link-local address and dst is updated to 11::. # for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done # python3 >>> from socket import * >>> from scapy.all import * >>> >>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456" >>> >>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR) >>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1) >>> >>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW) >>> sk.sendto(bytes(pkt), (DST_ADDR, 0)) For such a packet, we call ip6_route_input() to look up a route for the next destination in these three functions depending on the header type. * ipv6_rthdr_rcv() * ipv6_rpl_srh_rcv() * ipv6_srh_rcv() If no route is found, ip6_null_entry is set to skb, and the following dst_input(skb) calls ip6_pkt_drop(). Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev as the input device is the loopback interface. Then, we have to check if skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref for ip6_null_entry. BUG: kernel NULL pointer dereference, address: 0000000000000000 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503) Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01 RSP: 0018:ffffc90000003c70 EFLAGS: 00000286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18 RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10 R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0 FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> ip6_pkt_drop (net/ipv6/route.c:4513) ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5)) ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483) __netif_receive_skb_one_core (net/core/dev.c:5455) process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895) __napi_poll (net/core/dev.c:6460) net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) do_softirq (kernel/softirq.c:454 kernel/softirq.c:441) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:381) __dev_queue_xmit (net/core/dev.c:4231) ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135) rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914) sock_sendmsg (net/socket.c:725 net/socket.c:748) __sys_sendto (net/socket.c:2134) __x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7f9dc751baea Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003 RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b </TASK> Modules linked in: CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503) Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01 RSP: 0018:ffffc90000003c70 EFLAGS: 00000286 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18 RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10 R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0 FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Reported-by: Wang Yufen <wangyufen@huawei.com> Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7ab75456 |
|
18-Apr-2023 |
Mahesh Bandewar <maheshb@google.com> |
ipv6: add icmpv6_error_anycast_as_unicast for ICMPv6 ICMPv6 error packets are not sent to the anycast destinations and this prevents things like traceroute from working. So create a setting similar to ECHO when dealing with Anycast sources (icmpv6_echo_ignore_anycast). Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20230419013238.2691167-1-maheshb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ac03694b |
|
16-Feb-2023 |
Eric Dumazet <edumazet@google.com> |
ipv6: icmp6: add drop reason support to icmpv6_echo_reply() Change icmpv6_echo_reply() to return a drop reason. For the moment, return NOT_SPECIFIED or SKB_CONSUMED. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
545dbcd1 |
|
10-Feb-2023 |
Eric Dumazet <edumazet@google.com> |
ipv6: icmp6: add drop reason support to ndisc_rcv() Creates three new drop reasons: SKB_DROP_REASON_IPV6_NDISC_FRAG: invalid frag (suppress_frag_ndisc). SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT: invalid hop limit. SKB_DROP_REASON_IPV6_NDISC_BAD_CODE: invalid NDISC icmp6 code. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
30c89bad |
|
10-Feb-2023 |
Eric Dumazet <edumazet@google.com> |
ipv6: icmp6: add drop reason support to icmpv6_notify() Accurately reports what happened in icmpv6_notify() when handling a packet. This makes use of the new IPV6_BAD_EXTHDR drop reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
bc617613 |
|
30-Jan-2023 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
ipv6: ICMPV6: Use swap() instead of open coding it Swap is a function interface that provides exchange function. To avoid code duplication, we can use swap function. ./net/ipv6/icmp.c:344:25-26: WARNING opportunity for swap(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3896 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230131063456.76302-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d0941130 |
|
24-Jan-2023 |
Jamie Bainbridge <jamie.bainbridge@gmail.com> |
icmp: Add counters for rate limits There are multiple ICMP rate limiting mechanisms: * Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec * v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask * v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask However, when ICMP output is limited, there is no way to tell which limit has been hit or even if the limits are responsible for the lack of ICMP output. Add counters for each of the cases above. As we are within local_bh_disable(), use the __INC stats variant. Example output: # nstat -sz "*RateLimit*" IcmpOutRateLimitGlobal 134 0.0 IcmpOutRateLimitHost 770 0.0 Icmp6OutRateLimitHost 84 0.0 Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com> Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
4a2f7083 |
|
11-Jul-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
icmp: Fix data-races around sysctl_icmp_echo_enable_probe. While reading sysctl_icmp_echo_enable_probe, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages") Fixes: 1fd07f33c3ea ("ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ad6d548 |
|
13-Apr-2022 |
Menglong Dong <imagedong@tencent.com> |
net: icmp: introduce function icmpv6_param_prob_reason() In order to add the skb drop reasons support to icmpv6_param_prob(), introduce the function icmpv6_param_prob_reason() and make icmpv6_param_prob() an inline call to it. This new function will be used in the following patches. Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Reviewed-by: Hao Peng <flyingpeng@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b384c95a |
|
07-Apr-2022 |
Menglong Dong <imagedong@tencent.com> |
net: icmp: add skb drop reasons to icmp protocol Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with kfree_skb_reason(). In order to get the reasons of the skb drops after icmp message handle, we change the return type of 'handler()' in 'struct icmp_control' from 'bool' to 'enum skb_drop_reason'. This may change its original intention, as 'false' means failure, but 'SKB_NOT_DROPPED_YET' means success now. Therefore, all 'handler' and the call of them need to be handled. Following 'handler' functions are involved: icmp_unreach() icmp_redirect() icmp_echo() icmp_timestamp() icmp_discard() And following new drop reasons are added: SKB_DROP_REASON_ICMP_CSUM SKB_DROP_REASON_INVALID_PROTO The reason 'INVALID_PROTO' is introduced for the case that the packet doesn't follow rfc 1122 and is dropped. This is not a common case, and I believe we can locate the problem from the data in the packet. For now, this 'INVALID_PROTO' is used for the icmp broadcasts with wrong types. Maybe there should be a document file for these reasons. For example, list all the case that causes the 'UNHANDLED_PROTO' and 'INVALID_PROTO' drop reason. Therefore, users can locate their problems according to the document. Reviewed-by: Hao Peng <flyingpeng@tencent.com> Reviewed-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6a17b961 |
|
24-Jan-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6: do not use per netns icmp sockets Back in linux-2.6.25 (commit 98c6d1b261e7 "[NETNS]: Make icmpv6_sk per namespace.", we added private per-cpu/per-netns ipv6 icmp sockets. This adds memory and cpu costs, which do not seem needed. Now typical servers have 256 or more cores, this adds considerable tax to netns users. icmp sockets are used from BH context, are not receiving packets, and do not store any persistent state but the 'struct net' pointer. icmpv6_xmit_lock() already makes sure to lock the chosen per-cpu socket. This patch has a considerable impact on the number of netns that the worker thread in cleanup_net() can dismantle per second, because ip6mr_sk_done() is no longer called, meaning we no longer acquire the rtnl mutex, competing with other threads adding new netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e4129440 |
|
03-Jan-2022 |
Andrew Lunn <andrew@lunn.ch> |
icmp: ICMPV6: Examine invoking packet for Segment Route Headers. RFC8754 says: ICMP error packets generated within the SR domain are sent to source nodes within the SR domain. The invoking packet in the ICMP error message may contain an SRH. Since the destination address of a packet with an SRH changes as each segment is processed, it may not be the destination used by the socket or application that generated the invoking packet. For the source of an invoking packet to process the ICMP error message, the ultimate destination address of the IPv6 header may be required. The following logic is used to determine the destination address for use by protocol-error handlers. * Walk all extension headers of the invoking IPv6 packet to the routing extension header preceding the upper-layer header. - If routing header is type 4 Segment Routing Header (SRH) o The SID at Segment List[0] may be used as the destination address of the invoking packet. Mangle the skb so the network header points to the invoking packet inside the ICMP packet. The seg6 helpers can then be used on the skb to find any segment routing headers. If found, mark this fact in the IPv6 control block of the skb, and store the offset into the packet of the SRH. Then restore the skb back to its old state. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1fd07f33 |
|
26-Jun-2021 |
Andreas Roeseler <andreas.a.roeseler@gmail.com> |
ipv6: ICMPV6: add response to ICMPV6 RFC 8335 PROBE messages This patch builds off of commit 2b246b2569cd2ac6ff700d0dce56b8bae29b1842 and adds functionality to respond to ICMPV6 PROBE requests. Add icmp_build_probe function to construct PROBE requests for both ICMPV4 and ICMPV6. Modify icmpv6_rcv to detect ICMPV6 PROBE messages and call the icmpv6_echo_reply handler. Modify icmpv6_echo_reply to build a PROBE response message based on the queried interface. This patch has been tested using a branch of the iputils git repo which can be found here: https://github.com/Juniper-Clinic-2020/iputils/tree/probe-request Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31433202 |
|
12-Apr-2021 |
Andreas Roeseler <andreas.a.roeseler@gmail.com> |
icmp: ICMPV6: pass RFC 8335 reply messages to ping_rcv The current icmp_rcv function drops all unknown ICMP types, including ICMP_EXT_ECHOREPLY (type 43). In order to parse Extended Echo Reply messages, we have to pass these packets to the ping_rcv function, which does not do any other filtering and passes the packet to the designated socket. Pass incoming RFC 8335 ICMP Extended Echo Reply packets to the ping_rcv handler instead of discarding the packet. Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a6175633 |
|
31-Mar-2021 |
Eric Dumazet <edumazet@google.com> |
ipv6: convert elligible sysctls to u8 Convert most sysctls that can fit in a byte. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f185de28 |
|
25-Mar-2021 |
Taehee Yoo <ap420073@gmail.com> |
mld: add new workqueues for process mld events When query/report packets are received, mld module processes them. But they are processed under BH context so it couldn't use sleepable functions. So, in order to switch context, the two workqueues are added which processes query and report event. In the struct inet6_dev, mc_{query | report}_queue are added so it is per-interface queue. And mc_{query | report}_work are workqueue structure. When the query or report event is received, skb is queued to proper queue and worker function is scheduled immediately. Workqueues and queues are protected by spinlock, which is mc_{query | report}_lock, and worker functions are protected by RTNL. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee576c47 |
|
23-Feb-2021 |
Jason A. Donenfeld <Jason@zx2c4.com> |
net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending The icmp{,v6}_send functions make all sorts of use of skb->cb, casting it with IPCB or IP6CB, assuming the skb to have come directly from the inet layer. But when the packet comes from the ndo layer, especially when forwarded, there's no telling what might be in skb->cb at that point. As a result, the icmp sending code risks reading bogus memory contents, which can result in nasty stack overflows such as this one reported by a user: panic+0x108/0x2ea __stack_chk_fail+0x14/0x20 __icmp_send+0x5bd/0x5c0 icmp_ndo_send+0x148/0x160 In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read from it. The optlen parameter there is of particular note, as it can induce writes beyond bounds. There are quite a few ways that can happen in __ip_options_echo. For example: // sptr/skb are attacker-controlled skb bytes sptr = skb_network_header(skb); // dptr/dopt points to stack memory allocated by __icmp_send dptr = dopt->__data; // sopt is the corrupt skb->cb in question if (sopt->rr) { optlen = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data // this now writes potentially attacker-controlled data, over // flowing the stack: memcpy(dptr, sptr+sopt->rr, optlen); } In the icmpv6_send case, the story is similar, but not as dire, as only IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is worse than the iif case, but it is passed to ipv6_find_tlv, which does a bit of bounds checking on the value. This is easy to simulate by doing a `memset(skb->cb, 0x41, sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by good fortune and the rarity of icmp sending from that context that we've avoided reports like this until now. For example, in KASAN: BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 Write of size 38 at addr ffff888006f1f80e by task ping/89 CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 Call Trace: dump_stack+0x9a/0xcc print_address_description.constprop.0+0x1a/0x160 __kasan_report.cold+0x20/0x38 kasan_report+0x32/0x40 check_memory_region+0x145/0x1a0 memcpy+0x39/0x60 __ip_options_echo+0xa0e/0x12b0 __icmp_send+0x744/0x1700 Actually, out of the 4 drivers that do this, only gtp zeroed the cb for the v4 case, while the rest did not. So this commit actually removes the gtp-specific zeroing, while putting the code where it belongs in the shared infrastructure of icmp{,v6}_ndo_send. This commit fixes the issue by passing an empty IPCB or IP6CB along to the functions that actually do the work. For the icmp_send, this was already trivial, thanks to __icmp_send providing the plumbing function. For icmpv6_send, this required a tiny bit of refactoring to make it behave like the v4 case, after which it was straight forward. Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") Reported-by: SinYu <liuxyon@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3df98d79 |
|
27-Sep-2020 |
Paul Moore <paul@paul-moore.com> |
lsm,selinux: pass flowi_common instead of flowi to the LSM hooks As pointed out by Herbert in a recent related patch, the LSM hooks do not have the necessary address family information to use the flowi struct safely. As none of the LSMs currently use any of the protocol specific flowi information, replace the flowi pointers with pointers to the address family independent flowi_common struct. Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
2efdaaaf |
|
27-Oct-2020 |
Hangbin Liu <liuhangbin@gmail.com> |
IPv6: reply ICMP error if the first fragment don't include all headers Based on RFC 8200, Section 4.5 Fragment Header: - If the first fragment does not include all headers through an Upper-Layer header, then that fragment should be discarded and an ICMP Parameter Problem, Code 3, message should be sent to the source of the fragment, with the Pointer field set to zero. Checking each packet header in IPv6 fast path will have performance impact, so I put the checking in ipv6_frag_rcv(). As the packet may be any kind of L4 protocol, I only checked some common protocols' header length and handle others by (offset + 1) > skb->len. Also use !(frag_off & htons(IP6_OFFSET)) to catch atomic fragments (fragmented packet with only one fragment). When send ICMP error message, if the 1st truncated fragment is ICMP message, icmp6_send() will break as is_ineligible() return true. So I added a check in is_ineligible() to let fragment packet with nexthdr ICMP but no ICMP header return false. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
272928d1 |
|
12-Oct-2020 |
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> |
ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) As per RFC4443, the destination address field for ICMPv6 error messages is copied from the source address field of the invoking packet. In configurations with Virtual Routing and Forwarding tables, looking up which routing table to use for sending ICMPv6 error messages is currently done by using the destination net_device. If the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source address of the invoking packet in the destination interface's routing table will fail if the destination interface's routing table contains no route to the invoking packet's source address. One observable effect of this issue is that traceroute6 does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Use the source device routing table when sending ICMPv6 error messages. [ In the context of ipv4, it has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate whether ipv6 has similar issues, but is outside of the scope of this investigation. ] [ Testing shows that similar issues exist with ipv6 unreachable / fragmentation needed messages. However, investigation of this additional failure mode is beyond this investigation's scope. ] Link: https://tools.ietf.org/html/rfc4443 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8d5930df |
|
10-Jul-2020 |
Al Viro <viro@zeniv.linux.org.uk> |
skb_copy_and_csum_bits(): don't bother with the last argument it's always 0 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
0da7536f |
|
01-Jul-2020 |
Willem de Bruijn <willemb@google.com> |
ip: Fix SO_MARK in RST, ACK and ICMP packets When no full socket is available, skbs are sent over a per-netns control socket. Its sk_mark is temporarily adjusted to match that of the real (request or timewait) socket or to reflect an incoming skb, so that the outgoing skb inherits this in __ip_make_skb. Introduction of the socket cookie mark field broke this. Now the skb is set through the cookie and cork: <caller> # init sockc.mark from sk_mark or cmsg ip_append_data ip_setup_cork # convert sockc.mark to cork mark ip_push_pending_frames ip_finish_skb __ip_make_skb # set skb->mark to cork mark But I missed these special control sockets. Update all callers of __ip(6)_make_skb that were originally missed. For IPv6, the same two icmp(v6) paths are affected. The third case is not, as commit 92e55f412cff ("tcp: don't annotate mark on control socket from tcp_v6_send_response()") replaced the ctl_sk->sk_mark with passing the mark field directly as a function argument. That commit predates the commit that introduced the bug. Fixes: c6af0c227a22 ("ip: support SO_MARK cmsg") Signed-off-by: Willem de Bruijn <willemb@google.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc7a21b6 |
|
19-Jun-2020 |
Eric Dumazet <edumazet@google.com> |
ipv6: icmp6: avoid indirect call for icmpv6_send() If IPv6 is builtin, we do not need an expensive indirect call to reach icmp6_send(). v2: put inline keyword before the type to avoid sparse warnings. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b93cfb9c |
|
03-Apr-2020 |
Tim Stallard <code@timstallard.me.uk> |
net: icmp6: do not select saddr from iif when route has prefsrc set Since commit fac6fce9bdb5 ("net: icmp6: provide input address for traceroute6") ICMPv6 errors have source addresses from the ingress interface. However, this overrides when source address selection is influenced by setting preferred source addresses on routes. This can result in ICMP errors being lost to upstream BCP38 filters when the wrong source addresses are used, breaking path MTU discovery and traceroute. This patch sets the modified source address selection to only take place when the route used has no prefsrc set. It can be tested with: ip link add v1 type veth peer name v2 ip netns add test ip netns exec test ip link set lo up ip link set v2 netns test ip link set v1 up ip netns exec test ip link set v2 up ip addr add 2001:db8::1/64 dev v1 nodad ip addr add 2001:db8::3 dev v1 nodad ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad ip netns exec test ip route add unreachable 2001:db8:1::1 ip netns exec test ip addr add 2001:db8:100::1 dev lo ip netns exec test ip route add 2001:db8::1 dev v2 src 2001:db8:100::1 ip route add 2001:db8:1000::1 via 2001:db8::2 traceroute6 -s 2001:db8::1 2001:db8:1000::1 traceroute6 -s 2001:db8::3 2001:db8:1000::1 ip netns delete test Output before: $ traceroute6 -s 2001:db8::1 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8::2 (2001:db8::2) 0.843 ms !N 0.396 ms !N 0.257 ms !N $ traceroute6 -s 2001:db8::3 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8::2 (2001:db8::2) 0.772 ms !N 0.257 ms !N 0.357 ms !N After: $ traceroute6 -s 2001:db8::1 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8:100::1 (2001:db8:100::1) 8.885 ms !N 0.310 ms !N 0.174 ms !N $ traceroute6 -s 2001:db8::3 2001:db8:1000::1 traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets 1 2001:db8::2 (2001:db8::2) 1.403 ms !N 0.205 ms !N 0.313 ms !N Fixes: fac6fce9bdb5 ("net: icmp6: provide input address for traceroute6") Signed-off-by: Tim Stallard <code@timstallard.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8eceea8 |
|
12-Mar-2020 |
Joe Perches <joe@perches.com> |
inet: Use fallthrough; Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ And by hand: net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block that causes gcc to emit a warning if converted in-place. So move the new fallthrough; inside the containing #ifdef/#endif too. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fac6fce9 |
|
30-Oct-2019 |
Francesco Ruggeri <fruggeri@arista.com> |
net: icmp6: provide input address for traceroute6 traceroute6 output can be confusing, in that it shows the address that a router would use to reach the sender, rather than the address the packet used to reach the router. Consider this case: ------------------------ N2 | | ------ ------ N3 ---- | R1 | | R2 |------|H2| ------ ------ ---- | | ------------------------ N1 | ---- |H1| ---- where H1's default route is through R1, and R1's default route is through R2 over N2. traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2. The script below can be used to reproduce this scenario. traceroute6 output without this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.036 ms 0.008 ms 0.006 ms 2 2000:101::2 (2000:101::2) 0.011 ms 0.008 ms 0.007 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.010 ms 0.009 ms traceroute6 output with this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.056 ms 0.019 ms 0.006 ms 2 2000:102::2 (2000:102::2) 0.013 ms 0.008 ms 0.008 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.009 ms 0.009 ms #!/bin/bash # # ------------------------ N2 # | | # ------ ------ N3 ---- # | R1 | | R2 |------|H2| # ------ ------ ---- # | | # ------------------------ N1 # | # ---- # |H1| # ---- # # N1: 2000:101::/64 # N2: 2000:102::/64 # N3: 2000:103::/64 # # R1's host part of address: 1 # R2's host part of address: 2 # H1's host part of address: 3 # H2's host part of address: 4 # # For example: # the IPv6 address of R1's interface on N2 is 2000:102::1/64 # # Nets are implemented by macvlan interfaces (bridge mode) over # dummy interfaces. # # Create net namespaces ip netns add host1 ip netns add host2 ip netns add rtr1 ip netns add rtr2 # Create nets ip link add net1 type dummy; ip link set net1 up ip link add net2 type dummy; ip link set net2 up ip link add net3 type dummy; ip link set net3 up # Add interfaces to net1, move them to their nemaspaces ip link add link net1 dev host1net1 type macvlan mode bridge ip link set host1net1 netns host1 ip link add link net1 dev rtr1net1 type macvlan mode bridge ip link set rtr1net1 netns rtr1 ip link add link net1 dev rtr2net1 type macvlan mode bridge ip link set rtr2net1 netns rtr2 # Add interfaces to net2, move them to their nemaspaces ip link add link net2 dev rtr1net2 type macvlan mode bridge ip link set rtr1net2 netns rtr1 ip link add link net2 dev rtr2net2 type macvlan mode bridge ip link set rtr2net2 netns rtr2 # Add interfaces to net3, move them to their nemaspaces ip link add link net3 dev rtr2net3 type macvlan mode bridge ip link set rtr2net3 netns rtr2 ip link add link net3 dev host2net3 type macvlan mode bridge ip link set host2net3 netns host2 # Configure interfaces and routes in host1 ip netns exec host1 ip link set lo up ip netns exec host1 ip link set host1net1 up ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1 ip netns exec host1 ip -6 route add default via 2000:101::1 # Configure interfaces and routes in rtr1 ip netns exec rtr1 ip link set lo up ip netns exec rtr1 ip link set rtr1net1 up ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1 ip netns exec rtr1 ip link set rtr1net2 up ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2 ip netns exec rtr1 ip -6 route add default via 2000:102::2 ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in rtr2 ip netns exec rtr2 ip link set lo up ip netns exec rtr2 ip link set rtr2net1 up ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1 ip netns exec rtr2 ip link set rtr2net2 up ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2 ip netns exec rtr2 ip link set rtr2net3 up ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3 ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in host2 ip netns exec host2 ip link set lo up ip netns exec host2 ip link set host2net3 up ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3 ip netns exec host2 ip -6 route add default via 2000:103::2 # Ping host2 from host1 ip netns exec host1 ping6 -c5 2000:103::4 # Traceroute host2 from host1 ip netns exec host1 traceroute6 2000:103::4 # Delete nets ip link del net3 ip link del net2 ip link del net1 # Delete namespaces ip netns del rtr2 ip netns del rtr1 ip netns del host2 ip netns del host1 Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Original-patch-by: Honggang Xu <hxu@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a346abe0 |
|
01-Jul-2019 |
Eric Dumazet <edumazet@google.com> |
ipv6: icmp: allow flowlabel reflection in echo replies Extend flowlabel_reflect bitmask to allow conditional reflection of incoming flowlabels in echo replies. Note this has precedence against auto flowlabels. Add flowlabel_reflect enum to replace hard coded values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e1ae5c2e |
|
10-Jun-2019 |
Stephen Suryaputra <ssuryaextr@gmail.com> |
vrf: Increment Icmp6InMsgs on the original netdev Get the ingress interface and increment ICMP counters based on that instead of skb->dev when the the dev is a VRF device. This is a follow up on the following message: https://www.spinics.net/lists/netdev/msg560268.html v2: Avoid changing skb->dev since it has unintended effect for local delivery (David Ahern). Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2789c14d |
|
31-May-2019 |
Eric Dumazet <edumazet@google.com> |
ipv6: icmp: use this_cpu_read() in icmpv6_sk() In general, this_cpu_read(*X) is faster than *this_cpu_ptr(X) Also remove the inline attibute, totally useless. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
0bc19985 |
|
17-Apr-2019 |
Stephen Suryaputra <ssuryaextr@gmail.com> |
ipv6: Add rate limit mask for ICMPv6 messages To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP message types use larger numeric values, a simple bitmask doesn't fit. I use large bitmap. The input and output are the in form of list of ranges. Set the default to rate limit all error messages but Packet Too Big. For Packet Too Big, use ratemask instead of hard-coded. There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow() aren't called. This patch only adds them to icmpv6_echo_reply(). Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says that it is also acceptable to rate limit informational messages. Thus, I removed the current hard-coded behavior of icmpv6_mask_allow() that doesn't rate limit informational messages. v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL isn't defined, expand the description in ip-sysctl.txt and remove unnecessary conditional before kfree(). v3: Inline the bitmap instead of dynamically allocated. Still is a pointer to it is needed because of the way proc_do_large_bitmap work. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b03a5ca |
|
20-Mar-2019 |
Stephen Suryaputra <ssuryaextr@gmail.com> |
ipv6: Add icmp_echo_ignore_anycast for ICMPv6 In addition to icmp_echo_ignore_multicast, there is a need to also prevent responding to pings to anycast addresses for security. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03f1eccc |
|
18-Mar-2019 |
Stephen Suryaputra <ssuryaextr@gmail.com> |
ipv6: Add icmp_echo_ignore_multicast support for ICMPv6 IPv4 has icmp_echo_ignore_broadcast to prevent responding to broadcast pings. IPv6 needs a similar mechanism. v1->v2: - Remove NET_IPV6_ICMP_ECHO_IGNORE_MULTICAST. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75efc250 |
|
23-Feb-2019 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
ipv6: icmp: use percpu allocation Use percpu allocation for the ipv6.icmp_sk. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3232a1ef |
|
23-Feb-2019 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
ipv6: icmp: use icmpv6_sk_exit() Simply use icmpv6_sk_exit() when inet_ctl_sock_create() fail in icmpv6_sk_init(). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d933670 |
|
04-Jan-2019 |
Eric Dumazet <edumazet@google.com> |
ipv6: make icmp6_send() robust against null skb->dev syzbot was able to crash one host with the following stack trace : kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 8625 Comm: syz-executor4 Not tainted 4.20.0+ #8 RIP: 0010:dev_net include/linux/netdevice.h:2169 [inline] RIP: 0010:icmp6_send+0x116/0x2d30 net/ipv6/icmp.c:426 icmpv6_send smack_socket_sock_rcv_skb security_sock_rcv_skb sk_filter_trim_cap __sk_receive_skb dccp_v6_do_rcv release_sock This is because a RX packet found socket owned by user and was stored into socket backlog. Before leaving RCU protected section, skb->dev was cleared in __sk_receive_skb(). When socket backlog was finally handled at release_sock() time, skb was fed to smack_socket_sock_rcv_skb() then icmp6_send() We could fix the bug in smack_socket_sock_rcv_skb(), or simply make icmp6_send() more robust against such possibility. In the future we might provide to icmp6_send() the net pointer instead of infering it. Fixes: d66a8acbda92 ("Smack: Inform peer that IPv6 traffic has been blocked") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Piotr Sawicki <p.sawicki2@partner.samsung.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32bbd879 |
|
07-Nov-2018 |
Stefano Brivio <sbrivio@redhat.com> |
net: Convert protocol error handlers from void to int We'll need this to handle ICMP errors for tunnels without a sending socket (i.e. FoU and GUE). There, we might have to look up different types of IP tunnels, registered as network protocols, before we get a match, so we want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both inet_protos and inet6_protos. These error codes will be used in the next patch. For consistency, return sensible error codes in protocol error handlers whenever handlers can't handle errors because, even if valid, they don't match a protocol or any of its states. This has no effect on existing error handling paths. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6f86b0f |
|
10-Aug-2018 |
Virgile Jarry <virgile@acceis.fr> |
ipv6: Add icmp_echo_ignore_all support for ICMPv6 Preventing the kernel from responding to ICMP Echo Requests messages can be useful in several ways. The sysctl parameter 'icmp_echo_ignore_all' can be used to prevent the kernel from responding to IPv4 ICMP echo requests. For IPv6 pings, such a sysctl kernel parameter did not exist. Add the ability to prevent the kernel from responding to IPv6 ICMP echo requests through the use of the following sysctl parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all. Update the documentation to reflect this change. Signed-off-by: Virgile Jarry <virgile@acceis.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f379ef5 |
|
02-Aug-2018 |
Georg Kohmann <geokohma@cisco.com> |
ipv6: icmp: Updating pmtu for link local route When a ICMPV6_PKT_TOOBIG is received from a link local address the pmtu will be updated on a route with an arbitrary interface index. Subsequent packets sent back to the same link local address may therefore end up not considering the updated pmtu. Current behavior breaks TAHI v6LC4.1.4 Reduce PMTU On-link. Referring to RFC 1981: Section 3: "Note that Path MTU Discovery must be performed even in cases where a node "thinks" a destination is attached to the same link as itself. In a situation such as when a neighboring router acts as proxy [ND] for some destination, the destination can to appear to be directly connected but is in fact more than one hop away." Using the interface index from the incoming ICMPV6_PKT_TOOBIG when updating the pmtu. Signed-off-by: Georg Kohmann <geokohma@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
24b711ed |
|
19-Jul-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Fix linklocal to global address with VRF Example setup: host: ip -6 addr add dev eth1 2001:db8:104::4 where eth1 is enslaved to a VRF switch: ip -6 ro add 2001:db8:104::4/128 dev br1 where br1 only has an LLA ping6 2001:db8:104::4 ssh 2001:db8:104::4 (NOTE: UDP works fine if the PKTINFO has the address set to the global address and ifindex is set to the index of eth1 with a destination an LLA). For ICMP, icmp6_iif needs to be updated to check if skb->dev is an L3 master. If it is then return the ifindex from rt6i_idev similar to what is done for loopback. For TCP, restore the original tcp_v6_iif definition which is needed in most places and add a new tcp_v6_iif_l3_slave that considers the l3_slave variability. This latter check is only needed for socket lookups. Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5fdaa88d |
|
06-Jul-2018 |
Willem de Bruijn <willemb@google.com> |
ipv6: fold sockcm_cookie into ipcm6_cookie ipcm_cookie includes sockcm_cookie. Do the same for ipcm6_cookie. This reduces the number of arguments that need to be passed around, applies ipcm6_init to all cookie fields at once and reduces code differentiation between ipv4 and ipv6. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b515430a |
|
06-Jul-2018 |
Willem de Bruijn <willemb@google.com> |
ipv6: ipcm6_cookie initializer Initialize the cookie in one location to reduce code duplication and avoid bugs from inconsistent initialization, such as that fixed in commit 9887cba19978 ("ip: limit use of gso_size to udp"). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6396bb22 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
2f635cee |
|
27-Mar-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Drop pernet_operations::async Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4bac172 |
|
02-Mar-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Add support for path selection using hash of 5-tuple Some operators prefer IPv6 path selection to use a standard 5-tuple hash rather than just an L3 hash with the flow the label. To that end add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice"). The default is still L3 which covers source and destination addresses along with flow label and IPv6 protocol. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b75cc8f9 |
|
02-Mar-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Pass skb to route lookup IPv6 does path selection for multipath routes deep in the lookup functions. The next patch adds L4 hash option and needs the skb for the forward path. To get the skb to the relevant FIB lookup functions it needs to go through the fib rules layer, so add a lookup_data argument to the fib_lookup_arg struct. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e5d6fed |
|
28-Feb-2018 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
ipv6: route: dissect flow in input path if fib rules need it Dissect flow in fwd path if fib rules require it. Controlled by a flag to avoid penatly for the common case. Flag is set when fib rules with sport, dport and proto match that require flow dissect are installed. Also passes the dissected hash keys to the multipath hash function when applicable to avoid dissecting the flow again. icmp packets will continue to use inner header for hash calculations. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a2e9332 |
|
19-Feb-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Convert icmpv6_sk_ops, ndisc_net_ops and igmp6_net_ops These pernet_operations create and destroy net::ipv6.icmp_sk socket, used to send ICMP or error reply. Nobody can dereference the socket to handle a packet before net is initialized, as there is no routing; nobody can do that in parallel with exit, as all of devices are moved to init_net or destroyed and there are no packets it-flight. So, it's possible to mark these pernet_operations as async. The same for ndisc_net_ops and for igmp6_net_ops. The last one also creates and destroys /proc entries. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
275757e6 |
|
16-Oct-2017 |
Gustavo A. R. Silva <garsilva@embeddedor.com> |
ipv6: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in some cases I placed the "fall through" comment on its own line, which is what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e64b1ed |
|
06-Oct-2017 |
Joe Perches <joe@perches.com> |
net/ipv6: Convert icmpv6_push_pending_frames to void commit cc71b7b07119 ("net/ipv6: remove unused err variable on icmpv6_push_pending_frames") exposed icmpv6_push_pending_frames return value not being used. Remove now unnecessary int err declarations and uses. Miscellanea: o Remove unnecessary goto and out: labels o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc71b7b0 |
|
05-Oct-2017 |
Tim Hansen <devtimhansen@gmail.com> |
net/ipv6: remove unused err variable on icmpv6_push_pending_frames int err is unused by icmpv6_push_pending_frames(), this patch returns removes the variable and returns the function with 0. git bisect shows this variable has been around since linux has been in git in commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2. This was found by running make coccicheck M=net/ipv6/ on linus' tree on commit 77ede3a014a32746002f7889211f0cecf4803163 (current HEAD as of this patch). Signed-off-by: Tim Hansen <devtimhansen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b70d792 |
|
28-Aug-2017 |
David Ahern <dsahern@gmail.com> |
ipv6: Use rt6i_idev index for echo replies to a local address Tariq repored local pings to linklocal address is failing: $ ifconfig ens8 ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255 inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20<link> ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet) RX packets 12 bytes 1164 (1.1 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 30 bytes 2484 (2.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 $ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8 PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23aebdac |
|
23-Aug-2017 |
Jakub Sitnicki <jkbs@redhat.com> |
ipv6: Compute multipath hash for ICMP errors from offending packet When forwarding or sending out an ICMPv6 error, look at the embedded packet that triggered the error and compute a flow hash over its headers. This let's us route the ICMP error together with the flow it belongs to when multipath (ECMP) routing is in use, which in turn makes Path MTU Discovery work in ECMP load-balanced or anycast setups (RFC 7690). Granted, end-hosts behind the ECMP router (aka servers) need to reflect the IPv6 Flow Label for PMTUD to work. The code is organized to be in parallel with ipv4 stack: ip_multipath_l3_keys -> ip6_multipath_l3_keys fib_multipath_hash -> rt6_multipath_hash Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4832c30d |
|
17-Aug-2017 |
David Ahern <dsahern@gmail.com> |
net: ipv6: put host and anycast routes on device with address One nagging difference between ipv4 and ipv6 is host routes for ipv6 addresses are installed using the loopback device or VRF / L3 Master device. e.g., 2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium Using the loopback device is convenient -- necessary for local tx, but has some nasty side effects, most notably setting the 'lo' device down causes all host routes for all local IPv6 address to be removed from the FIB and completely breaks IPv6 networking across all interfaces. This patch puts FIB entries for IPv6 routes against the device. This simplifies the routes in the FIB, for example by making dst->dev and rt6i_idev->dev the same (a future patch can look at removing the device reference taken for rt6i_idev for FIB entries). When copies are made on FIB lookups, the cloned route has dst->dev set to loopback (or the L3 master device). This is needed for the local Tx of packets to local addresses. With fib entries allocated against the real network device, the addrconf code that reinserts host routes on admin up of 'lo' is no longer needed. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
849a44de |
|
14-Jun-2017 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: don't global ICMP rate limit packets originating from loopback Florian Weimer seems to have a glibc test-case which requires that loopback interfaces does not get ICMP ratelimited. This was broken by commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited"). An ICMP response will usually be routed back-out the same incoming interface. Thus, take advantage of this and skip global ICMP ratelimit when the incoming device is loopback. In the unlikely event that the outgoing it not loopback, due to strange routing policy rules, ICMP rate limiting still works via peer ratelimiting via icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812 (section 4.3.2.8 "Rate Limiting"). This seems to fix the reproducer given by Florian. While still avoiding to perform expensive and unneeded outgoing route lookup for rate limited packets (in the non-loopback case). Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: "H.J. Lu" <hjl.tools@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7ba91ecb |
|
09-Jan-2017 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: for rate-limited ICMP replies save one atomic operation It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock, by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls, as pointed out by Eric Dumazet, but the BH disabled state must be correct. The icmp_global_allow() call states it must be called with BH disabled. This protection was given by the calls icmp_xmit_lock and icmpv6_xmit_lock. Thus, split out local_bh_disable/enable from these functions and maintain it explicitly at callers. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0303efe |
|
09-Jan-2017 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: reduce cycles spend on ICMP replies that gets rate limited This patch split the global and per (inet)peer ICMP-reply limiter code, and moves the global limit check to earlier in the packet processing path. Thus, avoid spending cycles on ICMP replies that gets limited/suppressed anyhow. The global ICMP rate limiter icmp_global_allow() is a good solution, it just happens too late in the process. The kernel goes through the full route lookup (return path) for the ICMP message, before taking the rate limit decision of not sending the ICMP reply. Details: The kernels global rate limiter for ICMP messages got added in commit 4cdf507d5452 ("icmp: add a global rate limitation"). It is a token bucket limiter with a global lock. It brilliantly avoids locking congestion by only updating when 20ms (HZ/50) were elapsed. It can then avoids taking lock when credit is exhausted (when under pressure) and time constraint for refill is not yet meet. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
79dc7e3f |
|
27-Nov-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: handle no dst on skb in icmp6_send Andrey reported the following while fuzzing the kernel with syzkaller: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff8800666d4200 task.stack: ffff880067348000 RIP: 0010:[<ffffffff833617ec>] [<ffffffff833617ec>] icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451 RSP: 0018:ffff88006734f2c0 EFLAGS: 00010206 RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018 RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003 R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000 R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0 FS: 00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0 Stack: ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460 ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046 ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000 Call Trace: [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557 [< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88 [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157 [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663 [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191 ... icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both cases the dst->dev should be preferred for determining the L3 domain if the dst has been set on the skb. Fallback to the skb->dev if it has not. This covers the case reported here where icmp6_send is invoked on Rx before the route lookup. Fixes: 5d41ce29e ("net: icmp6_send should use dst dev to determine L3 domain") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d41ce29 |
|
03-Nov-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: icmp6_send should use dst dev to determine L3 domain icmp6_send is called in response to some event. The skb may not have the device set (skb->dev is NULL), but it is expected to have a dst set. Update icmp6_send to use the dst on the skb to determine L3 domain. Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e2d118a1 |
|
03-Nov-2016 |
Lorenzo Colitti <lorenzo@google.com> |
net: inet: Support UID-based routing in IP protocols. - Use the UID in routing lookups made by protocol connect() and sendmsg() functions. - Make sure that routing lookups triggered by incoming packets (e.g., Path MTU discovery) take the UID of the socket into account. - For packets not associated with a userspace socket, (e.g., ping replies) use UID 0 inside the user namespace corresponding to the network namespace the socket belongs to. This allows all namespaces to apply routing and iptables rules to kernel-originated traffic in that namespaces by matching UID 0. This is better than using the UID of the kernel socket that is sending the traffic, because the UID of kernel sockets created at namespace creation time (e.g., the per-processor ICMP and TCP sockets) is the UID of the user that created the socket, which might not be mapped in the namespace. Tested: compiles allnoconfig, allyesconfig, allmodconfig Tested: https://android-review.googlesource.com/253302 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
20e1954f |
|
18-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
ipv6: RFC 4884 partial support for SIT/GRE tunnels When receiving an ICMPv4 message containing extensions as defined in RFC 4884, and translating it to ICMPv6 at SIT or GRE tunnel, we need some extra manipulation in order to properly forward the extensions. This patch only takes care of Time Exceeded messages as they are the ones that typically carry information from various routers in a fabric during a traceroute session. It also avoids complex skb logic if the data_len is not a multiple of 8. RFC states : The "original datagram" field MUST contain at least 128 octets. If the original datagram did not contain 128 octets, the "original datagram" field MUST be zero padded to 128 octets. In practice routers use 128 bytes of original datagram, not more. Initial translation was added in commit ca15a078bd90 ("sit: generate icmpv6 error when receiving icmpv4 error") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Oussama Ghorbel <ghorbel@pivasoftware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2d7a3b27 |
|
18-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
ipv6: translate ICMP_TIME_EXCEEDED to ICMPV6_TIME_EXCEED For better traceroute/mtr support for SIT and GRE tunnels, we translate IPV4 ICMP ICMP_TIME_EXCEEDED to ICMPV6_TIME_EXCEED We also have to translate the IPv4 source IP address of ICMP message to IPv6 v4mapped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5fbba8ac |
|
18-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
ip6: move ipip6_err_gen_icmpv6_unreach() We want to use this helper from GRE as well, so this is the time to move it in net/ipv6/icmp.c Also add a @nhs parameter, since SIT and GRE have different values for the header(s) to skip. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1cadc1a |
|
18-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
ipv6: icmp: add a force_saddr param to icmp6_send() SIT or GRE tunnels might want to translate an IPV4 address into a v4mapped one when translating ICMP to ICMPv6. This patch adds the parameter to icmp6_send() but does not change icmpv6_send() signature. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ff74384 |
|
13-Jun-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: vrf: Handle ipv6 multicast and link-local addresses IPv6 multicast and link-local addresses require special handling by the VRF driver: 1. Rather than using the VRF device index and full FIB lookups, packets to/from these addresses should use direct FIB lookups based on the VRF device table. 2. fail sends/receives on a VRF device to/from a multicast address (e.g, make ping6 ff02::1%<vrf> fail) 3. move the setting of the flow oif to the first dst lookup and revert the change in icmpv6_echo_reply made in ca254490c8dfd ("net: Add VRF support to IPv6 stack"). Linklocal/mcast addresses require use of the skb->dev. With this change connections into and out of a VRF enslaved device work for multicast and link-local addresses work (icmp, tcp, and udp) e.g., 1. packets into VM with VRF config: ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1 ping6 -c3 ff02::1%br1 ssh -6 fe80::e0:f9ff:fe1c:b974%br1 2. packets going out a VRF enslaved device: ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1 ping6 -c3 ff02::1%eth1 ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1 Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dcb94b88 |
|
11-Jun-2016 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: fix endianness error in icmpv6_err IPv6 ping socket error handler doesn't correctly convert the new 32 bit mtu to host endianness before using. Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 6d0bfe22611602f ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38b7097b |
|
11-Jun-2016 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: use TOS marks from sockets for routing decision In IPv6 the ToS values are part of the flowlabel in flowi6 and get extracted during fib rule lookup, but we forgot to correctly initialize the flowlabel before the routing lookup. Reported-by: <liam.mcbirnie@boeing.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d2f7b2d |
|
04-May-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: tcp reset, icmp need to consider L3 domain Responses for packets to unused ports are getting lost with L3 domains. IPv4 has ip_send_unicast_reply for sending TCP responses which accounts for L3 domains; update the IPv6 counterpart tcp_v6_send_response. For icmp the L3 master check needs to be moved up in icmp6_send to properly respond to UDP packets to a port with no listener. Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26879da5 |
|
02-May-2016 |
Wei Wang <weiwan@google.com> |
ipv6: add new struct ipcm6_cookie In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local variables like hlimits, tclass, opt and dontfrag and pass them to corresponding functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames. This is not a good practice and makes it hard to add new parameters. This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in ipv4 and include the above mentioned variables. And we only pass the pointer to this structure to corresponding functions. This makes it easier to add new parameters in the future and makes the function cleaner. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3832ed2 |
|
27-Apr-2016 |
Eric Dumazet <edumazet@google.com> |
ipv6: kill ICMP6MSGIN_INC_STATS_BH() IPv6 ICMP stats are atomics anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a16292a0 |
|
27-Apr-2016 |
Eric Dumazet <edumazet@google.com> |
net: rename ICMP6_INC_STATS_BH() Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c14ac945 |
|
02-Apr-2016 |
Soheil Hassas Yeganeh <soheil@google.com> |
sock: enable timestamping using control messages Currently, SOL_TIMESTAMPING can only be enabled using setsockopt. This is very costly when users want to sample writes to gather tx timestamps. Add support for enabling SO_TIMESTAMPING via control messages by using tsflags added in `struct sockcm_cookie` (added in the previous patches in this series) to set the tx_flags of the last skb created in a sendmsg. With this patch, the timestamp recording bits in tx_flags of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg. Please note that this is only effective for overriding the recording timestamps flags. Users should enable timestamp reporting (e.g., SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using socket options and then should ask for SOF_TIMESTAMPING_TX_* using control messages per sendmsg to sample timestamps for each write. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6bd4f355 |
|
02-Dec-2015 |
Eric Dumazet <edumazet@google.com> |
ipv6: kill sk_dst_lock While testing the np->opt RCU conversion, I found that UDP/IPv6 was using a mixture of xchg() and sk_dst_lock to protect concurrent changes to sk->sk_dst_cache, leading to possible corruptions and crashes. ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest way to fix the mess is to remove sk_dst_lock completely, as we did for IPv4. __ip6_dst_store() and ip6_dst_store() share same implementation. sk_setup_caps() being called with socket lock being held or not, we have to use sk_dst_set() instead of __sk_dst_set() Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);" in ip6_dst_store() before the sk_setup_caps(sk, dst) call. This is because ip6_dst_store() can be called from process context, without any lock held. As soon as the dst is installed in sk->sk_dst_cache, dst can be freed from another cpu doing a concurrent ip6_dst_store() Doing the dst dereference before doing the install is needed to make sure no use after free would trigger. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b3418fb |
|
24-Oct-2015 |
Bjørn Mork <bjorn@mork.no> |
ipv6: icmp: include addresses in debug messages Messages like "icmp6_send: no reply to icmp error" are close to useless. Adding source and destination addresses to provide some more clue. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca254490 |
|
12-Oct-2015 |
David Ahern <dsa@cumulusnetworks.com> |
net: Add VRF support to IPv6 stack As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded table ids with possibly device specific ones and manipulating the oif in the flowi6. The flow flags are used to skip oif compare in nexthop lookups if the device is enslaved to a VRF via the L3 master device. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
343d60aa |
|
30-Jul-2015 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup for use cases where sk is not available (like mpls). sk appears to be needed to get the namespace 'net' and is optional otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup to take net argument. sk remains optional. All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified to pass net. I have modified them to use already available 'net' in the scope of the call. I can change them to sock_net(sk) to avoid any unintended change in behaviour if sock namespace is different. They dont seem to be from code inspection. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2647a9b0 |
|
22-May-2015 |
Martin KaFai Lau <kafai@fb.com> |
ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst. Also, rt6i_gateway is always set to the nexthop while the nexthop could be a gateway or the rt6i_dst.addr. After removing the rt6i_dst and rt6i_src dependency in the last patch, we also need to stop the caller from depending on rt6i_gateway and RTF_ANYCAST. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd0273d7 |
|
22-May-2015 |
Martin KaFai Lau <kafai@fb.com> |
ipv6: Remove external dependency on rt6i_dst and rt6i_src This patch removes the assumptions that the returned rt is always a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the destination and source address. The dst and src can be recovered from the calling site. We may consider to rename (rt6i_dst, rt6i_src) to (rt6i_key_dst, rt6i_key_src) later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63159f29 |
|
29-Mar-2015 |
Ian Morris <ipm@chirality.org.uk> |
ipv6: coding style: comparison for equality with NULL The ipv6 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a6b4b39 |
|
14-Jan-2015 |
zhuyj <zyjzyj2000@gmail.com> |
ipv6:icmp:remove unnecessary brackets There are too many brackets. Maybe only one bracket is enough. Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5d08d71 |
|
23-Nov-2014 |
Ian Morris <ipm@chirality.org.uk> |
ipv6: coding style improvements (remove assignment in if statements) This change has no functional impact and simply addresses some coding style issues detected by checkpatch. Specifically this change adjusts "if" statements which also include the assignment of a variable. No changes to the resultant object files result as determined by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3e32170 |
|
17-Nov-2014 |
Rick Jones <rick.jones2@hp.com> |
icmp: Remove some spurious dropped packet profile hits from the ICMP path If icmp_rcv() has successfully processed the incoming ICMP datagram, we should use consume_skb() rather than kfree_skb() because a hit on the likes of perf -e skb:kfree_skb is not called-for. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba7a46f1 |
|
11-Nov-2014 |
Joe Perches <joe@perches.com> |
net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited Use the more common dynamic_debug capable net_dbg_ratelimited and remove the LIMIT_NETDEBUG macro. All messages are still ratelimited. Some KERN_<LEVEL> uses are changed to KERN_DEBUG. This may have some negative impact on messages that were emitted at KERN_INFO that are not not enabled at all unless DEBUG is defined or dynamic_debug is enabled. Even so, these messages are now _not_ emitted by default. This also eliminates the use of the net_msg_warn sysctl "/proc/sys/net/core/warnings". For backward compatibility, the sysctl is not removed, but it has no function. The extern declaration of net_msg_warn is removed from sock.h and made static in net/core/sysctl_net_core.c Miscellanea: o Update the sysctl documentation o Remove the embedded uses of pr_fmt o Coalesce format fragments o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
105970f6 |
|
20-Oct-2014 |
Kenjiro Nakayama <nakayamakenjiro@gmail.com> |
net: Remove trailing whitespace in tcp.h icmp.c syncookies.c Remove trailing whitespace in tcp.h icmp.c syncookies.c Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea85a0a2 |
|
07-Oct-2014 |
David S. Miller <davem@davemloft.net> |
ipv6: Do not warn for informational ICMP messages, regardless of type. There is no reason to emit a log message for these. Based upon a suggestion from Hannes Frederic Sowa. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
|
#
4cdf507d |
|
19-Sep-2014 |
Eric Dumazet <edumazet@google.com> |
icmp: add a global rate limitation Current ICMP rate limiting uses inetpeer cache, which is an RBL tree protected by a lock, meaning that hosts can be stuck hard if all cpus want to check ICMP limits. When say a DNS or NTP server process is restarted, inetpeer tree grows quick and machine comes to its knees. iptables can not help because the bottleneck happens before ICMP messages are even cooked and sent. This patch adds a new global limitation, using a token bucket filter, controlled by two new sysctl : icmp_msgs_per_sec - INTEGER Limit maximal number of ICMP packets sent per second from this host. Only messages whose type matches icmp_ratemask are controlled by this limit. Default: 1000 icmp_msgs_burst - INTEGER icmp_msgs_per_sec controls number of ICMP packets sent per second, while icmp_msgs_burst controls the burst size of these packets. Default: 50 Note that if we really want to send millions of ICMP messages per second, we might extend idea and infra added in commit 04ca6973f7c1a ("ip: make IP identifiers less predictable") : add a token bucket in the ip_idents hash and no longer rely on inetpeer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67ba4152 |
|
24-Aug-2014 |
Ian Morris <ipm@chirality.org.uk> |
ipv6: White-space cleansing : Line Layouts This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed (char *)foo etc. * Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7304fe46 |
|
31-Jul-2014 |
Duan Jiong <duanj.fnst@cn.fujitsu.com> |
net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS When dealing with ICMPv[46] Error Message, function icmp_socket_deliver() and icmpv6_notify() do some valid checks on packet's length, but then some protocols check packet's length redaudantly. So remove those duplicated statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in function icmp_socket_deliver() and icmpv6_notify() respectively. In addition, add missed counter in udp6/udplite6 when socket is NULL. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e110861f |
|
13-May-2014 |
Lorenzo Colitti <lorenzo@google.com> |
net: add a sysctl to reflect the fwmark on replies Kernel-originated IP packets that have no user socket associated with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.) are emitted with a mark of zero. Add a sysctl to make them have the same mark as the packet they are replying to. This allows an administrator that wishes to do so to use mark-based routing, firewalling, etc. for these replies by marking the original packets inbound. Tested using user-mode linux: - ICMP/ICMPv6 echo replies and errors. - TCP RST packets (IPv4 and IPv6). Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39471ac8 |
|
07-May-2014 |
Tom Herbert <therbert@google.com> |
icmp6: Call skb_checksum_validate Use skb_checksum_validate to verify checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c98631c |
|
28-Apr-2014 |
Lorenzo Colitti <lorenzo@google.com> |
net: ipv6: Introduce ip6_sk_dst_hoplimit. This replaces 6 identical code snippets with a call to a new static inline function. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
43a43b60 |
|
31-Mar-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: some ipv6 statistic counters failed to disable bh After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue") some counters are now updated in process context and thus need to disable bh before doing so, otherwise deadlocks can happen on 32-bit archs. Fabio Estevam noticed this while while mounting a NFS volume on an ARM board. As a compensation for missing this I looked after the other *_STATS_BH and found three other calls which need updating: 1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling) 2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ... (only in case of icmp protocol with raw sockets in error handling) 3) ping6_v6_sendmsg (error handling) Fixes: c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue") Reported-by: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d94c1f92 |
|
07-Feb-2014 |
FX Le Bail <fx.lebail@yahoo.com> |
ipv6: icmp6_send: fix Oops when pinging a not set up IPv6 peer on a sit tunnel The patch 446fab59333dea91e54688f033dd8d788d0486fb ("ipv6: enable anycast addresses as source addresses in ICMPv6 error messages") causes an Oops when pinging a not set up IPv6 peer on a sit tunnel. The problem is that ipv6_anycast_destination() uses unconditionally skb_dst(skb), which is NULL in this case. The solution is to use instead the ipv6_chk_acast_addr_src() function. Here are the steps to reproduce it: modprobe sit ip link add sit1 type sit remote 10.16.0.121 local 10.16.0.249 ip l s sit1 up ip -6 a a dev sit1 2001:1234::123 remote 2001:1234::121 ping6 2001:1234::121 Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
446fab59 |
|
19-Jan-2014 |
FX Le Bail <fx.lebail@yahoo.com> |
ipv6: enable anycast addresses as source addresses in ICMPv6 error messages - Uses ipv6_anycast_destination() in icmp6_send(). Suggested-by: Bill Fink <billfink@mindspring.com> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec35b61e |
|
13-Jan-2014 |
FX Le Bail <fx.lebail@yahoo.com> |
IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6 This change move anycast_src_echo_reply sysctl with other ipv6 sysctls. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
825edac4 |
|
11-Jan-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: copy traffic class from ping request to reply Suggested-by: Simon Schneider <simon-schneider@gmx.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
509aba3b |
|
07-Jan-2014 |
FX Le Bail <fx.lebail@yahoo.com> |
IPv6: add the option to use anycast addresses as source addresses in echo reply This change allows to follow a recommandation of RFC4942. - Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses as source addresses for ICMPv6 echo reply. This sysctl is false by default to preserve existing behavior. - Add inline check ipv6_anycast_destination(). - Use them in icmpv6_echo_reply(). Reference: RFC4942 - IPv6 Transition/Coexistence Security Considerations (http://tools.ietf.org/html/rfc4942#section-2.1.6) 2.1.6. Anycast Traffic Identification and Security [...] To avoid exposing knowledge about the internal structure of the network, it is recommended that anycast servers now take advantage of the ability to return responses with the anycast address as the source address if possible. Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e8243534 |
|
29-Dec-2013 |
stephen hemminger <stephen@networkplumber.org> |
ipv6: namespace cleanups Running 'make namespacecheck' shows: net/ipv6/route.o ipv6_route_table_template rt6_bind_peer net/ipv6/icmp.o icmpv6_route_lookup ipv6_icmp_table_template This addresses some of those warnings by: * make icmpv6_route_lookup static * move inline's out of ip6_route.h since only used into route.c * move rt6_bind_peer into route.c Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b55b76b2 |
|
04-Sep-2013 |
Duan Jiong <duanj.fnst@cn.fujitsu.com> |
ipv6:introduce function to find route for redirect RFC 4861 says that the IP source address of the Redirect is the same as the current first-hop router for the specified ICMP Destination Address, so the gateway should be taken into consideration when we find the route for redirect. There was once a check in commit a6279458c534d01ccc39498aba61c93083ee0372 ("NDISC: Search over all possible rules on receipt of redirect.") and the check went away in commit b94f1c0904da9b8bf031667afc48080ba7c3e8c9 ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect()"). The bug is only "exploitable" on layer-2 because the source address of the redirect is checked to be a valid link-local address but it makes spoofing a lot easier in the same L2 domain nonetheless. Thanks very much for Hannes's help. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61e76b17 |
|
30-Aug-2013 |
Jiri Bohac <jbohac@suse.cz> |
ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTO RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination unreachable) messages: 5 - Source address failed ingress/egress policy 6 - Reject route to destination Now they are treated as protocol error and icmpv6_err_convert() converts them to EPROTO. RFC 4443 says: "Codes 5 and 6 are more informative subsets of code 1." Treat codes 5 and 6 as code 1 (EACCES) Btw, connect() returning -EPROTO confuses firefox, so that fallback to other/IPv4 addresses does not work: https://bugzilla.mozilla.org/show_bug.cgi?id=910773 Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe2c6338 |
|
12-Jun-2013 |
Joe Perches <joe@perches.com> |
net: Convert uses of typedef ctl_table to struct ctl_table Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29a3cad5 |
|
28-May-2013 |
Simon Horman <horms@verge.net.au> |
ipv6: Correct comparisons and calculations using skb->tail and skb-transport_header This corrects an regression introduced by "net: Use 16bits for *_headers fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In that case skb->tail will be a pointer whereas skb->transport_header will be an offset from head. This is corrected by using wrappers that ensure that comparisons and calculations are always made using pointers. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d0bfe22 |
|
22-May-2013 |
Lorenzo Colitti <lorenzo@google.com> |
net: ipv6: Add IPv6 support to the ping socket. This adds the ability to send ICMPv6 echo requests without a raw socket. The equivalent ability for ICMPv4 was added in 2011. Instead of having separate code paths for IPv4 and IPv6, make most of the code in net/ipv4/ping.c dual-stack and only add a few IPv6-specific bits (like the protocol definition) to a new net/ipv6/ping.c. Hopefully this will reduce divergence and/or duplication of bugs in the future. Caveats: - Setting options via ancillary data (e.g., using IPV6_PKTINFO to specify the outgoing interface) is not yet supported. - There are no separate security settings for IPv4 and IPv6; everything is controlled by /proc/net/ipv4/ping_group_range. - The proc interface does not yet display IPv6 ping sockets properly. Tested with a patched copy of ping6 and using raw socket calls. Compiles and works with all of CONFIG_IPV6={n,m,y}. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6a5dc9e5 |
|
29-Apr-2013 |
Eric Dumazet <edumazet@google.com> |
net: Add MIB counters for checksum errors Add MIB counters for checksum errors in IP layer, and TCP/UDP/ICMP layers, to help diagnose problems. $ nstat -a | grep Csum IcmpInCsumErrors 72 0.0 TcpInCsumErrors 382 0.0 UdpInCsumErrors 463221 0.0 Icmp6InCsumErrors 75 0.0 Udp6InCsumErrors 173442 0.0 IpExtInCsumErrors 10884 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f5624cf |
|
25-Apr-2013 |
Pravin B Shelar <pshelar@nicira.com> |
ipv6: Kill ipv6 dependency of icmpv6_send(). Following patch adds icmp-registration module for ipv6. It allows ipv6 protocol to register icmp_sender which is used for sending ipv6 icmp msgs. This extra layer allows us to kill ipv6 dependency for sending icmp packets. This patch also fixes ip_tunnel compilation problem when ip_tunnel is statically compiled in kernel but ipv6 is module Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
842df073 |
|
07-Mar-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: use newly introduced __ipv6_addr_needs_scope_id and ipv6_iface_scope_id This patch requires multicast interface-scoped addresses to supply a sin6_scope_id. Because the sin6_scope_id is now also correctly used in case of interface-scoped multicast traffic this enables one to use interface scoped addresses over interfaces which are not targeted by the default multicast route (the route has to be put there manually, though). getsockname() and getpeername() now return the correct sin6_scope_id in case of interface-local mc addresses. v2: a) rebased ontop of patch 1/4 (now uses ipv6_addr_props) v3: a) reverted changes for ipv6_addr_props v4: a) unchanged Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>dave Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f809da2 |
|
16-Jan-2013 |
Steffen Klassert <steffen.klassert@secunet.com> |
ipv6: Add an error handler for icmp6 pmtu and redirect events are now handled in the protocols error handler, so add an error handler for icmp6 to do this. It is needed in the case when we have no socket context. Based on a patch by Duan Jiong. Reported-by: Duan Jiong <djduanjiong@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c027aab4 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Enable some sysctls that are safe for the userns root - Enable the per device ipv4 sysctls: net/ipv4/conf/<if>/forwarding net/ipv4/conf/<if>/mc_forwarding net/ipv4/conf/<if>/accept_redirects net/ipv4/conf/<if>/secure_redirects net/ipv4/conf/<if>/shared_media net/ipv4/conf/<if>/rp_filter net/ipv4/conf/<if>/send_redirects net/ipv4/conf/<if>/accept_source_route net/ipv4/conf/<if>/accept_local net/ipv4/conf/<if>/src_valid_mark net/ipv4/conf/<if>/proxy_arp net/ipv4/conf/<if>/medium_id net/ipv4/conf/<if>/bootp_relay net/ipv4/conf/<if>/log_martians net/ipv4/conf/<if>/tag net/ipv4/conf/<if>/arp_filter net/ipv4/conf/<if>/arp_announce net/ipv4/conf/<if>/arp_ignore net/ipv4/conf/<if>/arp_accept net/ipv4/conf/<if>/arp_notify net/ipv4/conf/<if>/proxy_arp_pvlan net/ipv4/conf/<if>/disable_xfrm net/ipv4/conf/<if>/disable_policy net/ipv4/conf/<if>/force_igmp_version net/ipv4/conf/<if>/promote_secondaries net/ipv4/conf/<if>/route_localnet - Enable the global ipv4 sysctl: net/ipv4/ip_forward - Enable the per device ipv6 sysctls: net/ipv6/conf/<if>/forwarding net/ipv6/conf/<if>/hop_limit net/ipv6/conf/<if>/mtu net/ipv6/conf/<if>/accept_ra net/ipv6/conf/<if>/accept_redirects net/ipv6/conf/<if>/autoconf net/ipv6/conf/<if>/dad_transmits net/ipv6/conf/<if>/router_solicitations net/ipv6/conf/<if>/router_solicitation_interval net/ipv6/conf/<if>/router_solicitation_delay net/ipv6/conf/<if>/force_mld_version net/ipv6/conf/<if>/use_tempaddr net/ipv6/conf/<if>/temp_valid_lft net/ipv6/conf/<if>/temp_prefered_lft net/ipv6/conf/<if>/regen_max_retry net/ipv6/conf/<if>/max_desync_factor net/ipv6/conf/<if>/max_addresses net/ipv6/conf/<if>/accept_ra_defrtr net/ipv6/conf/<if>/accept_ra_pinfo net/ipv6/conf/<if>/accept_ra_rtr_pref net/ipv6/conf/<if>/router_probe_interval net/ipv6/conf/<if>/accept_ra_rt_info_max_plen net/ipv6/conf/<if>/proxy_ndp net/ipv6/conf/<if>/accept_source_route net/ipv6/conf/<if>/optimistic_dad net/ipv6/conf/<if>/mc_forwarding net/ipv6/conf/<if>/disable_ipv6 net/ipv6/conf/<if>/accept_dad net/ipv6/conf/<if>/force_tllao - Enable the global ipv6 sysctls: net/ipv6/bindv6only net/ipv6/icmp/ratelimit Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
464dc801 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Don't export sysctls to unprivileged users In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07a93626 |
|
29-Oct-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: use IS_ENABLED() #if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b94f1c09 |
|
12-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect(). And delete rt6_redirect(), since it is no longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d861aa4 |
|
10-Jul-2012 |
David S. Miller <davem@davemloft.net> |
inet: Minimize use of cached route inetpeer. Only use it in the absolutely required cases: 1) COW'ing metrics 2) ipv4 PMTU 3) ipv4 redirects Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9242b6b |
|
19-Jun-2012 |
David S. Miller <davem@davemloft.net> |
inet: Sanitize inet{,6} protocol demux. Don't pretend that inet_protos[] and inet6_protos[] are hashes, thay are just a straight arrays. Remove all unnecessary hash masking. Document MAX_INET_PROTOS. Use RAW_HTABLE_SIZE when appropriate. Reported-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81aded24 |
|
15-Jun-2012 |
David S. Miller <davem@davemloft.net> |
ipv6: Handle PMTU in ICMP error handlers. One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts to handle the error pass the 32-bit info cookie in network byte order whereas ipv4 passes it around in host byte order. Like the ipv4 side, we have two helper functions. One for when we have a socket context and one for when we do not. ip6ip6 tunnels are not handled here, because they handle PMTU events by essentially relaying another ICMP packet-too-big message back to the original sender. This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all kinds of situations that simply cannot happen when we do the PMTU update directly using a fully resolved route. In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very likely be removed or changed into a BUG_ON() check. We should never have a prefixed ipv6 route when we get there. Another piece of strange history here is that TCP and DCCP, unlike in ipv4, never invoke the update_pmtu() method from their ICMP error handlers. This is incredibly astonishing since this is the context where we have the most accurate context in which to make a PMTU update, namely we have a fully connected socket and associated cached socket route. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbfe95a4 |
|
09-Jun-2012 |
David S. Miller <davem@davemloft.net> |
inet: Create and use rt{,6}_get_peer_create(). There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake. There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a50feda5 |
|
18-May-2012 |
Eric Dumazet <edumazet@google.com> |
ipv6: bool/const conversions phase2 Mostly bool conversions, some inline removals and const additions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3213831 |
|
15-May-2012 |
Joe Perches <joe@perches.com> |
net: ipv6: Standardize prefixes for message logging Add #define pr_fmt(fmt) as appropriate. Add "IPv6: " to appropriate files. Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG). Standardize on "%s: " not "%s(): " when emitting __func__. Use "%s: ", __func__ instead of embedding function name. Coalesce formats, align arguments. ADDRCONF output is now prefixed with "IPv6: " Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a2d91a09 |
|
01-Apr-2012 |
Eldad Zack <eldad@fogrefinery.com> |
net/ipv6/icmp.c: Checkpatch cleanups icmp.c:501: ERROR: "(foo*)" should be "(foo *)" icmp.c:582: ERROR: "(foo*)" should be "(foo *)" icmp.c:954: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ffc93f2 |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
|
#
c4062dfc |
|
08-Feb-2012 |
Erich E. Hoover <ehoover@mines.edu> |
ipv6: Implement IPV6_UNICAST_IF socket option. The IPV6_UNICAST_IF feature is the IPv6 compliment to IP_UNICAST_IF. Signed-off-by: Erich E. Hoover <ehoover@mines.edu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75f2811c |
|
30-Nov-2011 |
Jesse Gross <jesse@nicira.com> |
ipv6: Add fragment reporting to ipv6_skip_exthdr(). While parsing through IPv6 extension headers, fragment headers are skipped making them invisible to the caller. This reports the fragment offset of the last header in order to make it possible to determine whether the packet is fragmented and, if so whether it is a first or last fragment. Signed-off-by: Jesse Gross <jesse@nicira.com>
|
#
4e3fd7a0 |
|
20-Nov-2011 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87fb4b7b |
|
13-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: more accurate skb truesize skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored. Considering that skb_shared_info is larger than sk_buff, its time to take it into account for better memory accounting. This patch introduces SKB_TRUESIZE(X) macro to centralize various assumptions into a single place. At skb alloc phase, we put skb_shared_info struct at the exact end of skb head, to allow a better use of memory (lowering number of reallocations), since kmalloc() gives us power-of-two memory blocks. Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before. Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per socket or global memory limits that were previously not reached. But its a necessary step for a more accurate memory accounting. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <ak@linux.intel.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfdf7647 |
|
27-Jul-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: some RCU conversions ICMP and ND are not fast path, but still we can avoid changing idev refcount, using RCU. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b71d1d42 |
|
21-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1958b856 |
|
12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Put fl6_* macros to struct flowi6 and use them again. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c9483b2 |
|
12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Convert to use flowi6 where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6281dcc9 |
|
11-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Make flowi ports AF dependent. Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d28f42c |
|
11-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Put flowi_* prefix on AF independent members of struct flowi I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
452edd59 |
|
02-Mar-2011 |
David S. Miller <davem@davemloft.net> |
xfrm: Return dst directly from xfrm_lookup() Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b42835db |
|
01-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Make icmp route lookup code a bit clearer. The route lookup code in icmpv6_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups. Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92d86829 |
|
04-Feb-2011 |
David S. Miller <davem@davemloft.net> |
inetpeer: Move ICMP rate limiting state into inet_peer entries. Like metrics, the ICMP rate limiting bits are cached state about a destination. So move it into the inet_peer entries. If an inet_peer cannot be bound (the reason is memory allocation failure or similar), the policy is to allow. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00d9d6a1 |
|
07-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: fix ICMP6_MIB_OUTERRORS In commit 1f8438a85366 (icmp: Account for ICMP out errors), I did a typo on IPV6 side, using ICMP6_MIB_OUTMSGS instead of ICMP6_MIB_OUTERRORS Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13b52cd4 |
|
23-Apr-2010 |
Brian Haley <brian.haley@hp.com> |
IPv6: Add dontfrag argument to relevant functions Add dontfrag argument to relevant functions for IPV6_DONTFRAG support, as well as allowing the value to be passed-in via ancillary cmsg data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f8438a8 |
|
03-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
icmp: Account for ICMP out errors When ip_append() fails because of socket limit or memory shortage, increment ICMP_MIB_OUTERRORS counter, so that "netstat -s" can report these errors. LANG=C netstat -s | grep "ICMP messages failed" 0 ICMP messages failed For IPV6, implement ICMP6_MIB_OUTERRORS counter as well. # grep Icmp6OutErrors /proc/net/dev_snmp6/* /proc/net/dev_snmp6/eth0:Icmp6OutErrors 0 /proc/net/dev_snmp6/lo:Icmp6OutErrors 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
3ffe533c |
|
18-Feb-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
ipv6: drop unused "dev" arg of icmpv6_send() Dunno, what was the idea, it wasn't used for a long time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc417d99 |
|
18-Feb-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
ipv6: remove stale MIB definitions ICMP6 MIB statistics was per-netns for quite a time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c8c1e72 |
|
16-Jan-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8572d8f |
|
05-Nov-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
sysctl net: Remove unused binary sysctl code Now that sys_sysctl is a compatiblity wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be revmoed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and it's callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
41135cc8 |
|
13-Sep-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: constify struct inet6_protocol Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e651f03a |
|
09-Aug-2009 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
inet6: Conversion from u8 to int This replaces assignments of the type "int on LHS" = "u8 on RHS" with simpler code. The LHS can express all of the unsigned right hand side values, hence the assigned value can not be negative. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5fdd6ba |
|
23-Jun-2009 |
Brian Haley <brian.haley@hp.com> |
ipv6: Use correct data types for ICMPv6 type and code Change all the code that deals directly with ICMPv6 type and code values to use u8 instead of a signed int as that's the actual data type. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c06a478 |
|
27-Jan-2009 |
Jiri Pirko <jpirko@redhat.com> |
net: fix xfrm reverse flow lookup for icmp6 This patch fixes the xfrm reverse flow lookup for icmp6 so that icmp6 packets don't get lost over ipsec tunnels. Similar patch is in RHEL5 kernel for a quite long time and I do not see why it isn't in mainline. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52479b62 |
|
25-Nov-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns xfrm: lookup in netns Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns to flow_cache_lookup() and resolver callback. Take it from socket or netdevice. Stub DECnet to init_net. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07f0757a |
|
19-Nov-2008 |
Joe Perches <joe@perches.com> |
include/net net/ - csum_partial - remove unnecessary casts The first argument to csum_partial is const void * casts to char/u8 * are not necessary Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d9f239a |
|
03-Nov-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: '&' redux I want to compile out proc_* and sysctl_* handlers totally and stub them to NULL depending on config options, however usage of & will prevent this, since taking adress of NULL pointer will break compilation. So, drop & in front of every ->proc_handler and every ->strategy handler, it was never needed in fact. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b095d989 |
|
29-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace %p6 with %pI6 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c6ce78a |
|
28-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace uses of NIP6_FMT with %p6 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
def8b4fa |
|
28-Oct-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: reduce structures when XFRM=n ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55d43808 |
|
08-Oct-2008 |
Denis V. Lunev <den@openvz.org> |
ipv6: added net argument to ICMP6MSGIN_INC_STATS_BH Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e41b5368 |
|
08-Oct-2008 |
Denis V. Lunev <den@openvz.org> |
ipv6: added net argument to ICMP6_INC_STATS_BH Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3bd653c8 |
|
08-Oct-2008 |
Denis V. Lunev <den@openvz.org> |
netns: add net parameter to IP6_INC_STATS Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fdc0bde9 |
|
23-Aug-2008 |
Denis V. Lunev <den@openvz.org> |
icmp: icmp_sk() should not use smp_processor_id() in preemptible code Pass namespace into icmp_xmit_lock, obtain socket inside and return it as a result for caller. Thanks Alexey Dobryan for this report: Steps to reproduce: CONFIG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y tracepath <something> BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205 caller is icmp_sk+0x15/0x30 Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1 Call Trace: [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0 [<ffffffff80409405>] icmp_sk+0x15/0x30 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60 [<ffffffff803e6650>] ? dst_output+0x0/0x10 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60 [<ffffffff803e92e3>] ip_output+0xa3/0xf0 [<ffffffff803e68d0>] ip_local_out+0x20/0x30 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80 [<ffffffff803ba50a>] sys_sendto+0xea/0x120 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0 [<ffffffff8024e0c6>] ? up_read+0x26/0x30 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b icmp6_sk() is similar. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6dbf4bca |
|
01-Jul-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
icmp: fix units for ratelimit Convert the sysctl values for icmp ratelimit to use milliseconds instead of jiffies which is based on kernel configured HZ. Internal kernel jiffies are not a proper unit for any userspace API. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b040829 |
|
10-Jun-2008 |
Adrian Bunk <bunk@kernel.org> |
net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ed8516f |
|
03-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
[IPV6]: Simplify IPv6 control sockets creation. Do this by replacing sock_create_kern with inet_ctl_sock_create. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
af268182 |
|
03-Apr-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
[ICMP]: Ensure that ICMP relookup maintains status quo The ICMP relookup path is only meant to modify behaviour when appropriate IPsec policies are in place and marked as requiring relookups. It is certainly not meant to modify behaviour when IPsec policies don't exist at all. However, due to an oversight on the error paths existing behaviour may in fact change should one of the relookup steps fail. This patch corrects this by redirecting all errors on relookup failures to the previous code path. That is, if the initial xfrm_lookup let the packet pass, we will stand by that decision should the relookup fail due to an error. This should be safe from a security point-of-view because compliant systems must install a default deny policy so the packet would'nt have passed in that case. Many thanks to Julian Anastasov for pointing out this error. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f32c5f2c |
|
02-Apr-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Fix ICMP relookup error path dst leak When we encounter an error while looking up the dst the second time we need to drop the first dst. This patch is pretty much the same as the one for IPv4. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b1e0a65 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
c346dca1 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
6b75d090 |
|
10-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Optimize hop-limit determination. Last part of hop-limit determination is always: hoplimit = dst_metric(dst, RTAX_HOPLIMIT); if (hoplimit < 0) hoplimit = ipv6_get_hoplimit(dst->dev). Let's consolidate it as ip6_dst_hoplimit(dst). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
9a43b709 |
|
05-Mar-2008 |
Benjamin Thery <benjamin.thery@bull.net> |
[NETNS][IPV6] icmp6 - make icmpv6_socket per namespace This patch make the changes necessary to support network namespaces in ICMPv6. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4591db4f |
|
05-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] route6 - add netns parameter to ip6_route_output Add an netns parameter to ip6_route_output. That will allow to access to the right routing table for outgoing traffic. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95e41e93 |
|
06-Dec-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Make ndisc_flow_init() common for later use. For later use, this patch is renaming ndisc_flow_init() to icmpv6_flow_init() and putting it in common place. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
5ee09105 |
|
27-Feb-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] SYSCTL: complete initialization for sysctl table in subsystem code. Move initialization bits for subsystem sysctl tables to appropriate functions. - route - icmp Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
8ed7edce |
|
03-Mar-2008 |
Alexey Dobriyan <adobriyan@sw.ru> |
ipv6: fix inet6_init/icmpv6_cleanup sections mismatch Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
98c6d1b2 |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Make icmpv6_sk per namespace. All preparations are done. Now just add a hook to perform an initialization on namespace startup and replace icmpv6_sk macro with proper inline call. Actual namespace the packet belongs too will be passed later along with the one for the routing. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c8cafd6 |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: icmp(v6)_sk should not pin a namespace. So, change icmp(v6)_sk creation/disposal to the scheme used in the netlink for rtnl, i.e. create a socket in the context of the init_net and assign the namespace without getting a referrence later. Also use sk_release_kernel instead of sock_release to properly destroy such sockets. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
79c91159 |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[ICMP]: Allocate data for __icmp(v6)_sk dynamically. Own __icmp(v6)_sk should be present in each namespace. So, it should be allocated dynamically. Though, alloc_percpu does not fit the case as it implies additional dereferrence for no bonus. Allocate data for pointers just like __percpu_alloc_mask does and place pointers to struct sock into this array. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
405666db |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock. We have to get socket lock inside icmp(v6)_xmit_lock/unlock. The socket is get from global variable now. When this code became namespaces, one should pass a namespace and get socket from it. Though, above is useless. Socket is available in the caller, just pass it inside. This saves a bit of code now and saves more later. add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168) function old new delta icmp_rcv 718 719 +1 icmpv6_rcv 2343 2303 -40 icmp_send 1566 1518 -48 icmp_reply 549 468 -81 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7e729c4 |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[ICMP]: Store sock rather than socket for ICMP flow control. Basically, there is no difference, what to store: socket or sock. Though, sock looks better as there will be 1 less dereferrence on the fast path. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b0f976f |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[INET]: Remove struct net_proto_family* from _init calls. struct net_proto_family* is not used in icmp[v6]_init, ndisc_init, igmp_init and tcp_v4_init. Remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cf22943 |
|
05-Feb-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
[ICMP]: Restore pskb_pull calls in receive function Somewhere along the development of my ICMP relookup patch the header length check went AWOL on the non-IPsec path. This patch restores the check. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bfeade08 |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: inet6_addr - check ipv6 address per namespace When a new address is added, we must check if the new address does not already exists. This patch makes this check to be aware of a network namespace, so the check will look if the address already exists for the specified network namespace. While the addresses are browsed, the addresses which do not belong to the namespace are discarded. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
41a76906 |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: Make icmpv6_time sysctl per namespace. This patch moves the icmpv6_time sysctl to the network namespace structure. Because the ipv6 protocol is not yet per namespace, the variable is accessed relatively to the initial network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
760f2d01 |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: Make multiple instance of sysctl tables. Each network namespace wants its own set of sysctl value, eg. we should not be able from a namespace to set a sysctl value for another namespace , especially for the initial network namespace. This patch duplicates the sysctl table when we register a new network namespace for ipv6. The duplicated table are postfixed with the "template" word to notify the developper the table is cloned. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aebcf82c |
|
12-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPSEC]: Do not let packets pass when ICMP flag is off This fixes a logical error in ICMP policy checks which lets packets through if the state ICMP flag is off. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7233b9f3 |
|
12-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPSEC]: Fix reversed ICMP6 policy check The policy check I added for ICMP on IPv6 is reversed. This patch fixes that. It also adds an skb->sp check so that unprotected packets that fail the policy check do not crash the machine. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b7817f3 |
|
12-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPSEC]: Add ICMP host relookup support RFC 4301 requires us to relookup ICMP traffic that does not match any policies using the reverse of its payload. This patch implements this for ICMP traffic that originates from or terminates on localhost. This is activated on outbound with the new policy flag XFRM_POLICY_ICMP, and on inbound by the new state flag XFRM_STATE_ICMP. On inbound the policy check is now performed by the ICMP protocol so that it can repeat the policy check where necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69d6da0b |
|
19-Nov-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPv6] RAW: Compact the API for the kernel Same as in the previous patch for ipv4, compact the API and hide hash table and rwlock inside the raw.c file. Plus fix some "bad" places from checkpatch.pl point of view (assignments inside if()). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c964ff4f |
|
21-Jan-2008 |
Wang Chen <wangchen@cn.fujitsu.com> |
[IPV6]: ICMP6_MIB_OUTMSGS increment duplicated icmpv6_send() calls ip6_push_pending_frames() indirectly. Both ip6_push_pending_frames() and icmpv6_send() increment counter ICMP6_MIB_OUTMSGS. This patch remove the increment from icmpv6_send. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5bbef20 |
|
15-Oct-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Replace sk_buff ** with sk_buff * in input handlers With all the users of the double pointers removed from the IPv6 input path, this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input handlers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14878f75 |
|
16-Sep-2007 |
David L Stevens <dlstevens@us.ibm.com> |
[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2] Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters. [new to 2nd revision] 6) support per-interface ICMP stats 7) use common macro for per-device stat macros Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f13ec93f |
|
15-Jul-2007 |
Dmitry Butskoy <dmitry@butskoy.name> |
[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets From: Dmitry Butskoy <dmitry@butskoy.name> Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747 Problem Description: It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp and raw sockets, both connected and unconnected. There is a little typo in net/ipv6/icmp.c code, which prevents such messages to be delivered to the errqueue of the correspond raw socket, when the socket is CONNECTED. The typo is due to swap of local/remote addresses. Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is looked up usual way, it is something like: sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif); where "daddr" is a destination address of the incoming packet (IOW our local address), "saddr" is a source address of the incoming packet (the remote end). But when the raw socket is looked up for some icmp error report, in net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed fragment of the "bad" packet, i.e. "daddr" is the original destination address of that packet, "saddr" is our local address. Hence, for icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr" ... Steps to reproduce: Create some raw socket, connect it to an address, and cause some error situation: f.e. set ttl=1 where the remote address is more than 1 hop to reach. Set IPV6_RECVERR . Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN). You should receive "time exceeded" icmp message (because of "ttl=1"), but the socket do not receive it. If you do not connect your raw socket, you will receive MSG_ERRQUEUE successfully. (The reason is that for unconnected socket there are no actual checks for local/remote addresses). Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59fbb3a6 |
|
27-Jun-2007 |
Masahide NAKAMURA <nakam@linux-ipv6.org> |
[IPV6] MIP6: Loadable module support for MIPv6. This patch makes MIPv6 loadable module named "mip6". Here is a modprobe.conf(5) example to load it automatically when user application uses XFRM state for MIPv6: alias xfrm-type-10-43 mip6 alias xfrm-type-10-60 mip6 Some MIPv6 feature is not included by this modular, however, it should not be affected to other features like either IPsec or IPv6 with and without the patch. We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt separately for future work. Loadable features: * MH receiving check (to send ICMP error back) * RO header parsing and building (i.e. RH2 and HAO in DSTOPTS) * XFRM policy/state database handling for RO These are NOT covered as loadable: * Home Address flags and its rule on source address selection * XFRM sub policy (depends on its own kernel option) * XFRM functions to receive RO as IPv6 extension header * MH sending/receiving through raw socket if user application opens it (since raw socket allows to do so) * RH2 sending as ancillary data * RH2 operation with setsockopt(2) Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ff50b79 |
|
20-Apr-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27a884dc |
|
19-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc70ab26 |
|
13-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[ICMP6]: Introduce icmp6_hdr() For consistency with all the other skb->h.raw accessors. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0660e03f |
|
25-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d56f90a7 |
|
10-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bbe735e4 |
|
10-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_network_offset() For the quite common 'skb->nh.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7159039a |
|
22-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Decentralize EXPORT_SYMBOLs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
cd354f1a |
|
14-Feb-2007 |
Tim Schmielau <tim@physik3.uni-rostock.de> |
[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1ab1457c |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] IPV6: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f92a738 |
|
14-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET]: Annotate callers of the reset of checksum.h stuff. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
868c86bc |
|
14-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET]: annotate csum_ipv6_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3a1be9c |
|
10-Nov-2006 |
Brian Haley <brian.haley@hp.com> |
[IPv6]: Only modify checksum for UDP Only change upper-layer checksum from 0 to 0xFFFF for UDP (as RFC 768 states), not for others as RFC 4443 doesn't require it. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a11d206d |
|
04-Nov-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Per-interface statistics support. For IP MIB (RFC4293). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
04ce6909 |
|
08-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV6]: 'info' argument of ipv6 ->err_handler() is net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
79383236 |
|
23-Aug-2006 |
Masahide NAKAMURA <nakam@linux-ipv6.org> |
[IPV6] MIP6: Revert address to send ICMPv6 error. IPv6 source address is replaced in receiving packet with home address option carried by destination options header. To send ICMPv6 error back, original address which is received one on wire should be used. This function checks such header is included and reverts them. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab32ea5d |
|
22-Sep-2006 |
Brian Haley <brian.haley@hp.com> |
[NET/IPV4/IPV6]: Change some sysctl variables to __read_mostly Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly. Couldn't actually measure any performance increase while testing (.3% I consider noise), but seems like the right thing to do. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
84fa7933 |
|
29-Aug-2006 |
Patrick McHardy <kaber@trash.net> |
[NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
beb8d13b |
|
05-Aug-2006 |
Venkat Yekkirala <vyekkirala@TrustedCS.com> |
[MLSXFRM]: Add flow labeling This labels the flows that could utilize IPSec xfrms at the points the flows are defined so that IPSec policy and SAs at the right label can be used. The following protos are currently not handled, but they should continue to be able to use single-labeled IPSec like they currently do. ipmr ip_gre ipip igmp sit sctp ip6_tunnel (IPv6 over IPv6 tunnel device) decnet Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e012d51c |
|
13-Sep-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix tclass setting for raw sockets. np->cork.tclass is used only in cork'ed context. Otherwise, np->tclass should be used. Bug#7096 reported by Remi Denis-Courmont <rdenis@simphalempin.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
640c41c7 |
|
15-Aug-2006 |
Ingo Molnar <mingo@elte.hu> |
[IPV6] lockdep: annotate __icmpv6_socket Split off __icmpv6_socket's sk->sk_dst_lock class, because it gets used from softirqs, which is safe for __icmpv6_sockets (because they never get directly used via userspace syscalls), but unsafe for normal sockets. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f912042 |
|
10-Apr-2006 |
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> |
[PATCH] for_each_possible_cpu: network codes for_each_cpu() actually iterates across all possible CPUs. We've had mistakes in the past where people were using for_each_cpu() where they should have been iterating across only online or present CPUs. This is inefficient and possibly buggy. We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the future. This patch replaces for_each_cpu with for_each_possible_cpu under /net Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
763ecff1 |
|
15-Feb-2006 |
Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> |
[NETFILTER]: nf_conntrack: attach conntrack to locally generated ICMPv6 error Locally generated ICMPv6 errors should be associated with the conntrack of the original packet. Since the conntrack entry may not be in the hash tables (for the first packet), it must be manually attached. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46b86a2d |
|
13-Jan-2006 |
Joe Perches <joe@perches.com> |
[NET]: Use NIP6_FMT in kernel.h There are errors and inconsistency in the display of NIP6 strings. ie: net/ipv6/ip6_flowlabel.c There are errors and inconsistency in the display of NIPQUAD strings too. ie: net/netfilter/nf_conntrack_ftp.c This patch: adds NIP6_FMT to kernel.h changes all code to use NIP6_FMT fixes net/ipv6/ip6_flowlabel.c adds NIPQUAD_FMT to kernel.h fixes net/netfilter/nf_conntrack_ftp.c changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
951dbc8a |
|
07-Jan-2006 |
Patrick McHardy <kaber@trash.net> |
[IPV6]: Move nextheader offset to the IP6CB Move nextheader offset to the IP6CB to make it possible to pass a packet to ip6_input_finish multiple times and have it skip already parsed headers. As a nice side effect this gets rid of the manual hopopts skipping in ip6_input_finish. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8de3351e |
|
21-Dec-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Try not to send icmp to anycast address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
9b5b5cff |
|
29-Nov-2005 |
Arjan van de Ven <arjan@infradead.org> |
[NET]: Add const markers to various variables. the patch below marks various variables const in net/; the goal is to move them to the .rodata section so that they can't false-share cachelines with things that get written to, as well as potentially helping gcc a bit with optimisations. (these were found using a gcc patch to warn about such variables) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb286bb2 |
|
10-Nov-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Detect hardware rx checksum faults correctly Here is the patch that introduces the generic skb_checksum_complete which also checks for hardware RX checksum faults. If that happens, it'll call netdev_rx_csum_fault which currently prints out a stack trace with the device name. In future it can turn off RX checksum. I've converted every spot under net/ that does RX checksum checks to use skb_checksum_complete or __skb_checksum_complete with the exceptions of: * Those places where checksums are done bit by bit. These will call netdev_rx_csum_fault directly. * The following have not been completely checked/converted: ipmr ip_vs netfilter dccp This patch is based on patches and suggestions from Stephen Hemminger and David S. Miller. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
670c02c2 |
|
13-Oct-2005 |
John Hawkes <hawkes@sgi.com> |
[NET]: Wider use of for_each_*cpu() In 'net' change the explicit use of for-loops and NR_CPUS into the general for_each_cpu() or for_each_online_cpu() constructs, as appropriate. This widens the scope of potential future optimizations of the general constructs, as well as takes advantage of the existing optimizations of first_cpu() and next_cpu(), which is advantageous when the true CPU count is much smaller than NR_CPUS. Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
|
#
e104411b |
|
08-Sep-2005 |
Patrick McHardy <kaber@trash.net> |
[XFRM]: Always release dst_entry on error in xfrm_lookup Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
41a1f8ea |
|
07-Sep-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Support IPV6_{RECV,}TCLASS socket options / ancillary data. Based on patch from David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
2dac4b96 |
|
01-Sep-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Repair Incoming Interface Handling for Raw Socket. Due to changes to enforce checking interface bindings, sockets did not see loopback packets bound for our local address on our interface. e.g.) When we ping6 fe80::1%eth0, skb->dev points loopback_dev while IP6CB(skb)->iif indicates eth0. This patch fixes the issue by using appropriate incoming interface, in the sense of scoping architecture. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba89966c |
|
26-Aug-2005 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers This patch puts mostly read only data in the right section (read_mostly), to help sharing of these data between CPUS without memory ping pongs. On one of my production machine, tcp_statistics was sitting in a heavily modified cache line, so *every* SNMP update had to force a reload. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64ce2073 |
|
09-Aug-2005 |
Patrick McHardy <kaber@trash.net> |
[NET]: Make NETDEBUG pure printk wrappers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0bd1b59b |
|
09-Aug-2005 |
Andrew McDonald <andrew@mcdonald.org.uk> |
[IPV6]: Check interface bindings on IPv6 raw socket reception Take account of whether a socket is bound to a particular device when selecting an IPv6 raw socket to receive a packet. Also perform this check when receiving IPv6 packets with router alert options. Signed-off-by: Andrew McDonald <andrew@mcdonald.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
84427d53 |
|
13-Jun-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Ensure to use icmpv6_socket in non-preemptive context. We saw following trace several times: |BUG: using smp_processor_id() in preemptible [00000001] code: httpd/30137 |caller is icmpv6_send+0x23/0x540 | [<c01ad63b>] smp_processor_id+0x9b/0xb8 | [<c02993e7>] icmpv6_send+0x23/0x540 This is because of icmpv6_socket, which is the only one user of smp_processor_id() in icmpv6_send(), AFAIK. Since it should be used in non-preemptive context, let's defer the dereference after disabling preemption (by icmpv6_xmit_lock()). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0d3d077c |
|
24-Apr-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[SELINUX]: Fix ipv6_skip_exthdr() invocation causing OOPS. The SELinux hooks invoke ipv6_skip_exthdr() with an incorrect length final argument. However, the length argument turns out to be superfluous. I was just reading ipv6_skip_exthdr and it occured to me that we can get rid of len altogether. The only place where len is used is to check whether the skb has two bytes for ipv6_opt_hdr. This check is done by skb_header_pointer/skb_copy_bits anyway. Now it might appear that we've made the code slower by deferring the check to skb_copy_bits. However, this check should not trigger in the common case so this is OK. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|