#
7633c4da |
|
08-Apr-2024 |
Jiri Benc <jbenc@redhat.com> |
ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it still means hlist_for_each_entry_rcu can return an item that got removed from the list. The memory itself of such item is not freed thanks to RCU but nothing guarantees the actual content of the memory is sane. In particular, the reference count can be zero. This can happen if ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough timing, this can happen: 1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry. 2. Then, the whole ipv6_del_addr is executed for the given entry. The reference count drops to zero and kfree_rcu is scheduled. 3. ipv6_get_ifaddr continues and tries to increments the reference count (in6_ifa_hold). 4. The rcu is unlocked and the entry is freed. 5. The freed entry is returned. Prevent increasing of the reference count in such case. The name in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe. [ 41.506330] refcount_t: addition on 0; use-after-free. [ 41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130 [ 41.507413] Modules linked in: veth bridge stp llc [ 41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14 [ 41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) [ 41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130 [ 41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff [ 41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282 [ 41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000 [ 41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900 [ 41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff [ 41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000 [ 41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48 [ 41.514086] FS: 00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000 [ 41.514726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0 [ 41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 41.516799] Call Trace: [ 41.517037] <TASK> [ 41.517249] ? __warn+0x7b/0x120 [ 41.517535] ? refcount_warn_saturate+0xa5/0x130 [ 41.517923] ? report_bug+0x164/0x190 [ 41.518240] ? handle_bug+0x3d/0x70 [ 41.518541] ? exc_invalid_op+0x17/0x70 [ 41.520972] ? asm_exc_invalid_op+0x1a/0x20 [ 41.521325] ? refcount_warn_saturate+0xa5/0x130 [ 41.521708] ipv6_get_ifaddr+0xda/0xe0 [ 41.522035] inet6_rtm_getaddr+0x342/0x3f0 [ 41.522376] ? __pfx_inet6_rtm_getaddr+0x10/0x10 [ 41.522758] rtnetlink_rcv_msg+0x334/0x3d0 [ 41.523102] ? netlink_unicast+0x30f/0x390 [ 41.523445] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 41.523832] netlink_rcv_skb+0x53/0x100 [ 41.524157] netlink_unicast+0x23b/0x390 [ 41.524484] netlink_sendmsg+0x1f2/0x440 [ 41.524826] __sys_sendto+0x1d8/0x1f0 [ 41.525145] __x64_sys_sendto+0x1f/0x30 [ 41.525467] do_syscall_64+0xa5/0x1b0 [ 41.525794] entry_SYSCALL_64_after_hwframe+0x72/0x7a [ 41.526213] RIP: 0033:0x7fbc4cfcea9a [ 41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 [ 41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a [ 41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005 [ 41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c [ 41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b [ 41.531573] </TASK> Fixes: 5c578aedcb21d ("IPv6: convert addrconf hash list to RCU") Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c04f7dfe |
|
21-Mar-2024 |
Ido Schimmel <idosch@nvidia.com> |
ipv6: Fix address dump when IPv6 is disabled on an interface Cited commit started returning an error when user space requests to dump the interface's IPv6 addresses and IPv6 is disabled on the interface. Restore the previous behavior and do not return an error. Before cited commit: # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff inet6 fe80::1852:2ff:fe5a:c26e/64 scope link proto kernel_ll valid_lft forever preferred_lft forever # ip link set dev dummy1 mtu 1000 # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff After cited commit: # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 1e:9b:94:00:ac:e8 brd ff:ff:ff:ff:ff:ff inet6 fe80::1c9b:94ff:fe00:ace8/64 scope link proto kernel_ll valid_lft forever preferred_lft forever # ip link set dev dummy1 mtu 1000 # ip address show dev dummy1 RTNETLINK answers: No such device Dump terminated With this patch: # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff inet6 fe80::4035:fcff:fe53:66cf/64 scope link proto kernel_ll valid_lft forever preferred_lft forever # ip link set dev dummy1 mtu 1000 # ip address show dev dummy1 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff Fixes: 9cc4cc329d30 ("ipv6: use xa_array iterator to implement inet6_dump_addr()") Reported-by: Gal Pressman <gal@nvidia.com> Closes: https://lore.kernel.org/netdev/7e261328-42eb-411d-b1b4-ad884eeaae4d@linux.dev/ Tested-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240321173042.2151756-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
155549a6 |
|
06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: remove RTNL protection from inet6_dump_addr() We can now remove RTNL acquisition while running inet6_dump_addr(), inet6_dump_ifmcaddr() and inet6_dump_ifacaddr(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9cc4cc32 |
|
06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: use xa_array iterator to implement inet6_dump_addr() inet6_dump_addr() can use the new xa_array iterator for better scalability. Make it ready for RCU-only protection. RTNL use is removed in the following patch. Also properly return 0 at the end of a dump to avoid and extra recvmsg() to get NLMSG_DONE. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46f5182d |
|
06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: make in6_dump_addrs() lockless in6_dump_addrs() is called with RCU protection. There is no need holding idev->lock to iterate through unicast addresses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f0a7da70 |
|
06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: make inet6_fill_ifaddr() lockless Make inet6_fill_ifaddr() lockless, and add approriate annotations on ifa->tstamp, ifa->valid_lft, ifa->preferred_lft, ifa->ifa_proto and ifa->rt_priority. Also constify 2nd argument of inet6_fill_ifaddr(), inet6_fill_ifmcaddr() and inet6_fill_ifacaddr(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02e24903 |
|
06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
netlink: let core handle error cases in dump operations After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors in the core"), we can remove some code that was not 100 % correct anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306102426.245689-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
2a02f837 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: use xa_array iterator to implement inet6_netconf_dump_devconf() 1) inet6_netconf_dump_devconf() can run under RCU protection instead of RTNL. 2) properly return 0 at the end of a dump, avoiding an an extra recvmsg() system call. 3) Do not use inet6_base_seq() anymore, for_each_netdev_dump() has nice properties. Restarting a GETDEVCONF dump if a device has been added/removed or if net->ipv6.dev_addr_genid has changed is moot. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f0ff05a |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: annotate data-races around devconf fields (II) Final (?) round of this series. Annotate lockless reads on following devconf fields, because they be changed concurrently from /proc/net/ipv6/conf. - accept_dad - optimistic_dad - use_optimistic - use_oif_addrs_only - ra_honor_pio_life - keep_addr_on_down - ndisc_notify - ndisc_evict_nocarrier - suppress_frag_ndisc - addr_gen_mode - seg6_enabled - ioam6_enabled - ioam6_id - ioam6_id_wide - drop_unicast_in_l2_multicast - mldv[12]_unsolicited_report_interval - force_mld_version - force_tllao - accept_untracked_na - drop_unsolicited_na - accept_source_route Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2aba913f |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: annotate data-races around devconf fields (I) Annotate lockless reads and writes on following devconf fields: - regen_min_advance - regen_max_retry - dad_transmits - use_tempaddr - max_addresses - max_desync_factor - temp_valid_lft - rtr_solicits - rtr_solicit_max_interval - rtr_solicit_interval - rtr_solicit_delay - enhanced_dad - accept_redirects Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45b90ec9 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf_disable_policy() optimization Writing over /proc/sys/net/ipv6/conf/default/disable_policy does not need to hold RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
624d5aec |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around devconf->disable_policy idev->cnf.disable_policy and net->ipv6.devconf_all->disable_policy can be read locklessly. Add appropriate annotations on reads and writes. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8fbd4d9 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around devconf->proxy_ndp devconf->proxy_ndp can be read and written locklessly, add appropriate annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fca34cc0 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around idev->cnf.ignore_routes_with_linkdown idev->cnf.ignore_routes_with_linkdown can be used without any locks, add appropriate annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32f75417 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around cnf.forwarding idev->cnf.forwarding and net->ipv6.devconf_all->forwarding might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7135f48 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around cnf.mtu6 idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
553ced03 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf_disable_ipv6() optimization Writing over /proc/sys/net/ipv6/conf/default/disable_ipv6 does not need to hold RTNL. v3: remove a wrong change (Jakub Kicinski feedback) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d289ab65 |
|
28-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: annotate data-races around cnf.disable_ipv6 disable_ipv6 is read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. v2: do not preload net before rtnl_trylock() in addrconf_disable_ipv6() (Jiri) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67ea41d1 |
|
27-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
inet6: expand rcu_read_lock() scope in inet6_dump_addr() I missed that inet6_dump_addr() is calling in6_dump_addrs() from two points. First one under RTNL protection, and second one under rcu_read_lock(). Since we want to remove RTNL use from inet6_dump_addr() very soon, no longer assume in6_dump_addrs() is protected by RTNL (even if this is still the case). Use rcu_read_lock() earlier to fix this lockdep splat: WARNING: suspicious RCU usage 6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0 Not tainted net/ipv6/addrconf.c:5317 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by syz-executor.2/8834: #0: ffff88802f554678 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x119/0x780 net/netlink/af_netlink.c:2338 #1: ffffffff8f377a88 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0x676/0xda0 net/netlink/af_netlink.c:2265 #2: ffff88807e5f0580 (&ndev->lock){++--}-{2:2}, at: in6_dump_addrs+0xb8/0x1de0 net/ipv6/addrconf.c:5279 stack backtrace: CPU: 1 PID: 8834 Comm: syz-executor.2 Not tainted 6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106 lockdep_rcu_suspicious+0x220/0x340 kernel/locking/lockdep.c:6712 in6_dump_addrs+0x1b47/0x1de0 net/ipv6/addrconf.c:5317 inet6_dump_addr+0x1597/0x1690 net/ipv6/addrconf.c:5428 netlink_dump+0x6a6/0xda0 net/netlink/af_netlink.c:2266 __netlink_dump_start+0x59d/0x780 net/netlink/af_netlink.c:2374 netlink_dump_start include/linux/netlink.h:340 [inline] rtnetlink_rcv_msg+0xcf7/0x10d0 net/core/rtnetlink.c:6555 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2547 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1902 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 Fixes: c3718936ec47 ("ipv6: anycast: complete RCU handling of struct ifacaddr6") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240227222259.4081489-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c3718936 |
|
23-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: anycast: complete RCU handling of struct ifacaddr6 struct ifacaddr6 are already freed after RCU grace period. Add __rcu qualifier to aca_next pointer, and idev->ac_list Add relevant rcu_assign_pointer() and dereference accessors. ipv6_chk_acast_dev() no longer needs to acquire idev->lock. /proc/net/anycast6 is now purely RCU protected, it no longer acquires idev->lock. Similarly in6_dump_addrs() can use RCU protection to iterate through anycast addresses. It was relying on a mixture of RCU and RTNL but next patches will get rid of RTNL there. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
69fdb7e4 |
|
22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: switch inet6_dump_ifinfo() to RCU protection No longer hold RTNL while calling inet6_dump_ifinfo() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac14ad97 |
|
22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: use xarray iterator to implement inet6_dump_ifinfo() Prepare inet6_dump_ifinfo() to run with RCU protection instead of RTNL and use for_each_netdev_dump() interface. Also properly return 0 at the end of a dump, avoiding an extra recvmsg() system call and RTNL acquisition. Note that RTNL-less dumps need core changes, coming later in the series. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8afc7a78 |
|
22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: prepare inet6_fill_ifinfo() for RCU protection We want to use RCU protection instead of RTNL for inet6_fill_ifinfo(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ad26813 |
|
22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: prepare inet6_fill_ifla6_attrs() for RCU We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs() in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f4bcbf36 |
|
13-Feb-2024 |
Alex Henrie <alexhenrie24@gmail.com> |
net: ipv6/addrconf: clamp preferred_lft to the minimum required If the preferred lifetime was less than the minimum required lifetime, ipv6_create_tempaddr would error out without creating any new address. On my machine and network, this error happened immediately with the preferred lifetime set to 5 seconds or less, after a few minutes with the preferred lifetime set to 6 seconds, and not at all with the preferred lifetime set to 7 seconds. During my investigation, I found a Stack Exchange post from another person who seems to have had the same problem: They stopped getting new addresses if they lowered the preferred lifetime below 3 seconds, and they didn't really know why. The preferred lifetime is a preference, not a hard requirement. The kernel does not strictly forbid new connections on a deprecated address, nor does it guarantee that the address will be disposed of the instant its total valid lifetime expires. So rather than disable IPv6 privacy extensions altogether if the minimum required lifetime swells above the preferred lifetime, it is more in keeping with the user's intent to increase the temporary address's lifetime to the minimum necessary for the current network conditions. With these fixes, setting the preferred lifetime to 5 or 6 seconds "just works" because the extra fraction of a second is practically unnoticeable. It's even possible to reduce the time before deprecation to 1 or 2 seconds by setting /proc/sys/net/ipv6/conf/*/regen_min_advance and /proc/sys/net/ipv6/conf/*/dad_transmits to 0. I realize that that is a pretty niche use case, but I know at least one person who would gladly sacrifice performance and convenience to be sure that they are getting the maximum possible level of privacy. Link: https://serverfault.com/a/1031168/310447 Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
a5fcea2d |
|
13-Feb-2024 |
Alex Henrie <alexhenrie24@gmail.com> |
net: ipv6/addrconf: introduce a regen_min_advance sysctl In RFC 8981, REGEN_ADVANCE cannot be less than 2 seconds, and the RFC does not permit the creation of temporary addresses with lifetimes shorter than that: > When processing a Router Advertisement with a > Prefix Information option carrying a prefix for the purposes of > address autoconfiguration (i.e., the A bit is set), the host MUST > perform the following steps: > 5. A temporary address is created only if this calculated preferred > lifetime is greater than REGEN_ADVANCE time units. However, some users want to change their IPv6 address as frequently as possible regardless of the RFC's arbitrary minimum lifetime. For the benefit of those users, add a regen_min_advance sysctl parameter that can be set to below or above 2 seconds. Link: https://datatracker.ietf.org/doc/html/rfc8981 Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
2aa8f155 |
|
13-Feb-2024 |
Alex Henrie <alexhenrie24@gmail.com> |
net: ipv6/addrconf: ensure that regen_advance is at least 2 seconds RFC 8981 defines REGEN_ADVANCE as follows: REGEN_ADVANCE = 2 + (TEMP_IDGEN_RETRIES * DupAddrDetectTransmits * RetransTimer / 1000) Thus, allowing it to be less than 2 seconds is technically a protocol violation. Link: https://datatracker.ietf.org/doc/html/rfc8981#name-defined-protocol-parameters Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
004d1383 |
|
12-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
net-sysfs: convert dev->operstate reads to lockless ones operstate_show() can omit dev_base_lock acquisition only to read dev->operstate. Annotate accesses to dev->operstate. Writers still acquire dev_base_lock for mutual exclusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
768e06a8 |
|
08-Feb-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
net/ipv6: set expires in modify_prefix_route() if RTF_EXPIRES is set. Make the decision to set or clean the expires of a route based on the RTF_EXPIRES flag, rather than the value of the "expires" argument. This patch doesn't make difference logically, but make inet6_addr_modify() and modify_prefix_route() consistent. The function inet6_addr_modify() is the only caller of modify_prefix_route(), and it passes the RTF_EXPIRES flag and an expiration value. The RTF_EXPIRES flag is turned on or off based on the value of valid_lft. The RTF_EXPIRES flag is turned on if valid_lft is a finite value (not infinite, not 0xffffffff). Even if valid_lft is 0, the RTF_EXPIRES flag remains on. The expiration value being passed is equal to the valid_lft value if the flag is on. However, if the valid_lft value is infinite, the expiration value becomes 0 and the RTF_EXPIRES flag is turned off. Despite this, modify_prefix_route() decides to set the expiration value if the received expiration value is not zero. This mixing of infinite and zero cases creates an inconsistency. Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5eb902b8 |
|
08-Feb-2024 |
Kui-Feng Lee <thinker.li@gmail.com> |
net/ipv6: Remove expired routes with a separated list of routes. FIB6 GC walks trees of fib6_tables to remove expired routes. Walking a tree can be expensive if the number of routes in a table is big, even if most of them are permanent. Checking routes in a separated list of routes having expiration will avoid this potential issue. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfd2ee08 |
|
01-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: make addrconf_wq single threaded Both addrconf_verify_work() and addrconf_dad_work() acquire rtnl, there is no point trying to have one thread per cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240201173031.3654257-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
10bfd453 |
|
21-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: fix potential "struct net" leak in inet6_rtm_getaddr() It seems that if userspace provides a correct IFA_TARGET_NETNSID value but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr() returns -EINVAL with an elevated "struct net" refcount. Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e898e4cd |
|
15-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
ipv6: properly combine dev_base_seq and ipv6.dev_addr_genid net->dev_base_seq and ipv6.dev_addr_genid are monotonically increasing. If we XOR their values, we could miss to detect if both values were changed with the same amount. Fixes: 63998ac24f83 ("ipv6: provide addr and netconf dump consistency info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cdafdd9 |
|
29-Dec-2023 |
Alex Henrie <alexhenrie24@gmail.com> |
Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum required" The commit had a bug and might not have been the right approach anyway. Fixes: 629df6701c8a ("net: ipv6/addrconf: clamp preferred_lft to the minimum required") Fixes: ec575f885e3e ("Documentation: networking: explain what happens if temp_prefered_lft is too small or too large") Reported-by: Dan Moulding <dan@danm.net> Closes: https://lore.kernel.org/netdev/20231221231115.12402-1-dan@danm.net/ Link: https://lore.kernel.org/netdev/CAMMLpeTdYhd=7hhPi2Y7pwdPCgnnW5JYh-bu3hSc7im39uxnEA@mail.gmail.com/ Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231230043252.10530-1-alexhenrie24@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
bd4a8167 |
|
06-Dec-2023 |
Maciej Żenczykowski <maze@google.com> |
net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX Lorenzo points out that we effectively clear all unknown flags from PIO when copying them to userspace in the netlink RTM_NEWPREFIX notification. We could fix this one at a time as new flags are defined, or in one fell swoop - I choose the latter. We could either define 6 new reserved flags (reserved1..6) and handle them individually (and rename them as new flags are defined), or we could simply copy the entire unmodified byte over - I choose the latter. This unfortunately requires some anonymous union/struct magic, so we add a static assert on the struct size for a little extra safety. Cc: David Ahern <dsahern@kernel.org> Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
629df670 |
|
24-Oct-2023 |
Alex Henrie <alexhenrie24@gmail.com> |
net: ipv6/addrconf: clamp preferred_lft to the minimum required If the preferred lifetime was less than the minimum required lifetime, ipv6_create_tempaddr would error out without creating any new address. On my machine and network, this error happened immediately with the preferred lifetime set to 1 second, after a few minutes with the preferred lifetime set to 4 seconds, and not at all with the preferred lifetime set to 5 seconds. During my investigation, I found a Stack Exchange post from another person who seems to have had the same problem: They stopped getting new addresses if they lowered the preferred lifetime below 3 seconds, and they didn't really know why. The preferred lifetime is a preference, not a hard requirement. The kernel does not strictly forbid new connections on a deprecated address, nor does it guarantee that the address will be disposed of the instant its total valid lifetime expires. So rather than disable IPv6 privacy extensions altogether if the minimum required lifetime swells above the preferred lifetime, it is more in keeping with the user's intent to increase the temporary address's lifetime to the minimum necessary for the current network conditions. With these fixes, setting the preferred lifetime to 3 or 4 seconds "just works" because the extra fraction of a second is practically unnoticeable. It's even possible to reduce the time before deprecation to 1 or 2 seconds by also disabling duplicate address detection (setting /proc/sys/net/ipv6/conf/*/dad_transmits to 0). I realize that that is a pretty niche use case, but I know at least one person who would gladly sacrifice performance and convenience to be sure that they are getting the maximum possible level of privacy. Link: https://serverfault.com/a/1031168/310447 Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231024212312.299370-3-alexhenrie24@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
bfbf81b3 |
|
24-Oct-2023 |
Alex Henrie <alexhenrie24@gmail.com> |
net: ipv6/addrconf: clamp preferred_lft to the maximum allowed Without this patch, there is nothing to stop the preferred lifetime of a temporary address from being greater than its valid lifetime. If that was the case, the valid lifetime was effectively ignored. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231024212312.299370-2-alexhenrie24@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
473267a4 |
|
25-Sep-2023 |
Patrick Rohr <prohr@google.com> |
net: add sysctl to disable rfc4862 5.5.3e lifetime handling This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid lifetime derivation mechanism. RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router Advertisement PIO shall be ignored if it less than 2 hours and to reset the lifetime of the corresponding address to 2 hours. An in-progress 6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently looking to remove this mechanism. While this draft has not been moving particularly quickly for other reasons, there is widespread consensus on section 4.2 which updates RFC4862 section 5.5.3e. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Jen Linkova <furry@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230925214711.959704-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
f31867d0 |
|
31-Aug-2023 |
Alex Henrie <alexhenrie24@gmail.com> |
net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddr The existing code incorrectly casted a negative value (the result of a subtraction) to an unsigned value without checking. For example, if /proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred lifetime would jump to 4 billion seconds. On my machine and network the shortest lifetime that avoided underflow was 3 seconds. Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5cb24968 |
|
18-Aug-2023 |
Patrick Rohr <prohr@google.com> |
net: release reference to inet6_dev pointer addrconf_prefix_rcv returned early without releasing the inet6_dev pointer when the PIO lifetime is less than accept_ra_min_lft. Fixes: 5027d54a9c30 ("net: change accept_ra_min_rtr_lft to affect all RA lifetimes") Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c899710f |
|
08-Aug-2023 |
Joel Granados <joel.granados@gmail.com> |
networking: Update to register_net_sysctl_sz Move from register_net_sysctl to register_net_sysctl_sz for all the networking related files. Do this while making sure to mirror the NULL assignments with a table_size of zero for the unprivileged users. We need to move to the new function in preparation for when we change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do so would erroneously allow ARRAY_SIZE() to be called on a pointer. We hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all the relevant net sysctl registering functions to register_net_sysctl_sz in subsequent commits. An additional size function was added to the following files in order to calculate the size of an array that is defined in another file: include/net/ipv6.h net/ipv6/icmp.c net/ipv6/route.c net/ipv6/sysctl_net_ipv6.c Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
#
5027d54a |
|
26-Jul-2023 |
Patrick Rohr <prohr@google.com> |
net: change accept_ra_min_rtr_lft to affect all RA lifetimes accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes are lower than the configured value, the specific RA section is ignored. In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). Despite this, we have encountered networks that set the router lifetime to 30s which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications). The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice. Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7f6c4039 |
|
25-Jul-2023 |
Hangbin Liu <liuhangbin@gmail.com> |
IPv6: add extack info for IPv6 address add/delete Add extack info for IPv6 address add/delete, which would be useful for users to understand the problem without having to read kernel code. Suggested-by: Beniamino Galvani <bgalvani@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1671bcfd |
|
19-Jul-2023 |
Patrick Rohr <prohr@google.com> |
net: add sysctl accept_ra_min_rtr_lft This change adds a new sysctl accept_ra_min_rtr_lft to specify the minimum acceptable router lifetime in an RA. If the received RA router lifetime is less than the configured value (and not 0), the RA is ignored. This is useful for mobile devices, whose battery life can be impacted by networks that configure RAs with a short lifetime. On such networks, the device should never gain IPv6 provisioning and should attempt to drop RAs via hardware offload, if available. Signed-off-by: Patrick Rohr <prohr@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69172f0b |
|
20-Jul-2023 |
Maciej Żenczykowski <maze@google.com> |
ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address currently on 6.4 net/main: # ip link add dummy1 type dummy # echo 1 > /proc/sys/net/ipv6/conf/dummy1/use_tempaddr # ip link set dummy1 up # ip -6 addr add 2000::1/64 mngtmpaddr dev dummy1 # ip -6 addr show dev dummy1 11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet6 2000::44f3:581c:8ca:3983/64 scope global temporary dynamic valid_lft 604800sec preferred_lft 86172sec inet6 2000::1/64 scope global mngtmpaddr valid_lft forever preferred_lft forever inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link valid_lft forever preferred_lft forever # ip -6 addr del 2000::44f3:581c:8ca:3983/64 dev dummy1 (can wait a few seconds if you want to, the above delete isn't [directly] the problem) # ip -6 addr show dev dummy1 11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet6 2000::1/64 scope global mngtmpaddr valid_lft forever preferred_lft forever inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link valid_lft forever preferred_lft forever # ip -6 addr del 2000::1/64 mngtmpaddr dev dummy1 # ip -6 addr show dev dummy1 11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 inet6 2000::81c9:56b7:f51a:b98f/64 scope global temporary dynamic valid_lft 604797sec preferred_lft 86169sec inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link valid_lft forever preferred_lft forever This patch prevents this new 'global temporary dynamic' address from being created by the deletion of the related (same subnet prefix) 'mngtmpaddr' (which is triggered by there already being no temporary addresses). Cc: Jiri Pirko <jiri@resnulli.us> Fixes: 53bd67491537 ("ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses") Reported-by: Xiao Ma <xiaom@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230720160022.1887942-1-maze@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
06a07169 |
|
08-Jul-2023 |
Ziyang Xuan <william.xuanziyang@huawei.com> |
ipv6/addrconf: fix a potential refcount underflow for idev Now in addrconf_mod_rs_timer(), reference idev depends on whether rs_timer is not pending. Then modify rs_timer timeout. There is a time gap in [1], during which if the pending rs_timer becomes not pending. It will miss to hold idev, but the rs_timer is activated. Thus rs_timer callback function addrconf_rs_timer() will be executed and put idev later without holding idev. A refcount underflow issue for idev can be caused by this. if (!timer_pending(&idev->rs_timer)) in6_dev_hold(idev); <--------------[1] mod_timer(&idev->rs_timer, jiffies + when); To fix the issue, hold idev if mod_timer() return 0. Fixes: b7b1bfce0bb6 ("ipv6: split duplicate address detection and router solicitation timer") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f69de8aa |
|
02-Jun-2023 |
Matthieu Baerts <matthieu.baerts@tessares.net> |
ipv6: lower "link become ready"'s level message This following message is printed in the console each time a network device configured with an IPv6 addresses is ready to be used: ADDRCONF(NETDEV_CHANGE): <iface>: link becomes ready When netns are being extensively used -- e.g. by re-creating netns' with veth to discuss with each others for testing purposes like mptcp_join.sh selftest does -- it generates a lot of messages like that: more than 700 when executing mptcp_join.sh with the latest version. It looks like this message is not that helpful after all: maybe it can be used as a sign to know if there is something wrong, e.g. if a device is being regularly reconfigured by accident? But even then, there are better ways to monitor and diagnose such issues. When looking at commit 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") which introduces this new message, it seems it had been added to verify that the new feature was working as expected. It could have then used a lower level than "info" from the beginning but it was fine like that back then: 17 years ago. It seems then OK today to simply lower its level, similar to commit 7c62b8dd5ca8 ("net/ipv6: lower the level of "link is not ready" messages") and as suggested by Mat [1], Stephen and David [2]. Link: https://lore.kernel.org/mptcp/614e76ac-184e-c553-af72-084f792e60b0@kernel.org/T/ [1] Link: https://lore.kernel.org/netdev/68035bad-b53e-91cb-0e4a-007f27d62b05@tessares.net/T/ [2] Suggested-by: Mat Martineau <martineau@kernel.org> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Suggested-by: David Ahern <dsahern@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2df9bf4d |
|
22-Mar-2023 |
Xin Long <lucien.xin@gmail.com> |
ipv6: prevent router_solicitations for team port The issue fixed for bonding in commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master") also exists in team driver. However, we can't just disable ipv6 addrconf for team ports, as 'teamd' will need it when nsns_ping watch is used in the user space. Instead of preventing ipv6 addrconf, this patch only prevents RS packets for team ports, as it did in commit b52e1cce31ca ("ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL"). Note that we do not prevent DAD packets, to avoid the changes getting intricate / hacky. Also, usually sysctl dad_transmits is set to 1 and only 1 DAD packet will be sent, and by now no libteam user complains about DAD packets on team ports, unlike RS packets. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09eed119 |
|
20-Mar-2023 |
Eric Dumazet <edumazet@google.com> |
neighbour: switch to standard rcu, instead of rcu_bh rcu_bh is no longer a win, especially for objects freed with standard call_rcu(). Switch neighbour code to no longer disable BH when not necessary. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
30e2291f |
|
30-Jan-2023 |
Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> |
ip/ip6_gre: Fix non-point-to-point tunnel not generating IPv6 link local address We recently found that our non-point-to-point tunnels were not generating any IPv6 link local address and instead generating an IPv6 compat address, breaking IPv6 communication on the tunnel. Previously, addrconf_gre_config always would call addrconf_addr_gen and generate a EUI64 link local address for the tunnel. Then commit e5dd729460ca changed the code path so that add_v4_addrs is called but this only generates a compat IPv6 address for non-point-to-point tunnels. I assume the compat address is specifically for SIT tunnels so have kept that only for SIT - GRE tunnels now always generate link local addresses. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
23ca0c2c |
|
30-Jan-2023 |
Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> |
ip/ip6_gre: Fix changing addr gen mode not generating IPv6 link local address For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they come up to generate the IPv6 link local address for the interface. Recently we found that they were no longer generating IPv6 addresses. This issue would also have affected SIT tunnels. Commit e5dd729460ca changed the code path so that GRE tunnels generate an IPv6 address based on the tunnel source address. It also changed the code path so GRE tunnels don't call addrconf_addr_gen in addrconf_dev_config which is called by addrconf_sysctl_addr_gen_mode when the IN6_ADDR_GEN_MODE is changed. This patch aims to fix this issue by moving the code in addrconf_notify which calls the addr gen for GRE and SIT into a separate function and calling it in the places that expect the IPv6 address to be generated. The previous addrconf_dev_config is renamed to addrconf_eth_config since it only expected eth type interfaces and follows the addrconf_gre/sit_config format. A part of this changes means that the loopback address will be attempted to be configured when changing addr_gen_mode for lo. This should not be a problem because the address should exist anyway and if does already exist then no error is produced. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8a321cf7 |
|
09-Dec-2022 |
Xin Long <lucien.xin@gmail.com> |
net: add IFF_NO_ADDRCONF and use it in bonding to prevent ipv6 addrconf Currently, in bonding it reused the IFF_SLAVE flag and checked it in ipv6 addrconf to prevent ipv6 addrconf. However, it is not a proper flag to use for no ipv6 addrconf, for bonding it has to move IFF_SLAVE flag setting ahead of dev_open() in bond_enslave(). Also, IFF_MASTER/SLAVE are historical flags used in bonding and eql, as Jiri mentioned, the new devices like Team, Failover do not use this flag. So as Jiri suggested, this patch adds IFF_NO_ADDRCONF in priv_flags of the device to indicate no ipv6 addconf, and uses it in bonding and moves IFF_SLAVE flag setting back to its original place. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e8a533cb |
|
09-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32_inclusive() when possible These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
8032bf12 |
|
09-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32_below() instead of deprecated function This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
1ca69520 |
|
17-Oct-2022 |
Zhengchao Shao <shaozhengchao@huawei.com> |
ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed If the initialization fails in calling addrconf_init_net(), devconf_all is the pointer that has been released. Then ip6mr_sk_done() is called to release the net, accessing devconf->mc_forwarding directly causes invalid pointer access. The process is as follows: setup_net() ops_init() addrconf_init_net() all = kmemdup(...) ---> alloc "all" ... net->ipv6.devconf_all = all; __addrconf_sysctl_register() ---> failed ... kfree(all); ---> ipv6.devconf_all invalid ... ops_exit_list() ... ip6mr_sk_done() devconf = net->ipv6.devconf_all; //devconf is invalid pointer if (!devconf || !atomic_read(&devconf->mc_forwarding)) The following is the Call Trace information: BUG: KASAN: use-after-free in ip6mr_sk_done+0x112/0x3a0 Read of size 4 at addr ffff888075508e88 by task ip/14554 Call Trace: <TASK> dump_stack_lvl+0x8e/0xd1 print_report+0x155/0x454 kasan_report+0xba/0x1f0 kasan_check_range+0x35/0x1b0 ip6mr_sk_done+0x112/0x3a0 rawv6_close+0x48/0x70 inet_release+0x109/0x230 inet6_release+0x4c/0x70 sock_release+0x87/0x1b0 igmp6_net_exit+0x6b/0x170 ops_exit_list+0xb0/0x170 setup_net+0x7ac/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f7963322547 </TASK> Allocated by task 14554: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0xa1/0xb0 __kmalloc_node_track_caller+0x4a/0xb0 kmemdup+0x28/0x60 addrconf_init_net+0x1be/0x840 ops_init+0xa5/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 14554: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x155/0x1b0 slab_free_freelist_hook+0x11b/0x220 __kmem_cache_free+0xa4/0x360 addrconf_init_net+0x623/0x840 ops_init+0xa5/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 7d9b1b578d67 ("ip6mr: fix use-after-free in ip6mr_sk_done()") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221017080331.16878-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
81895a65 |
|
05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use prandom_u32_max() when possible, part 1 Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
fd16eb94 |
|
30-Aug-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: add all node mcast address when slave up When a link is enslave to bond, it need to set the interface down first. This makes the slave remove mac multicast address 33:33:00:00:00:01(The IPv6 multicast address ff02::1 is kept even when the interface down). When bond set the slave up, ipv6_mc_up() was not called due to commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master"). This is not an issue before we adding the lladdr target feature for bonding, as the mac multicast address will be added back when bond interface up and join group ff02::1. But after adding lladdr target feature for bonding. When user set a lladdr target, the unsolicited NA message with all-nodes multicast dest will be dropped as the slave interface never add 33:33:00:00:00:01 back. Fix this by calling ipv6_mc_up() to add 33:33:00:00:00:01 back when the slave interface up. Reported-by: LiLiang <liali@redhat.com> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a5612ca1 |
|
23-Aug-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
net: Fix data-races around sysctl_devconf_inherit_init_net. While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b66eb3a6 |
|
20-Jul-2022 |
Jaehee Park <jhpark1013@gmail.com> |
net: ipv6: avoid accepting values greater than 2 for accept_untracked_na The accept_untracked_na sysctl changed from a boolean to an integer when a new knob '2' was added. This patch provides a safeguard to avoid accepting values that are not defined in the sysctl. When setting a value greater than 2, the user will get an 'invalid argument' warning. Fixes: aaa5f515b16b ("net: ipv6: new accept_untracked_na option to accept na only if in-network") Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20220720183632.376138-1-jhpark1013@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
aaa5f515 |
|
13-Jul-2022 |
Jaehee Park <jhpark1013@gmail.com> |
net: ipv6: new accept_untracked_na option to accept na only if in-network This patch adds a third knob, '2', which extends the accept_untracked_na option to learn a neighbor only if the src ip is in the same subnet as an address configured on the interface that received the neighbor advertisement. This is similar to the arp_accept configuration for ipv4. Signed-off-by: Jaehee Park <jhpark1013@gmail.com> Suggested-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
778964f2 |
|
23-Jun-2022 |
Sam Edwards <cfsworks@gmail.com> |
ipv6/addrconf: fix timing bug in tempaddr regen The addrconf_verify_rtnl() function uses a big if/elseif/elseif/... block to categorize each address by what type of attention it needs. An about-to-expire (RFC 4941) temporary address is one such category, but the previous elseif branch catches addresses that have already run out their prefered_lft. This means that if addrconf_verify_rtnl() fails to run in the necessary time window (i.e. REGEN_ADVANCE time units before the end of the prefered_lft), the temporary address will never be regenerated, and no temporary addresses will be available until each one's valid_lft runs out and manage_tempaddrs() begins anew. Fix this by moving the entire temporary address regeneration case out of that block. That block is supposed to implement the "destructive" part of an address's lifecycle, and regenerating a fresh temporary address is not, semantically speaking, actually tied to any particular lifecycle stage. The age test is also changed from `age >= prefered_lft - regen_advance` to `age + regen_advance >= prefered_lft` instead, to ensure no underflow occurs if the system administrator increases the regen_advance to a value greater than the already-set prefered_lft. Note that this does not fix the problem of addrconf_verify_rtnl() sometimes not running in time, resulting in the race condition described in RFC 4941 section 3.4 - it only ensures that the address is regenerated. Fixing THAT problem may require either using jiffies instead of seconds for all time arithmetic here, or always rounding up when regen_advance is converted to seconds. Signed-off-by: Sam Edwards <CFSworks@gmail.com> Link: https://lore.kernel.org/r/20220623181103.7033-1-CFSworks@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
d62607c3 |
|
07-Jun-2022 |
Jakub Kicinski <kuba@kernel.org> |
net: rename reference+tracking helpers Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4e43e64d |
|
27-Jun-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6: fix lockdep splat in in6_dump_addrs() As reported by syzbot, we should not use rcu_dereference() when rcu_read_lock() is not held. WARNING: suspicious RCU usage 5.19.0-rc2-syzkaller #0 Not tainted net/ipv6/addrconf.c:5175 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by syz-executor326/3617: #0: ffffffff8d5848e8 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xae/0xc20 net/netlink/af_netlink.c:2223 stack backtrace: CPU: 0 PID: 3617 Comm: syz-executor326 Not tainted 5.19.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 in6_dump_addrs+0x12d1/0x1790 net/ipv6/addrconf.c:5175 inet6_dump_addr+0x9c1/0xb50 net/ipv6/addrconf.c:5300 netlink_dump+0x541/0xc20 net/netlink/af_netlink.c:2275 __netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2380 netlink_dump_start include/linux/netlink.h:245 [inline] rtnetlink_rcv_msg+0x73e/0xc90 net/core/rtnetlink.c:6046 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x6eb/0x810 net/socket.c:2492 ___sys_sendmsg+0xf3/0x170 net/socket.c:2546 __sys_sendmsg net/socket.c:2575 [inline] __do_sys_sendmsg net/socket.c:2584 [inline] __se_sys_sendmsg net/socket.c:2582 [inline] __x64_sys_sendmsg+0x132/0x220 net/socket.c:2582 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220628121248.858695-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3b0dc529 |
|
23-Jun-2022 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: take care of disable_policy when restoring routes When routes corresponding to addresses are restored by fixup_permanent_addr(), the dst_nopolicy parameter was not set. The typical use case is a user that configures an address on a down interface and then put this interface up. Let's take care of this flag in addrconf_f6i_alloc(), so that every callers benefit ont it. CC: stable@kernel.org CC: David Forster <dforster@brocade.com> Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl") Reported-by: Siwar Zitouni <siwar.zitouni@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220623120015.32640-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3e0b8f52 |
|
30-May-2022 |
Arun Ajith S <aajith@arista.com> |
net/ipv6: Expand and rename accept_unsolicited_na to accept_untracked_na RFC 9131 changes default behaviour of handling RX of NA messages when the corresponding entry is absent in the neighbour cache. The current implementation is limited to accept just unsolicited NAs. However, the RFC is more generic where it also accepts solicited NAs. Both types should result in adding a STALE entry for this case. Expand accept_untracked_na behaviour to also accept solicited NAs to be compliant with the RFC and rename the sysctl knob to accept_untracked_na. Fixes: f9a2fb73318e ("net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131") Signed-off-by: Arun Ajith S <aajith@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220530101414.65439-1-aajith@arista.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
d4150779 |
|
11-May-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
random32: use real rng for non-deterministic randomness random32.c has two random number generators in it: one that is meant to be used deterministically, with some predefined seed, and one that does the same exact thing as random.c, except does it poorly. The first one has some use cases. The second one no longer does and can be replaced with calls to random.c's proper random number generator. The relatively recent siphash-based bad random32.c code was added in response to concerns that the prior random32.c was too deterministic. Out of fears that random.c was (at the time) too slow, this code was anonymously contributed. Then out of that emerged a kind of shadow entropy gathering system, with its own tentacles throughout various net code, added willy nilly. Stop👏making👏bespoke👏random👏number👏generators👏. Fortunately, recent advances in random.c mean that we can stop playing with this sketchiness, and just use get_random_u32(), which is now fast enough. In micro benchmarks using RDPMC, I'm seeing the same median cycle count between the two functions, with the mean being _slightly_ higher due to batches refilling (which we can optimize further need be). However, when doing *real* benchmarks of the net functions that actually use these random numbers, the mean cycles actually *decreased* slightly (with the median still staying the same), likely because the additional prandom code means icache misses and complexity, whereas random.c is generally already being used by something else nearby. The biggest benefit of this is that there are many users of prandom who probably should be using cryptographically secure random numbers. This makes all of those accidental cases become secure by just flipping a switch. Later on, we can do a tree-wide cleanup to remove the static inline wrapper functions that this commit adds. There are also some low-ish hanging fruits for making this even faster in the future: a get_random_u16() function for use in the networking stack will give a 2x performance boost there, using SIMD for ChaCha20 will let us compute 4 or 8 or 16 blocks of output in parallel, instead of just one, giving us large buffers for cheap, and introducing a get_random_*_bh() function that assumes irqs are already disabled will shave off a few cycles for ordinary calls. These are things we can chip away at down the road. Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
425b9c7f |
|
02-May-2022 |
Vasily Averin <vasily.averin@linux.dev> |
memcg: accounting for objects allocated for new netdevice Creating a new netdevice allocates at least ~50Kb of memory for various kernel objects, but only ~5Kb of them are accounted to memcg. As a result, creating an unlimited number of netdevice inside a memcg-limited container does not fall within memcg restrictions, consumes a significant part of the host's memory, can cause global OOM and lead to random kills of host processes. The main consumers of non-accounted memory are: ~10Kb 80+ kernfs nodes ~6Kb ipv6_add_dev() allocations 6Kb __register_sysctl_table() allocations 4Kb neigh_sysctl_register() allocations 4Kb __devinet_sysctl_register() allocations 4Kb __addrconf_sysctl_register() allocations Accounting of these objects allows to increase the share of memcg-related memory up to 60-70% (~38Kb accounted vs ~54Kb total for dummy netdevice on typical VM with default Fedora 35 kernel) and this should be enough to somehow protect the host from misuse inside container. Other related objects are quite small and may not be taken into account to minimize the expected performance degradation. It should be separately mentonied ~300 bytes of percpu allocation of struct ipstats_mib in snmp6_alloc_dev(), on huge multi-cpu nodes it can become the main consumer of memory. This patch does not enables kernfs accounting as it affects other parts of the kernel and should be discussed separately. However, even without kernfs, this patch significantly improves the current situation and allows to take into account more than half of all netdevice allocations. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b52e1cce |
|
28-Apr-2022 |
jianghaoran <jianghaoran@kylinos.cn> |
ipv6: Don't send rs packets to the interface of ARPHRD_TUNNEL ARPHRD_TUNNEL interface can't process rs packets and will generate TX errors ex: ip tunnel add ethn mode ipip local 192.168.1.1 remote 192.168.1.2 ifconfig ethn x.x.x.x ethn: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1480 inet x.x.x.x netmask 255.255.255.255 destination x.x.x.x inet6 fe80::5efe:ac1e:3cdb prefixlen 64 scopeid 0x20<link> tunnel txqueuelen 1000 (IPIP Tunnel) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 3 dropped 0 overruns 0 carrier 0 collisions 0 Signed-off-by: jianghaoran <jianghaoran@kylinos.cn> Link: https://lore.kernel.org/r/20220429053802.246681-1-jianghaoran@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d09d3ec0 |
|
19-Apr-2022 |
Arun Ajith S <aajith@arista.com> |
net/ipv6: Enforce limits for accept_unsolicited_na sysctl Fix mistake in the original patch where limits were specified but the handler didn't take care of the limits. Signed-off-by: Arun Ajith S <aajith@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9a2fb73 |
|
15-Apr-2022 |
Arun Ajith S <aajith@arista.com> |
net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131 Add a new neighbour cache entry in STALE state for routers on receiving an unsolicited (gratuitous) neighbour advertisement with target link-layer-address option specified. This is similar to the arp_accept configuration for IPv4. A new sysctl endpoint is created to turn on this behaviour: /proc/sys/net/ipv6/conf/interface/accept_unsolicited_na. Signed-off-by: Arun Ajith S <aajith@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51454ea4 |
|
03-Apr-2022 |
Niels Dossche <dossche.niels@gmail.com> |
ipv6: fix locking issues with loops over idev->addr_list idev->addr_list needs to be protected by idev->lock. However, it is not always possible to do so while iterating and performing actions on inet6_ifaddr instances. For example, multiple functions (like addrconf_{join,leave}_anycast) eventually call down to other functions that acquire the idev->lock. The current code temporarily unlocked the idev->lock during the loops, which can cause race conditions. Moving the locks up is also not an appropriate solution as the ordering of lock acquisition will be inconsistent with for example mc_lock. This solution adds an additional field to inet6_ifaddr that is used to temporarily add the instances to a temporary list while holding idev->lock. The temporary list can then be traversed without holding idev->lock. This change was done in two places. In addrconf_ifdown, the list_for_each_entry_safe variant of the list loop is also no longer necessary as there is no deletion within that specific loop. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
47f0bd50 |
|
17-Feb-2022 |
Jacques de Laval <Jacques.De.Laval@westermo.com> |
net: Add new protocol attribute to IP addresses This patch adds a new protocol attribute to IPv4 and IPv6 addresses. Inspiration was taken from the protocol attribute of routes. User space applications like iproute2 can set/get the protocol with the Netlink API. The attribute is stored as an 8-bit unsigned integer. The protocol attribute is set by kernel for these categories: - IPv4 and IPv6 loopback addresses - IPv6 addresses generated from router announcements - IPv6 link local addresses User space may pass custom protocols, not defined by the kernel. Grouping addresses on their origin is useful in scenarios where you want to distinguish between addresses based on who added them, e.g. kernel vs. user space. Tagging addresses with a string label is an existing feature that could be used as a solution. Unfortunately the max length of a label is 15 characters, and for compatibility reasons the label must be prefixed with the name of the device followed by a colon. Since device names also have a max length of 15 characters, only -1 characters is guaranteed to be available for any origin tag, which is not that much. A reference implementation of user space setting and getting protocols is available for iproute2: https://github.com/westermo/iproute2/commit/9a6ea18bd79f47f293e5edc7780f315ea42ff540 Signed-off-by: Jacques de Laval <Jacques.De.Laval@westermo.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220217150202.80802-1-Jacques.De.Laval@westermo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
be6b41c1 |
|
16-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: ensure addrconf_verify_rtnl() has completed Before freeing the hash table in addrconf_exit_net(), we need to make sure the work queue has completed, or risk NULL dereference or UAF. Thus, use cancel_delayed_work_sync() to enforce this. We do not hold RTNL in addrconf_exit_net(), making this safe. Fixes: 8805d13ff1b2 ("ipv6/addrconf: use one delayed work per netns") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220216182037.3742-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
dd263a8c |
|
13-Feb-2022 |
Ido Schimmel <idosch@nvidia.com> |
ipv6: blackhole_netdev needs snmp6 counters Whenever rt6_uncached_list_flush_dev() swaps rt->rt6_idev to the blackhole device, parts of IPv6 stack might still need to increment one SNMP counter. Root cause, patch from Ido, changelog from Eric :) This bug suggests that we need to audit rt->rt6_idev usages and make sure they are properly using RCU protection. Fixes: e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5f80fcf |
|
10-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6: give an IPv6 dev to blackhole_netdev IPv6 addrconf notifiers wants the loopback device to be the last device being dismantled at netns deletion. This caused many limitations and work arounds. Back in linux-5.3, Mahesh added a per host blackhole_netdev that can be used whenever we need to make sure objects no longer refer to a disappearing device. If we attach to blackhole_netdev an ip6_ptr (allocate an idev), then we can use this special device (which is never freed) in place of the loopback_dev (which can be freed). This will permit improvements in netdev_run_todo() and other parts of the stack where had steps to make sure loopback_dev was the last device to disappear. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e66d1172 |
|
07-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: switch to per netns inet6_addr_lst hash table IPv6 does not scale very well with the number of IPv6 addresses. It uses a global (shared by all netns) hash table with 256 buckets. Some functions like addrconf_verify_rtnl() and addrconf_ifdown() have to iterate all addresses in the hash table. I have seen addrconf_verify_rtnl() holding the cpu for 10ms or more. Switch to the per netns hashtable (and spinlock) added in prior patches. This considerably speeds up netns dismantle times on hosts with thousands of netns. This also has an impact on regular (fast path) IPv6 processing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8805d13f |
|
07-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: use one delayed work per netns Next step for using per netns inet6_addr_lst is to have per netns work item to ultimately call addrconf_verify_rtnl() and addrconf_verify() with a new 'struct net*' argument. Everything is still using the global inet6_addr_lst[] table. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
21a216a8 |
|
07-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: allocate a per netns hash table Add a per netns hash table and a dedicated spinlock, first step to get rid of the global inet6_addr_lst[] one. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7d9b1b57 |
|
06-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ip6mr: fix use-after-free in ip6mr_sk_done() Apparently addrconf_exit_net() is called before igmp6_net_exit() and ndisc_net_exit() at netns dismantle time: net_namespace: call ip6table_mangle_net_exit() net_namespace: call ip6_tables_net_exit() net_namespace: call ipv6_sysctl_net_exit() net_namespace: call ioam6_net_exit() net_namespace: call seg6_net_exit() net_namespace: call ping_v6_proc_exit_net() net_namespace: call tcpv6_net_exit() ip6mr_sk_done sk=ffffa354c78a74c0 net_namespace: call ipv6_frags_exit_net() net_namespace: call addrconf_exit_net() net_namespace: call ip6addrlbl_net_exit() net_namespace: call ip6_flowlabel_net_exit() net_namespace: call ip6_route_net_exit_late() net_namespace: call fib6_rules_net_exit() net_namespace: call xfrm6_net_exit() net_namespace: call fib6_net_exit() net_namespace: call ip6_route_net_exit() net_namespace: call ipv6_inetpeer_exit() net_namespace: call if6_proc_net_exit() net_namespace: call ipv6_proc_exit_net() net_namespace: call udplite6_proc_exit_net() net_namespace: call raw6_exit_net() net_namespace: call igmp6_net_exit() ip6mr_sk_done sk=ffffa35472b2a180 ip6mr_sk_done sk=ffffa354c78a7980 net_namespace: call ndisc_net_exit() ip6mr_sk_done sk=ffffa35472b2ab00 net_namespace: call ip6mr_net_exit() net_namespace: call inet6_net_exit() This was fine because ip6mr_sk_done() would not reach the point decreasing net->ipv6.devconf_all->mc_forwarding until my patch in ip6mr_sk_done(). To fix this without changing struct pernet_operations ordering, we can clear net->ipv6.devconf_dflt and net->ipv6.devconf_all when they are freed from addrconf_exit_net() BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] BUG: KASAN: use-after-free in ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 Read of size 4 at addr ffff88801ff08688 by task kworker/u4:4/963 CPU: 0 PID: 963 Comm: kworker/u4:4 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline] ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578 rawv6_close+0x58/0x80 net/ipv6/raw.c:1201 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428 inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478 __sock_release net/socket.c:650 [inline] sock_release+0x87/0x1b0 net/socket.c:678 inet_ctl_sock_destroy include/net/inet_common.h:65 [inline] igmp6_net_exit+0x6b/0x170 net/ipv6/mcast.c:3173 ops_exit_list+0xb0/0x170 net/core/net_namespace.c:168 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:600 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Fixes: f2f2325ec799 ("ip6mr: ip6mr_sk_done() can exit early in common cases") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
145c7a79 |
|
04-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
ipv6: make mc_forwarding atomic This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9995b408 |
|
24-Feb-2022 |
j.nixdorf@avm.de <j.nixdorf@avm.de> |
net: ipv6: ensure we call ipv6_mc_down() at most once There are two reasons for addrconf_notify() to be called with NETDEV_DOWN: either the network device is actually going down, or IPv6 was disabled on the interface. If either of them stays down while the other is toggled, we repeatedly call the code for NETDEV_DOWN, including ipv6_mc_down(), while never calling the corresponding ipv6_mc_up() in between. This will cause a new entry in idev->mc_tomb to be allocated for each multicast group the interface is subscribed to, which in turn leaks one struct ifmcaddr6 per nontrivial multicast group the interface is subscribed to. The following reproducer will leak at least $n objects: ip addr add ff2e::4242/32 dev eth0 autojoin sysctl -w net.ipv6.conf.eth0.disable_ipv6=1 for i in $(seq 1 $n); do ip link set up eth0; ip link set down eth0 done Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2) can also be used to create a nontrivial idev->mc_list, which will the leak objects with the right up-down-sequence. Based on both sources for NETDEV_DOWN events the interface IPv6 state should be considered: - not ready if the network interface is not ready OR IPv6 is disabled for it - ready if the network interface is ready AND IPv6 is enabled for it The functions ipv6_mc_up() and ipv6_down() should only be run when this state changes. Implement this by remembering when the IPv6 state is ready, and only run ipv6_mc_down() if it actually changed from ready to not ready. The other direction (not ready -> ready) already works correctly, as: - the interface notification triggered codepath for NETDEV_UP / NETDEV_CHANGE returns early if ipv6 is disabled, and - the disable_ipv6=0 triggered codepath skips fully initializing the interface as long as addrconf_link_ready(dev) returns false - calling ipv6_mc_up() repeatedly does not leak anything Fixes: 3ce62a84d53c ("ipv6: exit early in addrconf_notify() if IPv6 is disabled") Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c0d8833 |
|
23-Feb-2022 |
Niels Dossche <dossche.niels@gmail.com> |
ipv6: prevent a possible race condition with lifetimes valid_lft, prefered_lft and tstamp are always accessed under the lock "lock" in other places. Reading these without taking the lock may result in inconsistencies regarding the calculation of the valid and preferred variables since decisions are taken on these fields for those variables. Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Niels Dossche <niels.dossche@ugent.be> Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
26394fc1 |
|
11-Feb-2022 |
Ignat Korchagin <ignat@cloudflare.com> |
ipv6: mcast: use rcu-safe version of ipv6_get_lladdr() Some time ago 8965779d2c0e ("ipv6,mcast: always hold idev->lock before mca_lock") switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is rcu-unsafe version. That was OK, because idev->lock was held for these codepaths. In 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") these external locks were removed, so we probably need to restore the original rcu-safe call. Otherwise, we occasionally get a machine crashed/stalled with the following in dmesg: [ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI [ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G O 5.15.19-cloudflare-2022.2.1 #1 [ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV [ 3406.009552][T230589] Workqueue: mld mld_ifc_work [ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60 [ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b [ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202 [ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040 [ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008 [ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000 [ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100 [ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000 [ 3406.125730][T230589] FS: 0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000 [ 3406.138992][T230589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0 [ 3406.162421][T230589] Call Trace: [ 3406.170235][T230589] <TASK> [ 3406.177736][T230589] mld_newpack+0xfe/0x1a0 [ 3406.186686][T230589] add_grhead+0x87/0xa0 [ 3406.195498][T230589] add_grec+0x485/0x4e0 [ 3406.204310][T230589] ? newidle_balance+0x126/0x3f0 [ 3406.214024][T230589] mld_ifc_work+0x15d/0x450 [ 3406.223279][T230589] process_one_work+0x1e6/0x380 [ 3406.232982][T230589] worker_thread+0x50/0x3a0 [ 3406.242371][T230589] ? rescuer_thread+0x360/0x360 [ 3406.252175][T230589] kthread+0x127/0x150 [ 3406.261197][T230589] ? set_kthread_struct+0x40/0x40 [ 3406.271287][T230589] ret_from_fork+0x22/0x30 [ 3406.280812][T230589] </TASK> [ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders] [ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]--- Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") Reported-by: David Pinilla Caparros <dpini@cloudflare.com> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
36268983 |
|
26-Jan-2022 |
Guillaume Nault <gnault@redhat.com> |
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values" This reverts commit b75326c201242de9495ff98e5d5cff41d7fc0d9d. This commit breaks Linux compatibility with USGv6 tests. The RFC this commit was based on is actually an expired draft: no published RFC currently allows the new behaviour it introduced. Without full IETF endorsement, the flash renumbering scenario this patch was supposed to enable is never going to work, as other IPv6 equipements on the same LAN will keep the 2 hours limit. Fixes: b75326c20124 ("ipv6: Honor all IPv6 PIO Valid Lifetime values") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c727003 |
|
04-Dec-2021 |
Eric Dumazet <edumazet@google.com> |
ipv6: add net device refcount tracker to struct inet6_dev Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
18ac597a |
|
01-Nov-2021 |
James Prestwood <prestwoj@gmail.com> |
net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter In most situations the neighbor discovery cache should be cleared on a NOCARRIER event which is currently done unconditionally. But for wireless roams the neighbor discovery cache can and should remain intact since the underlying network has not changed. This patch introduces a sysctl option ndisc_evict_nocarrier which can be disabled by a wireless supplicant during a roam. This allows packets to be sent after a roam immediately without having to wait for neighbor discovery. A user reported roughly a 1 second delay after a roam before packets could be sent out (note, on IPv4). This delay was due to the ARP cache being cleared. During testing of this same scenario using IPv6 no delay was noticed, but regardless there is no reason to clear the ndisc cache for wireless roams. Signed-off-by: James Prestwood <prestwoj@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
61e18ce7 |
|
20-Oct-2021 |
Stephen Suryaputra <ssuryaextr@gmail.com> |
gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE When addr_gen_mode is set to IN6_ADDR_GEN_MODE_NONE, the link-local addr should not be generated. But it isn't the case for GRE (as well as GRE6) and SIT tunnels. Make it so that tunnels consider the addr_gen_mode, especially for IN6_ADDR_GEN_MODE_NONE. Do this in add_v4_addrs() to cover both GRE and SIT only if the addr scope is link. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: Antonio Quartulli <a@unstable.cc> Link: https://lore.kernel.org/r/20211020200618.467342-1-ssuryaextr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
1a8a23d2 |
|
12-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
ipv6: constify dev_addr passing In preparation for netdev->dev_addr being constant make all relevant arguments in ndisc constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e5dd7294 |
|
03-Sep-2021 |
Antonio Quartulli <a@unstable.cc> |
ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address GRE interfaces are not Ether-like and therefore it is not possible to generate the v6LL address the same way as (for example) GRETAP devices. With default settings, a GRE interface will attempt generating its v6LL address using the EUI64 approach, but this will fail when the local endpoint of the GRE tunnel is set to "any". In this case the GRE interface will end up with no v6LL address, thus violating RFC4291. SIT interfaces already implement a different logic to ensure that a v6LL address is always computed. Change the GRE v6LL generation logic to follow the same approach as SIT. This way GRE interfaces will always have a v6LL address as well. Behaviour of GRETAP interfaces has not been changed as they behave like classic Ether-like interfaces. To avoid code duplication sit_add_v4_addrs() has been renamed to add_v4_addrs() and adapted to handle also the IP6GRE/GRE cases. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49b99da2 |
|
27-Aug-2021 |
Rocco Yue <rocco.yue@mediatek.com> |
ipv6: add IFLA_INET6_RA_MTU to expose mtu value The kernel provides a "/proc/sys/net/ipv6/conf/<iface>/mtu" file, which can temporarily record the mtu value of the last received RA message when the RA mtu value is lower than the interface mtu, but this proc has following limitations: (1) when the interface mtu (/sys/class/net/<iface>/mtu) is updeated, mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) will be updated to the value of interface mtu; (2) mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) only affect ipv6 connection, and not affect ipv4. Therefore, when the mtu option is carried in the RA message, there will be a problem that the user sometimes cannot obtain RA mtu value correctly by reading mtu6. After this patch set, if a RA message carries the mtu option, you can send a netlink msg which nlmsg_type is RTM_GETLINK, and then by parsing the attribute of IFLA_INET6_RA_MTU to get the mtu value carried in the RA message received on the inet6 device. In addition, you can also get a link notification when ra_mtu is updated so it doesn't have to poll. In this way, if the MTU values that the device receives from the network in the PCO IPv4 and the RA IPv6 procedures are different, the user can obtain the correct ipv6 ra_mtu value and compare the value of ra_mtu and ipv4 mtu, then the device can use the lower MTU value for both IPv4 and IPv6. Signed-off-by: Rocco Yue <rocco.yue@mediatek.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210827150412.9267-1-rocco.yue@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
1160dfa1 |
|
05-Aug-2021 |
Yajun Deng <yajun.deng@linux.dev> |
net: Remove redundant if statements The 'if (dev)' statement already move into dev_{put , hold}, so remove redundant if statements. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8679c31e |
|
03-Aug-2021 |
Rocco Yue <rocco.yue@mediatek.com> |
net: add extack arg for link ops Pass extack arg to validate_linkmsg and validate_link_af callbacks. If a netlink attribute has a reject_message, use the extended ack mechanism to carry the message back to user space. Signed-off-by: Rocco Yue <rocco.yue@mediatek.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
176f716c |
|
22-Jul-2021 |
Matthieu Baerts <matthieu.baerts@tessares.net> |
ipv6: fix "'ioam6_if_id_max' defined but not used" warn When compiling without CONFIG_SYSCTL, this warning appears: net/ipv6/addrconf.c:99:12: error: 'ioam6_if_id_max' defined but not used [-Werror=unused-variable] 99 | static u32 ioam6_if_id_max = U16_MAX; | ^~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Simply moving the declaration of this variable under ... #ifdef CONFIG_SYSCTL ... with other similar variables fixes the issue. Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ee11f0f |
|
20-Jul-2021 |
Justin Iurman <justin.iurman@uliege.be> |
ipv6: ioam: Data plane support for Pre-allocated Trace Implement support for processing the IOAM Pre-allocated Trace with IPv6, see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option, see IANA [3]. A new per-interface sysctl is introduced. The value is a boolean to accept (=1) or ignore (=0, by default) IPv6 IOAM options on ingress for an interface: - net.ipv6.conf.XXX.ioam6_enabled Two other sysctls are introduced to define IOAM IDs, represented by an integer. They are respectively per-namespace and per-interface: - net.ipv6.ioam6_id - net.ipv6.conf.XXX.ioam6_id The value of the first one represents the IOAM ID of the node itself (u32; max and default value = U32_MAX>>8, due to hop limit concatenation) while the other represents the IOAM ID of an interface (u16; max and default value = U16_MAX). Each "ioam6_id" sysctl has a "_wide" equivalent: - net.ipv6.ioam6_id_wide - net.ipv6.conf.XXX.ioam6_id_wide The value of the first one represents the wide IOAM ID of the node itself (u64; max and default value = U64_MAX>>8, due to hop limit concatenation) while the other represents the wide IOAM ID of an interface (u32; max and default value = U32_MAX). The use of short and wide equivalents is not exclusive, a deployment could choose to leverage both. For example, net.ipv6.conf.XXX.ioam6_id (short format) could be an identifier for a physical interface, whereas net.ipv6.conf.XXX.ioam6_id_wide (wide format) could be an identifier for a logical sub-interface. Documentation about new sysctls is provided at the end of this patchset. Two relativistic hash tables are used: one for IOAM namespaces, the other for IOAM schemas. A namespace can only have a single active schema and a schema can only be attached to a single namespace (1:1 relationship). [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2 Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6126891c |
|
19-Jul-2021 |
Vasily Averin <vvs@virtuozzo.com> |
memcg: enable accounting for IP address and routing-related objects An netadmin inside container can use 'ip a a' and 'ip r a' to assign a large number of ipv4/ipv6 addresses and routing entries and force kernel to allocate megabytes of unaccounted memory for long-lived per-netdevice related kernel objects: 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node', 'struct rt6_info', 'struct fib_rules' and ip_fib caches. These objects can be manually removed, though usually they lives in memory till destroy of its net namespace. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. One of such objects is the 'struct fib6_node' mostly allocated in net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section: write_lock_bh(&table->tb6_lock); err = fib6_add(&table->tb6_root, rt, info, mxc); write_unlock_bh(&table->tb6_lock); In this case it is not enough to simply add SLAB_ACCOUNT to corresponding kmem cache. The proper memory cgroup still cannot be found due to the incorrect 'in_interrupt()' check used in memcg_kmem_bypass(). Obsoleted in_interrupt() does not describe real execution context properly. >From include/linux/preempt.h: The following macros are deprecated and should not be used in new code: in_interrupt() - We're in NMI,IRQ,SoftIRQ context or have BH disabled To verify the current execution context new macro should be used instead: in_task() - We're in task context Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87117baf |
|
15-Jul-2021 |
Rocco Yue <rocco.yue@mediatek.com> |
ipv6: remove unnecessary local variable The local variable "struct net *net" in the two functions of inet6_rtm_getaddr() and inet6_dump_addr() are actually useless, so remove them. Signed-off-by: Rocco Yue <rocco.yue@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
12e64b3b |
|
30-May-2021 |
Rocco Yue <rocco.yue@mediatek.com> |
ipv6: align code with context The Tab key is used three times, causing the code block to be out of alignment with the context. Signed-off-by: Rocco Yue <rocco.yue@mediatek.com> Link: https://lore.kernel.org/r/20210530113811.8817-1-rocco.yue@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
5ac6b198 |
|
07-Jun-2021 |
Zheng Yongjun <zhengyongjun3@huawei.com> |
net: ipv4: Remove unneed BUG() function When 'nla_parse_nested_deprecated' failed, it's no need to BUG() here, return -EINVAL is ok. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aa8caa76 |
|
16-Apr-2021 |
Taehee Yoo <ap420073@gmail.com> |
mld: fix suspicious RCU usage in __ipv6_dev_mc_dec() __ipv6_dev_mc_dec() internally uses sleepable functions so that caller must not acquire atomic locks. But caller, which is addrconf_verify_rtnl() acquires rcu_read_lock_bh(). So this warning occurs in the __ipv6_dev_mc_dec(). Test commands: ip netns add A ip link add veth0 type veth peer name veth1 ip link set veth1 netns A ip link set veth0 up ip netns exec A ip link set veth1 up ip a a 2001:db8::1/64 dev veth0 valid_lft 2 preferred_lft 1 Splat looks like: ============================ WARNING: suspicious RCU usage 5.12.0-rc6+ #515 Not tainted ----------------------------- kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by kworker/4:0/1997: #0: ffff88810bd72d48 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x761/0x1440 #1: ffff888105c8fe00 ((addr_chk_work).work){+.+.}-{0:0}, at: process_one_work+0x795/0x1440 #2: ffffffffb9279fb0 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0xa/0x20 #3: ffffffffb8e30860 (rcu_read_lock_bh){....}-{1:2}, at: addrconf_verify_rtnl+0x23/0xc60 stack backtrace: CPU: 4 PID: 1997 Comm: kworker/4:0 Not tainted 5.12.0-rc6+ #515 Workqueue: ipv6_addrconf addrconf_verify_work Call Trace: dump_stack+0xa4/0xe5 ___might_sleep+0x27d/0x2b0 __mutex_lock+0xc8/0x13f0 ? lock_downgrade+0x690/0x690 ? __ipv6_dev_mc_dec+0x49/0x2a0 ? mark_held_locks+0xb7/0x120 ? mutex_lock_io_nested+0x1270/0x1270 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0 ? _raw_spin_unlock_irqrestore+0x47/0x50 ? trace_hardirqs_on+0x41/0x120 ? __wake_up_common_lock+0xc9/0x100 ? __wake_up_common+0x620/0x620 ? memset+0x1f/0x40 ? netlink_broadcast_filtered+0x2c4/0xa70 ? __ipv6_dev_mc_dec+0x49/0x2a0 __ipv6_dev_mc_dec+0x49/0x2a0 ? netlink_broadcast_filtered+0x2f6/0xa70 addrconf_leave_solict.part.64+0xad/0xf0 ? addrconf_join_solict.part.63+0xf0/0xf0 ? nlmsg_notify+0x63/0x1b0 __ipv6_ifa_notify+0x22c/0x9c0 ? inet6_fill_ifaddr+0xbe0/0xbe0 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0 ? __local_bh_enable_ip+0xa5/0xf0 ? ipv6_del_addr+0x347/0x870 ipv6_del_addr+0x3b1/0x870 ? addrconf_ifdown+0xfe0/0xfe0 ? rcu_read_lock_any_held.part.27+0x20/0x20 addrconf_verify_rtnl+0x8a9/0xc60 addrconf_verify_work+0xf/0x20 process_one_work+0x84c/0x1440 In order to avoid this problem, it uses rcu_read_unlock_bh() for a short time. RCU is used for avoiding freeing ifp(struct *inet6_ifaddr) while ifp is being used. But this will not be released even if rcu_read_unlock_bh() is used. Because before rcu_read_unlock_bh(), it uses in6_ifa_hold(ifp). So this is safe. Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data") Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
912b519a |
|
26-Mar-2021 |
Bhaskar Chowdhury <unixbhaskar@gmail.com> |
ipv6: addrconf.c: Fix a typo s/Identifers/Identifiers/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88e2ca30 |
|
25-Mar-2021 |
Taehee Yoo <ap420073@gmail.com> |
mld: convert ifmcaddr6 to RCU The ifmcaddr6 has been protected by inet6_dev->lock(rwlock) so that the critical section is atomic context. In order to switch this context, changing locking is needed. The ifmcaddr6 actually already protected by RTNL So if it's converted to use RCU, its control path context can be switched to sleepable. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3583a4e8 |
|
07-Apr-2021 |
Stephen Hemminger <stephen@networkplumber.org> |
ipv6: report errors for iftoken via netlink extack Setting iftoken can fail for several different reasons but there and there was no report to user as to the cause. Add netlink extended errors to the processing of the request. This requires adding additional argument through rtnl_af_ops set_link_af callback. Reported-by: Hongren Zheng <li@zenithal.me> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b2e04bc |
|
25-Jan-2021 |
Praveen Chaudhary <praveen5582@gmail.com> |
net: allow user to set metric on default route learned via Router Advertisement For IPv4, default route is learned via DHCPv4 and user is allowed to change metric using config etc/network/interfaces. But for IPv6, default route can be learned via RA, for which, currently a fixed metric value 1024 is used. Ideally, user should be able to configure metric on default route for IPv6 similar to IPv4. This patch adds sysctl for the same. Logs: For IPv4: Config in etc/network/interfaces: auto eth0 iface eth0 inet dhcp metric 4261413864 IPv4 Kernel Route Table: $ ip route list default via 172.21.47.1 dev eth0 metric 4261413864 FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over DHCPv4 default route.] Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03 K 0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m i.e. User can prefer Default Router learned via Routing Protocol in IPv4. Similar behavior is not possible for IPv6, without this fix. After fix [for IPv6]: sudo sysctl -w net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705 IP monitor: [When IPv6 RA is received] default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 pref high Kernel IPv6 routing table $ ip -6 route list default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 21sec hoplimit 64 pref high FRR Table, if a static route is configured: [In real scenario, it is useful to prefer BGP learned default route over IPv6 RA default route.] Codes: K - kernel route, C - connected, S - static, R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP, > - selected route, * - FIB route S>* ::/0 [20/0] is directly connected, eth0, 00:00:06 K ::/0 [119/1001] via fe80::xx16:xxxx:feb3:ce8e, eth0, 6d07h43m If the metric is changed later, the effect will be seen only when next IPv6 RA is received, because the default route must be fully controlled by RA msg. Below metric is changed from 1996489705 to 1996489704. $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704 net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704 IP monitor: [On next IPv6 RA msg, Kernel deletes prev route and installs new route with updated metric] Deleted default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489705 expires 3sec hoplimit 64 pref high default via fe80::xx16:xxxx:feb3:ce8e dev eth0 proto ra metric 1996489704 pref high Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210125214430.24079-1-pchaudhary@linkedin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ceed9038 |
|
15-Jan-2021 |
Matteo Croce <mcroce@microsoft.com> |
ipv6: set multicast flag on the multicast route The multicast route ff00::/8 is created with type RTN_UNICAST: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium Set the type to RTN_MULTICAST which is more appropriate. Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a826b043 |
|
15-Jan-2021 |
Matteo Croce <mcroce@microsoft.com> |
ipv6: create multicast route with RTPROT_KERNEL The ff00::/8 multicast route is created without specifying the fc_protocol field, so the default RTPROT_BOOT value is used: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium As the documentation says, this value identifies routes installed during boot, but the route is created when interface is set up. Change the value to RTPROT_KERNEL which is a better value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ceb736e1 |
|
12-Nov-2020 |
Zhang Qilong <zhangqilong3@huawei.com> |
ipv6: Fix error path to cancel the meseage genlmsg_cancel() needs to be called in the error path of inet6_fill_ifmcaddr and inet6_fill_ifacaddr to cancel the message. Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201112080950.1476302-1-zhangqilong3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
2c4de211 |
|
31-Oct-2020 |
Xin Long <lucien.xin@gmail.com> |
net: ipv6: For kerneldoc warnings with W=1 net/ipv6/addrconf.c:2005: warning: Function parameter or member 'dev' not described in 'ipv6_dev_find' net/ipv6/ip6_vti.c:138: warning: Function parameter or member 'ip6n' not described in 'vti6_tnl_bucket' net/ipv6/ip6_tunnel.c:218: warning: Function parameter or member 'ip6n' not described in 'ip6_tnl_bucket' net/ipv6/ip6_tunnel.c:238: warning: Function parameter or member 'ip6n' not described in 'ip6_tnl_link' net/ipv6/ip6_tunnel.c:254: warning: Function parameter or member 'ip6n' not described in 'ip6_tnl_unlink' net/ipv6/ip6_tunnel.c:427: warning: Function parameter or member 'raw' not described in 'ip6_tnl_parse_tlv_enc_lim' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'skb' not described in 'ip6_tnl_err' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'ipproto' not described in 'ip6_tnl_err' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'opt' not described in 'ip6_tnl_err' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'type' not described in 'ip6_tnl_err' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'code' not described in 'ip6_tnl_err' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'msg' not described in 'ip6_tnl_err' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'info' not described in 'ip6_tnl_err' net/ipv6/ip6_tunnel.c:499: warning: Function parameter or member 'offset' not described in 'ip6_tnl_err' ip6_tnl_err() is an internal function, so remove the kerneldoc. For the others, add the missing parameters. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20201031183044.1082193-1-andrew@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4ef1a7cb |
|
17-Aug-2020 |
Xin Long <lucien.xin@gmail.com> |
ipv6: some fixes for ipv6_dev_find() This patch is to do 3 things for ipv6_dev_find(): As David A. noticed, - rt6_lookup() is not really needed. Different from __ip_dev_find(), ipv6_dev_find() doesn't have a compatibility problem, so remove it. As Hideaki suggested, - "valid" (non-tentative) check for the address is also needed. ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will traverse the address hash list, but it's heavy to be called inside ipv6_dev_find(). This patch is to reuse the code of ipv6_chk_addr_and_flags() for ipv6_dev_find(). - dev parameter is passed into ipv6_dev_find(), as link-local addresses from user space has sin6_scope_id set and the dev lookup needs it. Fixes: 81f6cb31222d ("ipv6: add ipv6_dev_find()") Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81f6cb31 |
|
03-Aug-2020 |
Xin Long <lucien.xin@gmail.com> |
ipv6: add ipv6_dev_find() This is to add an ip_dev_find like function for ipv6, used to find the dev by saddr. It will be used by TIPC protocol. So also export it. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae79dbf6 |
|
31-Jul-2020 |
Florent Fourcot <florent.fourcot@wifirst.fr> |
ipv6/addrconf: use a boolean to choose between UNREGISTER/DOWN "how" was used as a boolean. Change the type to bool, and improve variable name Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d208a42a |
|
31-Jul-2020 |
Florent Fourcot <florent.fourcot@wifirst.fr> |
ipv6/addrconf: call addrconf_ifdown with consistent values Second parameter of addrconf_ifdown "how" is used as a boolean internally. It does not make sense to call it with something different of 0 or 1. This value is set to 2 in all git history. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e3db0bb |
|
19-May-2020 |
Christoph Hellwig <hch@lst.de> |
ipv6: use ->ndo_tunnel_ctl in addrconf_set_dstaddr Use the new ->ndo_tunnel_ctl instead of overriding the address limit and using ->ndo_do_ioctl just to do a pointless user copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
68ad6886 |
|
19-May-2020 |
Christoph Hellwig <hch@lst.de> |
ipv6: streamline addrconf_set_dstaddr Factor out a addrconf_set_sit_dstaddr helper for the actual work if we found a SIT device, and only hold the rtnl lock around the device lookup and that new helper, as there is no point in holding it over a copy_from_user call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f0988460 |
|
19-May-2020 |
Christoph Hellwig <hch@lst.de> |
ipv6: stub out even more of addrconf_set_dstaddr if SIT is disabled There is no point in copying the structure from userspace or looking up a device if SIT support is not disabled and we'll eventually return -ENODEV anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9efd6a3c |
|
13-May-2020 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
netns: enable to inherit devconf from current netns The goal is to be able to inherit the initial devconf parameters from the current netns, ie the netns where this new netns has been created. This is useful in a containers environment where /proc/sys is read only. For example, if a pod is created with specifics devconf parameters and has the capability to create netns, the user expects to get the same parameters than his 'init_net', which is not the real init_net in this case. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b0b0fa2 |
|
02-May-2020 |
Eric Biggers <ebiggers@google.com> |
crypto: lib/sha1 - rename "sha" to "sha1" The library implementation of the SHA-1 compression function is confusingly called just "sha_transform()". Alongside it are some "SHA_" constants and "sha_init()". Presumably these are left over from a time when SHA just meant SHA-1. But now there are also SHA-2 and SHA-3, and moreover SHA-1 is now considered insecure and thus shouldn't be used. Therefore, rename these functions and constants to make it very clear that they are for SHA-1. Also add a comment to make it clear that these shouldn't be used. For the extra-misleadingly named "SHA_MESSAGE_BYTES", rename it to SHA1_BLOCK_SIZE and define it to just '64' rather than '(512/8)' so that it matches the same definition in <crypto/sha.h>. This prepares for merging <linux/cryptohash.h> into <crypto/sha.h>. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
969c5464 |
|
30-Apr-2020 |
Fernando Gont <fgont@si6networks.com> |
ipv6: Implement draft-ietf-6man-rfc4941bis Implement the upcoming rev of RFC4941 (IPv6 temporary addresses): https://tools.ietf.org/html/draft-ietf-6man-rfc4941bis-09 * Reduces the default Valid Lifetime to 2 days The number of extra addresses employed when Valid Lifetime was 7 days exacerbated the stress caused on network elements/devices. Additionally, the motivation for temporary addresses is indeed privacy and reduced exposure. With a default Valid Lifetime of 7 days, an address that becomes revealed by active communication is reachable and exposed for one whole week. The only use case for a Valid Lifetime of 7 days could be some application that is expecting to have long lived connections. But if you want to have a long lived connections, you shouldn't be using a temporary address in the first place. Additionally, in the era of mobile devices, general applications should nevertheless be prepared and robust to address changes (e.g. nodes swap wifi <-> 4G, etc.) * Employs different IIDs for different prefixes To avoid network activity correlation among addresses configured for different prefixes * Uses a simpler algorithm for IID generation No need to store "history" anywhere Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11dd74b3 |
|
27-Apr-2020 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
net: ipv6: new arg skip_notify to ip6_rt_del Used in subsequent work to skip route delete notifications on nexthop deletes. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32927393 |
|
24-Apr-2020 |
Christoph Hellwig <hch@lst.de> |
sysctl: pass kernel pointers to ->proc_handler Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
b75326c2 |
|
19-Apr-2020 |
Fernando Gont <fgont@si6networks.com> |
ipv6: Honor all IPv6 PIO Valid Lifetime values RFC4862 5.5.3 e) prevents received Router Advertisements from reducing the Valid Lifetime of configured addresses to less than two hours, thus preventing hosts from reacting to the information provided by a router that has positive knowledge that a prefix has become invalid. This patch makes hosts honor all Valid Lifetime values, as per draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help mitigate the problem discussed in draft-ietf-v6ops-slaac-renum. Note: Attacks aiming at disabling an advertised prefix via a Valid Lifetime of 0 are not really more harmful than other attacks that can be performed via forged RA messages, such as those aiming at completely disabling a next-hop router via an RA that advertises a Router Lifetime of 0, or performing a Denial of Service (DoS) attack by advertising illegitimate prefixes via forged PIOs. In scenarios where RA-based attacks are of concern, proper mitigations such as RA-Guard [RFC6105] [RFC7113] should be implemented. Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
19e16d22 |
|
01-Apr-2020 |
Hangbin Liu <liuhangbin@gmail.com> |
neigh: support smaller retrans_time settting Currently, we limited the retrans_time to be greater than HZ/2. i.e. setting retrans_time less than 500ms will not work. This makes the user unable to achieve a more accurate control for bonding arp fast failover. Update the sanity check to HZ/100, which is 10ms, to let users have more ability on the retrans_time control. v3: sync the behavior with IPv6 and update all the timer handler v2: use HZ instead of hard code number Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
744fdc82 |
|
30-Mar-2020 |
Jarod Wilson <jarod@redhat.com> |
ipv6: don't auto-add link-local address to lag ports Bonding slave and team port devices should not have link-local addresses automatically added to them, as it can interfere with openvswitch being able to properly add tc ingress. Basic reproducer, courtesy of Marcelo: $ ip link add name bond0 type bond $ ip link set dev ens2f0np0 master bond0 $ ip link set dev ens2f1np2 master bond0 $ ip link set dev bond0 up $ ip a s 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff 11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link valid_lft forever preferred_lft forever (above trimmed to relevant entries, obviously) $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0 net.ipv6.conf.ens2f0np0.addr_gen_mode = 0 $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0 net.ipv6.conf.ens2f1np2.addr_gen_mode = 0 $ ip a l ens2f0np0 2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever $ ip a l ens2f1np2 5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN group default qlen 1000 link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative valid_lft forever preferred_lft forever Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is this a slave interface?" check added by commit c2edacf80e15, and results in an address getting added, while w/the proposed patch added, no address gets added. This simply adds the same gating check to another code path, and thus should prevent the same devices from erroneously obtaining an ipv6 link-local address. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Reported-by: Moshe Levi <moshele@mellanox.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Marcelo Ricardo Leitner <mleitner@redhat.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8610c7c6 |
|
27-Mar-2020 |
Alexander Aring <alex.aring@gmail.com> |
net: ipv6: add support for rpl sr exthdr This patch adds rpl source routing receive handling. Everything works only if sysconf "rpl_seg_enabled" and source routing is enabled. Mostly the same behaviour as IPv6 segmentation routing. To handle compression and uncompression a rpl.c file is created which contains the necessary functionality. The receive handling will also care about IPv6 encapsulated so far it's specified as possible nexthdr in RFC 6554. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f37c6059 |
|
27-Mar-2020 |
Alexander Aring <alex.aring@gmail.com> |
addrconf: add functionality to check on rpl requirements This patch adds a functionality to addrconf to check on a specific RPL address configuration. According to RFC 6554: To detect loops in the SRH, a router MUST determine if the SRH includes multiple addresses assigned to any interface on that router. If such addresses appear more than once and are separated by at least one address not assigned to that router. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8eceea8 |
|
12-Mar-2020 |
Joe Perches <joe@perches.com> |
inet: Use fallthrough; Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ And by hand: net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block that causes gcc to emit a warning if converted in-place. So move the new fallthrough; inside the containing #ifdef/#endif too. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60380488 |
|
10-Mar-2020 |
Hangbin Liu <liuhangbin@gmail.com> |
ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface Rafał found an issue that for non-Ethernet interface, if we down and up frequently, the memory will be consumed slowly. The reason is we add allnodes/allrouters addressed in multicast list in ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up() for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb getting bigger and bigger. The call stack looks like: addrconf_notify(NETDEV_REGISTER) ipv6_add_dev ipv6_dev_mc_inc(ff01::1) ipv6_dev_mc_inc(ff02::1) ipv6_dev_mc_inc(ff02::2) addrconf_notify(NETDEV_UP) addrconf_dev_config /* Alas, we support only Ethernet autoconfiguration. */ return; addrconf_notify(NETDEV_DOWN) addrconf_ifdown ipv6_mc_down igmp6_group_dropped(ff02::2) mld_add_delrec(ff02::2) igmp6_group_dropped(ff02::1) igmp6_group_dropped(ff01::1) After investigating, I can't found a rule to disable multicast on non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM, tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up() in inetdev_event(). Even for IPv6, we don't check the dev type and call ipv6_add_dev(), ipv6_dev_mc_inc() after register device. So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for non-Ethernet interface. v2: Also check IFF_MULTICAST flag to make sure the interface supports multicast Reported-by: Rafał Miłecki <zajec5@gmail.com> Tested-by: Rafał Miłecki <zajec5@gmail.com> Fixes: 74235a25c673 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels") Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0098e4c |
|
02-Mar-2020 |
Hangbin Liu <liuhangbin@gmail.com> |
net/ipv6: remove the old peer route if change it to a new one When we modify the peer route and changed it to a new one, we should remove the old route first. Before the fix: + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::2 proto kernel metric 256 pref medium + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::2 proto kernel metric 256 pref medium After the fix: + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 256 pref medium 2001:db8::3 proto kernel metric 256 pref medium This patch depend on the previous patch "net/ipv6: need update peer route when modify metric" to update new peer route after delete old one. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61794012 |
|
02-Mar-2020 |
Hangbin Liu <liuhangbin@gmail.com> |
net/ipv6: need update peer route when modify metric When we modify the route metric, the peer address's route need also be updated. Before the fix: + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 60 pref medium 2001:db8::2 proto kernel metric 60 pref medium + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 61 pref medium 2001:db8::2 proto kernel metric 60 pref medium After the fix: + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61 + ip -6 route show dev dummy1 2001:db8::1 proto kernel metric 61 pref medium 2001:db8::2 proto kernel metric 61 pref medium Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07758eb9 |
|
29-Feb-2020 |
Hangbin Liu <liuhangbin@gmail.com> |
net/ipv6: use configured metric when add peer route When we add peer address with metric configured, IPv4 could set the dest metric correctly, but IPv6 do not. e.g. ]# ip addr add 192.0.2.1 peer 192.0.2.2/32 dev eth1 metric 20 ]# ip route show dev eth1 192.0.2.2 proto kernel scope link src 192.0.2.1 metric 20 ]# ip addr add 2001:db8::1 peer 2001:db8::2/128 dev eth1 metric 20 ]# ip -6 route show dev eth1 2001:db8::1 proto kernel metric 20 pref medium 2001:db8::2 proto kernel metric 256 pref medium Fix this by using configured metric instead of default one. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db3fa271 |
|
07-Feb-2020 |
Eric Dumazet <edumazet@google.com> |
ipv6/addrconf: fix potential NULL deref in inet6_set_link_af() __in6_dev_get(dev) called from inet6_set_link_af() can return NULL. The needed check has been recently removed, let's add it back. While do_setlink() does call validate_linkmsg() : ... err = validate_linkmsg(dev, tb); /* OK at this point */ ... It is possible that the following call happening before the ->set_link_af() removes IPv6 if MTU is less than 1280 : if (tb[IFLA_MTU]) { err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack); if (err < 0) goto errout; status |= DO_SETLINK_MODIFIED; } ... if (tb[IFLA_AF_SPEC]) { ... err = af_ops->set_link_af(dev, af); ->inet6_set_link_af() // CRASH because idev is NULL Please note that IPv4 is immune to the bug since inet_set_link_af() does : struct in_device *in_dev = __in_dev_get_rcu(dev); if (!in_dev) return -EAFNOSUPPORT; This problem has been mentioned in commit cf7afbfeb8ce ("rtnl: make link af-specific updates atomic") changelog : This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7] CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754 rtnl_group_changelink net/core/rtnetlink.c:3103 [inline] __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257 rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4402e9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8 R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70 R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace cfa7664b8fdcdff3 ]--- RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733 Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00 RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6 RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0 RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0 R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000 FS: 0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 7dc2bccab0ee ("Validate required parameters in inet6_validate_link_af") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-and-reported-by: syzbot <syzkaller@googlegroups.com> Cc: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2beb6d29 |
|
11-Dec-2019 |
Hangbin Liu <liuhangbin@gmail.com> |
ipv6/addrconf: only check invalid header values when NETLINK_F_STRICT_CHK is set In commit 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers") we add strict check for inet6_rtm_getaddr(). But we did the invalid header values check before checking if NETLINK_F_STRICT_CHK is set. This may break backwards compatibility if user already set the ifm->ifa_prefixlen, ifm->ifa_flags, ifm->ifa_scope in their netlink code. I didn't move the nlmsg_len check because I thought it's a valid check. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 4b1373de73a3 ("net: ipv6: addr: perform strict checks also for doit handlers") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
#
bd74708c |
|
16-Oct-2019 |
Mahesh Bandewar <maheshb@google.com> |
Revert "blackhole_netdev: fix syzkaller reported issue" This reverts commit b0818f80c8c1bc215bba276bd61c216014fab23b. Started seeing weird behavior after this patch especially in the IPv6 code path. Haven't root caused it, but since this was applied to net branch, taking a precautionary measure to revert it and look / analyze those failures Revert this now and I'll send a better fix after analysing / fixing the weirdness observed. CC: Eric Dumazet <edumazet@google.com> CC: Wei Wang <weiwan@google.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0818f80 |
|
11-Oct-2019 |
Mahesh Bandewar <maheshb@google.com> |
blackhole_netdev: fix syzkaller reported issue While invalidating the dst, we assign backhole_netdev instead of loopback device. However, this device does not have idev pointer and hence no ip6_ptr even if IPv6 is enabled. Possibly this has triggered the syzbot reported crash. The syzbot report does not have reproducer, however, this is the only device that doesn't have matching idev created. Crash instruction is : static inline bool ip6_ignore_linkdown(const struct net_device *dev) { const struct inet6_dev *idev = __in6_dev_get(dev); return !!idev->cnf.ignore_routes_with_linkdown; <= crash } Also ipv6 always assumes presence of idev and never checks for it being NULL (as does the above referenced code). So adding a idev for the blackhole_netdev to avoid this class of crashes in the future. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2d819d25 |
|
04-Oct-2019 |
David Ahern <dsahern@gmail.com> |
ipv6: Handle missing host route in __ipv6_ifa_notify Rajendra reported a kernel panic when a link was taken down: [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290 <snip> [ 6870.570501] Call Trace: [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40 [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0 [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260 [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430 [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430 [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490 [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430 [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0 [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60 [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70 [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0 addrconf_dad_work is kicked to be scheduled when a device is brought up. There is a race between addrcond_dad_work getting scheduled and taking the rtnl lock and a process taking the link down (under rtnl). The latter removes the host route from the inet6_addr as part of addrconf_ifdown which is run for NETDEV_DOWN. The former attempts to use the host route in __ipv6_ifa_notify. If the down event removes the host route due to the race to the rtnl, then the BUG listed above occurs. Since the DAD sequence can not be aborted, add a check for the missing host route in __ipv6_ifa_notify. The only way this should happen is due to the previously mentioned race. The host route is created when the address is added to an interface; it is only removed on a down event where the address is kept. Add a warning if the host route is missing AND the device is up; this is a situation that should never happen. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8ae72cbf |
|
03-Oct-2019 |
David Ahern <dsahern@gmail.com> |
Revert "ipv6: Handle race in addrconf_dad_work" This reverts commit a3ce2a21bb8969ae27917281244fa91bf5f286d7. Eric reported tests failings with commit. After digging into it, the bottom line is that the DAD sequence is not to be messed with. There are too many cases that are expected to proceed regardless of whether a device is up. Revert the patch and I will send a different solution for the problem Rajendra reported. Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a3ce2a21 |
|
30-Sep-2019 |
David Ahern <dsahern@gmail.com> |
ipv6: Handle race in addrconf_dad_work Rajendra reported a kernel panic when a link was taken down: [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290 <snip> [ 6870.570501] Call Trace: [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40 [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0 [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260 [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430 [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430 [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490 [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430 [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0 [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70 [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60 [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70 [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0 addrconf_dad_work is kicked to be scheduled when a device is brought up. There is a race between addrcond_dad_work getting scheduled and taking the rtnl lock and a process taking the link down (under rtnl). The latter removes the host route from the inet6_addr as part of addrconf_ifdown which is run for NETDEV_DOWN. The former attempts to use the host route in ipv6_ifa_notify. If the down event removes the host route due to the race to the rtnl, then the BUG listed above occurs. This scenario does not occur when the ipv6 address is not kept (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the state of the ifp to DEAD. Handle when the addresses are kept by checking IF_READY which is reset by addrconf_ifdown. The 'dead' flag for an inet6_addr is set only under rtnl, in addrconf_ifdown and it means the device is getting removed (or IPv6 is disabled). The interesting cases for changing the idev flag are addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown (reset the flag). The former does not have the idev lock - only rtnl; the latter has both. Based on that the existing dead + IF_READY check can be moved to right after the rtnl_lock in addrconf_dad_work. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0d7982ce |
|
30-Sep-2019 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: minor code reorg in inet6_fill_ifla6_attrs() Just put related code together to ease code reading: the memcpy() is related to the nla_reserve(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db0b99f5 |
|
23-Aug-2019 |
Sabrina Dubroca <sd@queasysnail.net> |
ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails, ignoring the specific error value. This results in addrconf_add_dev returning ENOBUFS in all cases, which is unfortunate in cases such as: # ip link add dummyX type dummy # ip link set dummyX mtu 1200 up # ip addr add 2000::/64 dev dummyX RTNETLINK answers: No buffer space available Commit a317a2f19da7 ("ipv6: fail early when creating netdev named all or default") introduced error returns in ipv6_add_dev. Before that, that function would simply return NULL for all failures. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f17f7648 |
|
19-Aug-2019 |
Hangbin Liu <liuhangbin@gmail.com> |
ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is set In commit 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN to make user able to add multicast address on ethernet interface. This works for IPv4, but not for IPv6. See the inet6_addr_add code. static int inet6_addr_add() { ... if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...) } ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr if (!IS_ERR(ifp)) { ... } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...) } } But in ipv6_add_addr() it will check the address type and reject multicast address directly. So this feature is never worked for IPv6. We should not remove the multicast address check totally in ipv6_add_addr(), but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied. v2: update commit description Fixes: 93a714d6b53d ("multicast: Extend ip address command to enable multicast group join/leave on") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eec4844f |
|
18-Jul-2019 |
Matteo Croce <mcroce@redhat.com> |
proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f88d8ea6 |
|
03-Jun-2019 |
David Ahern <dsahern@gmail.com> |
ipv6: Plumb support for nexthop object in a fib6_info Add struct nexthop and nh_list list_head to fib6_info. nh_list is the fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info referencing a nexthop object can not have 'sibling' entries (the old way of doing multipath routes), the nh_list is a union with fib6_siblings. Add f6i_list list_head to 'struct nexthop' to track fib6_info entries using a nexthop instance. Update __remove_nexthop_fib to walk f6_list and delete fib entries using the nexthop. Add a few nexthop helpers for use when a nexthop is added to fib6_info: - nexthop_fib6_nh - return first fib6_nh in a nexthop object - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh if the fib6_info references a nexthop object - nexthop_path_fib6_result - similar to ipv4, select a path within a multipath nexthop object. If the nexthop is a blackhole, set fib6_result type to RTN_BLACKHOLE, and set the REJECT flag Update the fib6_info references to check for nh and take a different path as needed: - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT be coalesced with other fib entries into a multipath route - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references a nexthop - addrconf (host routes), RA's and info entries (anything configured via ndisc) does not use nexthop objects - fib6_info_destroy_rcu - put reference to nexthop object - fib6_purge_rt - drop fib6_info from f6i_list - fib6_select_path - update to use the new nexthop_path_fib6_result when fib entry uses a nexthop object - rt6_device_match - update to catch use of nexthop object as a blackhole and set fib6_type and flags. - ip6_route_info_create - don't add space for fib6_nh if fib entry is going to reference a nexthop object, take a reference to nexthop object, disallow use of source routing - rt6_nlmsg_size - add space for RTA_NH_ID - add rt6_fill_node_nexthop to add nexthop data on a dump As with ipv4, most of the changes push existing code into the else branch of whether the fib entry uses a nexthop object. Update the nexthop code to walk f6i_list on a nexthop deleted to remove fib entries referencing it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd5a411d |
|
31-May-2019 |
Florian Westphal <fw@strlen.de> |
net: use new in_dev_ifa iterators Use in_dev_for_each_ifa_rcu/rtnl instead. This prevents sparse warnings once proper __rcu annotations are added. Signed-off-by: Florian Westphal <fw@strlen.de> t di# Last commands done (6 commands done): Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2874c5fd |
|
27-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
1cf844c7 |
|
22-May-2019 |
David Ahern <dsahern@gmail.com> |
ipv6: Make fib6_nh optional at the end of fib6_info Move fib6_nh to the end of fib6_info and make it an array of size 0. Pass a flag to fib6_info_alloc indicating if the allocation needs to add space for a fib6_nh. The current code path always has a fib6_nh allocated with a fib6_info; with nexthop objects they will be separate. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f40b6ae2 |
|
22-May-2019 |
David Ahern <dsahern@gmail.com> |
ipv6: Move pcpu cached routes to fib6_nh rt6_info are specific instances of a fib entry and are tied to a device and gateway - ie., a nexthop. Before nexthop objects, IPv6 fib entries have separate fib6_info for each nexthop in a multipath route, so the location of the pcpu cache in the fib6_info struct worked. However, with nexthop objects a fib6_info can point to a set of nexthops (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu cache needs to be moved to the fib6_nh struct so the cached entries are local to the nexthop specification used to create the rt6_info. Initialization and free of the pcpu entries moved to fib6_nh_init and fib6_nh_release. Change in location only, from fib6_info down to fib6_nh; no other functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7dc2bcca |
|
21-May-2019 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
Validate required parameters in inet6_validate_link_af inet6_set_link_af requires that at least one of IFLA_INET6_TOKEN or IFLA_INET6_ADDR_GET_MODE is passed. If none of them is passed, it returns -EINVAL, which may cause do_setlink() to fail in the middle of processing other commands and give the following warning message: A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check. Check the presence of at least one of them in inet6_validate_link_af to detect invalid parameters at an early stage, before do_setlink does anything. Also validate the address generation mode at an early stage. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cb08174 |
|
26-Apr-2019 |
Johannes Berg <johannes.berg@intel.com> |
netlink: make validation more configurable for future strictness We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae0be8de |
|
26-Apr-2019 |
Michal Kubecek <mkubecek@suse.cz> |
netlink: make nla_nest_start() add NLA_F_NESTED flag Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdf00467 |
|
05-Apr-2019 |
David Ahern <dsahern@gmail.com> |
net: Replace nhc_has_gw with nhc_gw_family Allow the gateway in a fib_nh_common to be from a different address family than the outer fib{6}_nh. To that end, replace nhc_has_gw with nhc_gw_family and update users of nhc_has_gw to check nhc_gw_family. Now nhc_family is used to know if the nh_common is part of a fib_nh or fib6_nh (used for container_of to get to route family specific data), and nhc_gw_family represents the address family for the gateway. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad1601ae |
|
27-Mar-2019 |
David Ahern <dsahern@gmail.com> |
ipv6: Rename fib6_nh entries Rename fib6_nh entries that will be moved to a fib_nh_common struct. Specifically, the device, gateway, flags, and lwtstate are common with all nexthop definitions. In some places new temporary variables are declared or local variables renamed to maintain line lengths. Rename only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b2450ca |
|
27-Mar-2019 |
David Ahern <dsahern@gmail.com> |
ipv6: Move gateway checks to a fib6_nh setting The gateway setting is not per fib6_info entry but per-fib6_nh. Add a new fib_nh_has_gw flag to fib6_nh and convert references to RTF_GATEWAY to the new flag. For IPv6 address the flag is cheaper than checking that nh_gw is non-0 like IPv4 does. While this increases fib6_nh by 8-bytes, the effective allocation size of a fib6_info is unchanged. The 8 bytes is recovered later with a fib_nh_common change. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a154d5d8 |
|
04-Mar-2019 |
Arnd Bergmann <arnd@arndb.de> |
net: ignore sysctl_devconf_inherit_init_net without SYSCTL When CONFIG_SYSCTL is turned off, we get a link failure for the newly introduced tuning knob. net/ipv6/addrconf.o: In function `addrconf_init_net': addrconf.c:(.text+0x31dc): undefined reference to `sysctl_devconf_inherit_init_net' Add an IS_ENABLED() check to fall back to the default behavior (sysctl_devconf_inherit_init_net=0) here. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e75913c9 |
|
10-Feb-2019 |
Zhiqiang Liu <liuzhiqiang26@huawei.com> |
net: fix IPv6 prefix route residue Follow those steps: # ip addr add 2001:123::1/32 dev eth0 # ip addr add 2001:123:456::2/64 dev eth0 # ip addr del 2001:123::1/32 dev eth0 # ip addr del 2001:123:456::2/64 dev eth0 and then prefix route of 2001:123::1/32 will still exist. This is because ipv6_prefix_equal in check_cleanup_prefix_route func does not check whether two IPv6 addresses have the same prefix length. If the prefix of one address starts with another shorter address prefix, even though their prefix lengths are different, the return value of ipv6_prefix_equal is true. Here I add a check of whether two addresses have the same prefix to decide whether their prefixes are equal. Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE") Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c62b8dd |
|
21-Jan-2019 |
Lubomir Rintel <lkundrak@v3.sk> |
net/ipv6: lower the level of "link is not ready" messages This message gets logged far too often for how interesting is it. Most distributions nowadays configure NetworkManager to use randomly generated MAC addresses for Wi-Fi network scans. The interfaces end up being periodically brought down for the address change. When they're subsequently brought back up, the message is logged, eventually flooding the log. Perhaps the message is not all that helpful: it seems to be more interesting to hear when the addrconf actually start, not when it does not. Let's lower its level. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-By: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1518039f |
|
22-Jan-2019 |
Jakub Kicinski <kuba@kernel.org> |
net/ipv6: don't return positive numbers when nothing was dumped in6_dump_addrs() returns a positive 1 if there was nothing to dump. This return value can not be passed as return from inet6_dump_addr() as is, because it will confuse rtnetlink, resulting in NLMSG_DONE never getting set: $ ip addr list dev lo EOF on netlink Dump terminated v2: flip condition to avoid a new goto (DaveA) Fixes: 7c1e8a3817c5 ("netlink: fixup regression in RTM_GETADDR") Reported-by: Brendan Galloway <brendan.galloway@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
856c395c |
|
18-Jan-2019 |
Cong Wang <xiyou.wangcong@gmail.com> |
net: introduce a knob to control whether to inherit devconf config There have been many people complaining about the inconsistent behaviors of IPv4 and IPv6 devconf when creating new network namespaces. Currently, for IPv4, we inherit all current settings from init_net, but for IPv6 we reset all setting to default. This patch introduces a new /proc file /proc/sys/net/core/devconf_inherit_init_net to control the behavior of whether to inhert sysctl current settings from init_net. This file itself is only available in init_net. As demonstrated below: Initial setup in init_net: # cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # cat /proc/sys/net/ipv6/conf/all/accept_dad 1 Default value 0 (current behavior): # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 0 Set to 1 (inherit from init_net): # echo 1 > /proc/sys/net/core/devconf_inherit_init_net # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 2 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 1 Set to 2 (reset to default): # echo 2 > /proc/sys/net/core/devconf_inherit_init_net # ip netns del test # ip netns add test # ip netns exec test cat /proc/sys/net/ipv4/conf/all/rp_filter 0 # ip netns exec test cat /proc/sys/net/ipv6/conf/all/accept_dad 0 Set to a value out of range (invalid): # echo 3 > /proc/sys/net/core/devconf_inherit_init_net -bash: echo: write error: Invalid argument # echo -1 > /proc/sys/net/core/devconf_inherit_init_net -bash: echo: write error: Invalid argument Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38d51810 |
|
18-Jan-2019 |
Jakub Kicinski <kuba@kernel.org> |
net: ipv6: netconf: perform strict checks also for doit handlers Make RTM_GETNETCONF's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b1373de |
|
18-Jan-2019 |
Jakub Kicinski <kuba@kernel.org> |
net: ipv6: addr: perform strict checks also for doit handlers Make RTM_GETADDR's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c1e8a38 |
|
30-Dec-2018 |
Arthur Gautier <baloo@gandi.net> |
netlink: fixup regression in RTM_GETADDR This commit fixes a regression in AF_INET/RTM_GETADDR and AF_INET6/RTM_GETADDR. Before this commit, the kernel would stop dumping addresses once the first skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error shouldn't be sent back to netlink_dump so the callback is kept alive. The userspace is expected to call back with a new empty skb. Changes from V1: - The error is not handled in netlink_dump anymore but rather in inet_dump_ifaddr and inet6_dump_addr directly as suggested by David Ahern. Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes") Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes") Cc: David Ahern <dsahern@gmail.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Arthur Gautier <baloo@gandi.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
178fe944 |
|
28-Dec-2018 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
net/ipv6: Fix a test against 'ipv6_find_idev()' return value 'ipv6_find_idev()' returns NULL on error, not an error pointer. Update the test accordingly and return -ENOBUFS, as already done in 'addrconf_add_dev()', if NULL is returned. Fixes: ("ipv6: allow userspace to add IFA_F_OPTIMISTIC addresses") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00f54e68 |
|
06-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: core: dev: Add extack argument to dev_open() In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_open(). Therefore extend dev_open() with and extra extack argument and update all users. Most of the calls end up just encoding NULL, but bond and team drivers have the extack readily available. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
896585d4 |
|
21-Nov-2018 |
Hangbin Liu <liuhangbin@gmail.com> |
net/ipv6: re-do dad when interface has IFF_NOARP flag change When we add a new IPv6 address, we should also join corresponding solicited-node multicast address, unless the interface has IFF_NOARP flag, as function addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do not do dad and add the mcast address. So we will drop corresponding neighbour discovery message that came from other nodes. A typical example is after creating a ipvlan with mode l3, setting up an ipv6 address and changing the mode to l2. Then we will not be able to ping this address as the interface doesn't join related solicited-node mcast address. Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add corresponding mcast group and check if there is a duplicate address on the network. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bf4cc40e |
|
25-Oct-2018 |
Bjørn Mork <bjorn@mork.no> |
net/{ipv4,ipv6}: Do not put target net if input nsid is invalid The cleanup path will put the target net when netnsid is set. So we must reset netnsid if the input is invalid. Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes") Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes") Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
242afaa6 |
|
24-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Put target net when address dump fails due to bad attributes If tgt_net is set based on IFA_TARGET_NETNSID attribute in the dump request, make sure all error paths call put_net. Fixes: 6371a71f3a3b ("net/ipv6: Add support for dumping addresses for a specific device") Fixes: ed6eff11790a ("net/ipv6: Update inet6_dump_addr for strict data checking") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6371a71f |
|
19-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Add support for dumping addresses for a specific device If an RTM_GETADDR dump request has ifa_index set in the ifaddrmsg header, then return only the addresses for that device. Since inet6_dump_addr is reused for multicast and anycast addresses, this adds support for device specfic dumps of RTM_GETMULTICAST and RTM_GETANYCAST as well. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe884c2b |
|
19-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Remove ip_idx arg to in6_dump_addrs ip_idx is always 0 going into in6_dump_addrs; it is passed as a pointer to save the last good index into cb. Since cb is already argument to in6_dump_addrs, just save the value there. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ba4c566 |
|
19-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs The loop wants to skip previously dumped addresses, so loops until current index >= saved index. If the message fills it wants to save the index for the next address to dump - ie., the one that did not fit in the current message. Currently, it is incrementing the index counter before comparing to the saved index, and then the saved index is off by 1 - it assumes the current address is going to fit in the message. Change the index handling to increment only after a succesful dump. Fixes: 502a2ffd7376a ("ipv6: convert idev_list to list macros") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
addd383f |
|
07-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net: Update netconf dump handlers for strict data checking Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and mpls_netconf_dump_devconf for strict data checking. If the flag is set, the dump request is expected to have an netconfmsg struct as the header. The struct only has the family member and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
786e0007 |
|
07-Oct-2018 |
David Ahern <dsahern@gmail.com> |
rtnetlink: Update inet6_dump_ifinfo for strict data checking Update inet6_dump_ifinfo for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ed6eff11 |
|
07-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Update inet6_dump_addr for strict data checking Update inet6_dump_addr for strict data checking. If the flag is set, the dump request is expected to have an ifaddrmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values suppored by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID attribute is supported. Follow on patches can add support for other fields (e.g., honor ifa_index and only return data for the given device index). Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ba1e6e8 |
|
07-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Refactor address dump to push inet6_fill_args to in6_dump_addrs Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid into it. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dac9c979 |
|
07-Oct-2018 |
David Ahern <dsahern@gmail.com> |
net: Add extack to nlmsg_parse Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86f9bd1f |
|
20-Sep-2018 |
Jeff Barnhill <0xeffeff@gmail.com> |
net/ipv6: Display all addresses in output of /proc/net/if_inet6 The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly handle starting/stopping the iteration. The problem is that at some point during the iteration, an overflow is detected and the process is subsequently stopped. The item being shown via seq_printf() when the overflow occurs is not actually shown, though. When start() is subsequently called to resume iterating, it returns the next item, and thus the item that was being processed when the overflow occurred never gets printed. Alter the meaning of the private data member "offset". Currently, when it is not 0 (which only happens at the very beginning), "offset" represents the next hlist item to be printed. After this change, "offset" always represents the current item. This is also consistent with the private data member "bucket", which represents the current bucket, and also the use of "pos" as defined in seq_file.txt: The pos passed to start() will always be either zero, or the most recent pos used in the previous session. Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ede0bbc |
|
19-Sep-2018 |
Robert Shearman <rshearma@vyatta.att-mail.com> |
ipv6: Allow the l3mdev to be a loopback There is no way currently for an IPv6 client connect using a loopback address in a VRF, whereas for IPv4 the loopback address can be added: $ sudo ip addr add dev vrfred 127.0.0.1/8 $ sudo ip -6 addr add ::1/128 dev vrfred RTNETLINK answers: Cannot assign requested address So allow ::1 to be configured on an L3 master device. In order for this to be usable ip_route_output_flags needs to not consider ::1 to be a link scope address (since oif == l3mdev and so it would be dropped), and ipv6_rcv needs to consider the l3mdev to be a loopback device so that it doesn't drop the packets. Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
203651b6 |
|
04-Sep-2018 |
Christian Brauner <christian@brauner.io> |
ipv6: add inet6_fill_args inet6_fill_if{addr,mcaddr, acaddr}() already took 6 arguments which meant the 7th argument would need to be pushed onto the stack on x86. Add a new struct inet6_fill_args which holds common information passed to inet6_fill_if{addr,mcaddr, acaddr}() and shortens the functions to three pointer arguments. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ecf4c37 |
|
04-Sep-2018 |
Christian Brauner <christian@brauner.io> |
ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR - Backwards Compatibility: If userspace wants to determine whether ipv6 RTM_GETADDR requests support the new IFA_TARGET_NETNSID property it should verify that the reply includes the IFA_TARGET_NETNSID property. If it does not userspace should assume that IFA_TARGET_NETNSID is not supported for ipv6 RTM_GETADDR requests on this kernel. - From what I gather from current userspace tools that make use of RTM_GETADDR requests some of them pass down struct ifinfomsg when they should actually pass down struct ifaddrmsg. To not break existing tools that pass down the wrong struct we will do the same as for RTM_GETLINK | NLM_F_DUMP requests and not error out when the nlmsg_parse() fails. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e500c6d3 |
|
22-Aug-2018 |
Cong Wang <xiyou.wangcong@gmail.com> |
addrconf: reduce unnecessary atomic allocations All the 3 callers of addrconf_add_mroute() assert RTNL lock, they don't take any additional lock either, so it is safe to convert it to GFP_KERNEL. Same for sit_add_v4_addrs(). Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e873e4b9 |
|
21-Jul-2018 |
Wei Wang <weiwan@google.com> |
ipv6: use fib6_info_hold_safe() when necessary In the code path where only rcu read lock is held, e.g. in the route lookup code path, it is not safe to directly call fib6_info_hold() because the fib6_info may already have been deleted but still exists in the rcu grace period. Holding reference to it could cause double free and crash the kernel. This patch adds a new function fib6_info_hold_safe() and replace fib6_info_hold() in all necessary places. Syzbot reported 3 crash traces because of this. One of them is: 8021q: adding VLAN 0 to HW filter on device team0 IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready dst_release: dst:(____ptrval____) refcnt:-1 dst_release: dst:(____ptrval____) refcnt:-2 WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 dst_hold include/net/dst.h:239 [inline] WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204 dst_release: dst:(____ptrval____) refcnt:-1 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 4845 Comm: syz-executor493 Not tainted 4.18.0-rc3+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 panic+0x238/0x4e7 kernel/panic.c:184 dst_release: dst:(____ptrval____) refcnt:-2 dst_release: dst:(____ptrval____) refcnt:-3 __warn.cold.8+0x163/0x1ba kernel/panic.c:536 dst_release: dst:(____ptrval____) refcnt:-4 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296 dst_release: dst:(____ptrval____) refcnt:-5 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992 RIP: 0010:dst_hold include/net/dst.h:239 [inline] RIP: 0010:ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204 Code: c1 ed 03 89 9d 18 ff ff ff 48 b8 00 00 00 00 00 fc ff df 41 c6 44 05 00 f8 e9 2d 01 00 00 4c 8b a5 c8 fe ff ff e8 1a f6 e6 fa <0f> 0b e9 6a fc ff ff e8 0e f6 e6 fa 48 8b 85 d0 fe ff ff 48 8d 78 RSP: 0018:ffff8801a8fcf178 EFLAGS: 00010293 RAX: ffff8801a8eba5c0 RBX: 0000000000000000 RCX: ffffffff869511e6 RDX: 0000000000000000 RSI: ffffffff869515b6 RDI: 0000000000000005 RBP: ffff8801a8fcf2c8 R08: ffff8801a8eba5c0 R09: ffffed0035ac8338 R10: ffffed0035ac8338 R11: ffff8801ad6419c3 R12: ffff8801a8fcf720 R13: ffff8801a8fcf6a0 R14: ffff8801ad6419c0 R15: ffff8801ad641980 ip6_make_skb+0x2c8/0x600 net/ipv6/ip6_output.c:1768 udpv6_sendmsg+0x2c90/0x35f0 net/ipv6/udp.c:1376 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:641 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:651 ___sys_sendmsg+0x51d/0x930 net/socket.c:2125 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2220 __do_sys_sendmmsg net/socket.c:2249 [inline] __se_sys_sendmmsg net/socket.c:2246 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2246 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x446ba9 Code: e8 cc bb 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fb39a469da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000006dcc54 RCX: 0000000000446ba9 RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003 RBP: 00000000006dcc50 R08: 00007fb39a46a700 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 45c828efc7a64843 R13: e6eeb815b9d8a477 R14: 5068caf6f713c6fc R15: 0000000000000001 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: syzbot+902e2a1bcd4f7808cef5@syzkaller.appspotmail.com Reported-by: syzbot+8ae62d67f647abeeceb9@syzkaller.appspotmail.com Reported-by: syzbot+3f08feb14086930677d0@syzkaller.appspotmail.com Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f24c5987 |
|
08-Jul-2018 |
Sabrina Dubroca <sd@queasysnail.net> |
net/ipv6: propagate net.ipv6.conf.all.addr_gen_mode to devices This aligns the addr_gen_mode sysctl with the expected behavior of the "all" variant. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdd72f41 |
|
08-Jul-2018 |
Sabrina Dubroca <sd@queasysnail.net> |
net/ipv6: reserve room for IFLA_INET6_ADDR_GEN_MODE inet6_ifla6_size() is called to check how much space is needed by inet6_fill_link_af() and inet6_fill_ifinfo(), both of which include the IFLA_INET6_ADDR_GEN_MODE attribute. Reserve some room for it. Fixes: bc91b0f07ada ("ipv6: addrconf: implement address generation modes") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
70c30d76 |
|
08-Jul-2018 |
Sabrina Dubroca <sd@queasysnail.net> |
net/ipv6: don't reinitialize ndev->cnf.addr_gen_mode on new inet6_dev The value has already been copied from this netns's devconf_dflt, it shouldn't be reset to the global kernel default. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6dbf7aa |
|
08-Jul-2018 |
Sabrina Dubroca <sd@queasysnail.net> |
net/ipv6: fix addrconf_sysctl_addr_gen_mode addrconf_sysctl_addr_gen_mode() has multiple problems. First, it ignores the errors returned by proc_dointvec(). addrconf_sysctl_addr_gen_mode() calls proc_dointvec() directly, which writes the value to memory, and then checks if it's valid and may return EINVAL. If a bad value is given, the value displayed when reading net.ipv6.conf.foo.addr_gen_mode next time will be invalid. In case the value provided by the user was valid, addrconf_dev_config() won't be called since idev->cnf.addr_gen_mode has already been updated. Fix this in the usual way we deal with values that need to be checked after the proc_do*() helper has returned: define a local ctl_table and storage, call proc_dointvec() on that temporary area, then check and store. addrconf_sysctl_addr_gen_mode() also writes the new value to the global ipv6_devconf_dflt, when we're writing to some netns's default, so that new netns will inherit the value that was set by the change occuring in any netns. That doesn't make any sense, so let's drop this assignment. Finally, since addr_gen_mode is a __u32, switch to proc_douintvec(). Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7c7faa9 |
|
28-Jun-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Fix updates to prefix route Sowmini reported that a recent commit broke prefix routes for linklocal addresses. The newly added modify_prefix_route is attempting to add a new prefix route when the ifp priority does not match the route metric however the check needs to account for the default priority. In addition, the route add fails because the route already exists, and then the delete removes the one that exists. Flip the order to do the delete first. Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f2d67b6 |
|
11-Jun-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Ensure cfg is properly initialized in ipv6_create_tempaddr Valdis reported a BUG in ipv6_add_addr: [ 1820.832682] BUG: unable to handle kernel NULL pointer dereference at 0000000000000209 [ 1820.832728] RIP: 0010:ipv6_add_addr+0x280/0xd10 [ 1820.832732] Code: 49 8b 1f 0f 84 6a 0a 00 00 48 85 db 0f 84 4e 0a 00 00 48 8b 03 48 8b 53 08 49 89 45 00 49 8b 47 10 49 89 55 08 48 85 c0 74 15 <48> 8b 50 08 48 8b 00 49 89 95 b8 01 00 00 49 89 85 b0 01 00 00 4c [ 1820.832847] RSP: 0018:ffffaa07c2fd7880 EFLAGS: 00010202 [ 1820.832853] RAX: 0000000000000201 RBX: ffffaa07c2fd79b0 RCX: 0000000000000000 [ 1820.832858] RDX: a4cfbfba2cbfa64c RSI: 0000000000000000 RDI: ffffffff8a8e9fa0 [ 1820.832862] RBP: ffffaa07c2fd7920 R08: 000000000000017a R09: ffffffff8a555300 [ 1820.832866] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888d18e71c00 [ 1820.832871] R13: ffff888d0a9b1200 R14: 0000000000000000 R15: ffffaa07c2fd7980 [ 1820.832876] FS: 00007faa51bdb800(0000) GS:ffff888d1d400000(0000) knlGS:0000000000000000 [ 1820.832880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1820.832885] CR2: 0000000000000209 CR3: 000000021e8f8001 CR4: 00000000001606e0 [ 1820.832888] Call Trace: [ 1820.832898] ? __local_bh_enable_ip+0x119/0x260 [ 1820.832904] ? ipv6_create_tempaddr+0x259/0x5a0 [ 1820.832912] ? __local_bh_enable_ip+0x139/0x260 [ 1820.832921] ipv6_create_tempaddr+0x2da/0x5a0 [ 1820.832926] ? ipv6_create_tempaddr+0x2da/0x5a0 [ 1820.832941] manage_tempaddrs+0x1a5/0x240 [ 1820.832951] inet6_addr_del+0x20b/0x3b0 [ 1820.832959] ? nla_parse+0xce/0x1e0 [ 1820.832968] inet6_rtm_deladdr+0xd9/0x210 [ 1820.832981] rtnetlink_rcv_msg+0x1d4/0x5f0 Looking at the code I found 1 element (peer_pfx) of the newly introduced ifa6_config struct that is not initialized. Use a memset rather than hard coding an init for each struct element. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Fixes: e6464b8c63619 ("net/ipv6: Convert ipv6_add_addr to struct ifa6_config") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9deb441c |
|
04-Jun-2018 |
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> |
net: ipv6: Generate random IID for addresses on RAWIP devices RAWIP devices such as rmnet do not have a hardware address and instead require the kernel to generate a random IID for the IPv6 addresses. Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8308f3ff |
|
27-May-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Add support for specifying metric of connected routes Add support for IFA_RT_PRIORITY to ipv6 addresses. If the metric is changed on an existing address then the new route is inserted before removing the old one. Since the metric is one of the route keys, the prefix route can not be atomically replaced. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d169a1f8 |
|
27-May-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Pass ifa6_config struct to inet6_addr_modify Update inet6_addr_modify to take ifa6_config argument versus a parameter list. This is an argument move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
19b1518c |
|
27-May-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Pass ifa6_config struct to inet6_addr_add Move the creation of struct ifa6_config up to callers of inet6_addr_add. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6464b8c |
|
27-May-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Convert ipv6_add_addr to struct ifa6_config Move config parameters for adding an ipv6 address to a struct. struct names stem from inet6_rtm_newaddr which is the modern handler for adding an address. Start the conversion to ifa6_config with ipv6_add_addr. This is an argument move only; no functional change intended. Mapping of variable changes: addr --> cfg->pfx peer_addr --> cfg->peer_pfx pfxlen --> cfg->plen flags --> cfg->ifa_flags scope, valid_lft, prefered_lft have the same names within cfg (with corrected spelling). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3506372 |
|
10-Apr-2018 |
Christoph Hellwig <hch@lst.de> |
proc: introduce proc_create_net{,_data} Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
0aef78aa |
|
24-Apr-2018 |
Ivan Vecera <cera@cera.cz> |
ipv6: addrconf: don't evaluate keep_addr_on_down twice The addrconf_ifdown() evaluates keep_addr_on_down state twice. There is no need to do it. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27b10608 |
|
18-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Fix gfp_flags arg to addrconf_prefix_route Eric noticed that __ipv6_ifa_notify is called under rcu_read_lock, so the gfp argument to addrconf_prefix_route can not be GFP_KERNEL. While scrubbing other calls I noticed addrconf_addr_gen has one place with GFP_ATOMIC that can be GFP_KERNEL. Fixes: acb54e3cba404 ("net/ipv6: Add gfp_flags to route add functions") Reported-by: syzbot+2add39b05179b31f912f@syzkaller.appspotmail.com Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ee8cbb2 |
|
18-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Remove aca_idev aca_idev has only 1 user - inet6_fill_ifacaddr - and it only wants the device index which can be extracted from the fib6_info nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
360a9887 |
|
18-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Rename addrconf_dst_alloc addrconf_dst_alloc now returns a fib6_info. Update the name and its users to reflect the change. Rename only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93c2fb25 |
|
18-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Rename fib6_info struct elements Change the prefix for fib6_info struct elements from rt6i_ to fib6_. rt6i_pcpu and rt6i_exception_bucket are left as is given that they point to rt6_info entries. Rename only; not functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d1c802b |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Flip FIB entries to fib6_info Convert all code paths referencing a FIB entry from rt6_info to fib6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93531c67 |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: separate handling of FIB entries from dst based routes Last step before flipping the data type for FIB entries: - use fib6_info_alloc to create FIB entries in ip6_route_info_create and addrconf_dst_alloc - use fib6_info_release in place of dst_release, ip6_rt_put and rt6_release - remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt - when purging routes, drop per-cpu routes - replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release - use rt->from since it points to the FIB entry - drop references to exception bucket, fib6_metrics and per-cpu from dst entries (those are relevant for fib entries only) Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acb54e3c |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Add gfp_flags to route add functions Most FIB entries can be added using memory allocated with GFP_KERNEL. Add gfp_flags to ip6_route_add and addrconf_dst_alloc. Code paths that can be reached from the packet path (e.g., ndisc and autoconfig) or atomic notifiers use GFP_ATOMIC; paths from user context (adding addresses and routes) use GFP_KERNEL. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b6761d1 |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Move dst flags to booleans in fib entries Continuing to wean FIB paths off of dst_entry, use a bool to hold requests for certain dst settings. Add a helper to convert the flags to DST flags when a FIB entry is converted to a dst_entry. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14895687 |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: move expires into rt6_info Add expires to rt6_info for FIB entries, and add fib6 helpers to manage it. Data path use of dst.expires remains. The transition is fairly straightforward: when working with fib entries, rt->dst.expires is just rt->expires, rt6_clean_expires is replaced with fib6_clean_expires, rt6_set_expires becomes fib6_set_expires, and rt6_check_expired becomes fib6_check_expired, where the fib6 versions are added by this patch. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e670d84 |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Move nexthop data to fib6_nh Introduce fib6_nh structure and move nexthop related data from rt6_info and rt6_info.dst to fib6_nh. References to dev, gateway or lwtstate from a FIB lookup perspective are converted to use fib6_nh; datapath references to dst version are left as is. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e8478e80 |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Save route type in rt6_info The RTN_ type for IPv6 FIB entries is currently embedded in rt6i_flags and dst.error. Since dst is going to be removed, it can no longer be relied on for FIB dumps so save the route type as fib6_type. fc_type is set in current users based on the algorithm in rt6_fill_node: - rt6i_flags contains RTF_LOCAL: fc_type = RTN_LOCAL - rt6i_flags contains RTF_ANYCAST: fc_type = RTN_ANYCAST - else fc_type = RTN_UNICAST Similarly, fib6_type is set in the rt6_info templates based on the RTF_REJECT section of rt6_fill_node converting dst.error to RTN type. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afb1d4b5 |
|
17-Apr-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Pass net namespace to route functions Pass network namespace reference into route add, delete and get functions. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a2d481b3 |
|
17-Apr-2018 |
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> |
ipv6: send netlink notifications for manually configured addresses Send a netlink notification when userspace adds a manually configured address if DAD is enabled and optimistic flag isn't set. Moreover send RTM_DELADDR notifications for tentative addresses. Some userspace applications (e.g. NetworkManager) are interested in addr netlink events albeit the address is still in tentative state, however events are not sent if DAD process is not completed. If the address is added and immediately removed userspace listeners are not notified. This behaviour can be easily reproduced by using veth interfaces: $ ip -b - <<EOF > link add dev vm1 type veth peer name vm2 > link set dev vm1 up > link set dev vm2 up > addr add 2001:db8:a:b:1:2:3:4/64 dev vm1 > addr del 2001:db8:a:b:1:2:3:4/64 dev vm1 EOF This patch reverts the behaviour introduced by the commit f784ad3d79e5 ("ipv6: do not send RTM_DELADDR for tentative addresses") Suggested-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f85f94b8 |
|
16-Apr-2018 |
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> |
ipv6: remove unnecessary check in addrconf_prefix_rcv_add_addr() Remove unnecessary check on update_lft variable in addrconf_prefix_rcv_add_addr routine since it is always set to 0. Moreover remove update_lft re-initialization to 0 Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f635cee |
|
27-Mar-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Drop pernet_operations::async Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e32ac250 |
|
26-Mar-2018 |
Joe Perches <joe@perches.com> |
ipv6: addrconf: Use normal debugging style Remove local ADBG macro and use netdev_dbg/pr_debug Miscellanea: o Remove unnecessary debug message after allocation failure as there already is a dump_stack() on the failure paths o Leave the allocation failure message on snmp6_alloc_dev as there is one code path that does not do a dump_stack() Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d6444062 |
|
23-Mar-2018 |
Joe Perches <joe@perches.com> |
net: Use octal not symbolic permissions Prefer the direct use of octal for permissions. Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace and some typing. Miscellanea: o Whitespace neatening around these conversions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1893ff20 |
|
13-Mar-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Add l3mdev check to ipv6_chk_addr_and_flags Lookup the L3 master device for the passed in device. Only consider addresses on netdev's with the same master device. If the device is not enslaved or is NULL, then the l3mdev is NULL which means only devices not enslaved (ie, in the default domain) are considered. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
232378e8 |
|
13-Mar-2018 |
David Ahern <dsahern@gmail.com> |
net/ipv6: Change address check to always take a device argument ipv6_chk_addr_and_flags determines if an address is a local address and optionally if it is an address on a specific device. For example, it is called by ip6_route_info_create to determine if a given gateway address is a local address. The address check currently does not consider L3 domains and as a result does not allow a route to be added in one VRF if the nexthop points to an address in a second VRF. e.g., $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23 Error: Invalid gateway address. where 2001:db8:102::23 is an address on an interface in vrf r1. ipv6_chk_addr_and_flags needs to allow callers to always pass in a device with a separate argument to not limit the address to the specific device. The device is used used to determine the L3 domain of interest. To that end add an argument to skip the device check and update callers to always pass a device where possible and use the new argument to mean any address in the domain. Update a handful of users of ipv6_chk_addr with a NULL dev argument. This patch handles the change to these callers without adding the domain check. ip6_validate_gw needs to handle 2 cases - one where the device is given as part of the nexthop spec and the other where the device is resolved. There is at least 1 VRF case where deferring the check to only after the route lookup has resolved the device fails with an unintuitive error "RTNETLINK answers: No route to host" as opposed to the preferred "Error: Gateway can not be a local address." The 'no route to host' error is because of the fallback to a full lookup. The check is done twice to avoid this error. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1c02cfb |
|
28-Feb-2018 |
Sabrina Dubroca <sd@queasysnail.net> |
ipv6: allow userspace to add IFA_F_OPTIMISTIC addresses According to RFC 4429 (section 3.1), adding new IPv6 addresses as optimistic addresses is acceptable, as long as the implementation follows some rules: * Optimistic DAD SHOULD only be used when the implementation is aware that the address is based on a most likely unique interface identifier (such as in [RFC2464]), generated randomly [RFC3041], or by a well-distributed hash function [RFC3972] or assigned by Dynamic Host Configuration Protocol for IPv6 (DHCPv6) [RFC3315]. Optimistic DAD SHOULD NOT be used for manually entered addresses. Thus, it seems reasonable to allow userspace to set the optimistic flag when adding new addresses. We must not let userspace set NODAD + OPTIMISTIC, since if the kernel is not performing DAD we would never clear the optimistic flag. We must also ignore userspace's request to add OPTIMISTIC flag to addresses that have already completed DAD (addresses that don't have the TENTATIVE flag, or that have the DADFAILED flag). Then we also need to clear the OPTIMISTIC flag on permanent addresses when DAD fails. Otherwise, IFA_F_OPTIMISTIC addresses added by userspace can still be used after DAD has failed, because in ipv6_chk_addr_and_flags(), IFA_F_OPTIMISTIC overrides IFA_F_TENTATIVE. Setting IFA_F_OPTIMISTIC from userspace is conditional on CONFIG_IPV6_OPTIMISTIC_DAD and the optimistic_dad sysctl. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50911411 |
|
19-Feb-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Convert raw6_net_ops, udplite6_net_ops, ipv6_proc_ops, if6_proc_net_ops and ip6_route_net_late_ops These pernet_operations create and destroy /proc entries and safely may be converted and safely may be mark as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0bc9be67 |
|
12-Feb-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Convert addrconf_ops These pernet_operations (un)register sysctl, which are not touched by anybody else. So, it's safe to make them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e64e469b |
|
26-Jan-2018 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: break critical section in addrconf_verify_rtnl() Heiner reported a lockdep splat [1] This is caused by attempting GFP_KERNEL allocation while RCU lock is held and BH blocked. We believe that addrconf_verify_rtnl() could run for a long period, so instead of using GFP_ATOMIC here as Ido suggested, we should break the critical section and restart it after the allocation. [1] [86220.125562] ============================= [86220.125586] WARNING: suspicious RCU usage [86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted [86220.125641] ----------------------------- [86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh read-side critical section! [86220.125711] other info that might help us debug this: [86220.125755] rcu_scheduler_active = 2, debug_locks = 1 [86220.125792] 4 locks held by kworker/0:2/1003: [86220.125817] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.125895] #1: ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.125959] #2: (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20 [86220.126017] #3: (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6] [86220.126111] stack backtrace: [86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7 [86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015 [86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6] [86220.126288] Call Trace: [86220.126312] dump_stack+0x70/0x9e [86220.126337] lockdep_rcu_suspicious+0xce/0xf0 [86220.126365] ___might_sleep+0x1d3/0x240 [86220.126390] __might_sleep+0x45/0x80 [86220.126416] kmem_cache_alloc_trace+0x53/0x250 [86220.126458] ? ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.126498] ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.126538] ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.126580] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.126623] addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.126664] ? addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.126708] addrconf_verify_work+0xe/0x20 [ipv6] [86220.126738] process_one_work+0x258/0x680 [86220.126765] worker_thread+0x35/0x3f0 [86220.126790] kthread+0x124/0x140 [86220.126813] ? process_one_work+0x680/0x680 [86220.126839] ? kthread_create_worker_on_cpu+0x40/0x40 [86220.126869] ? umh_complete+0x40/0x40 [86220.126893] ? call_usermodehelper_exec_async+0x12a/0x160 [86220.126926] ret_from_fork+0x4b/0x60 [86220.126999] BUG: sleeping function called from invalid context at mm/slab.h:420 [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2 [86220.127082] 4 locks held by kworker/0:2/1003: [86220.127107] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.127179] #1: ((addr_chk_work).work){+.+.}, at: [<00000000da8e9b73>] process_one_work+0x1de/0x680 [86220.127242] #2: (rtnl_mutex){+.+.}, at: [<00000000b06d9510>] rtnl_lock+0x12/0x20 [86220.127300] #3: (rcu_read_lock_bh){....}, at: [<00000000aef52299>] addrconf_verify_rtnl+0x1e/0x510 [ipv6] [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 4.15.0-rc7-next-20180110+ #7 [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS B246P105 06/01/2015 [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6] [86220.127568] Call Trace: [86220.127591] dump_stack+0x70/0x9e [86220.127616] ___might_sleep+0x14d/0x240 [86220.127644] __might_sleep+0x45/0x80 [86220.127672] kmem_cache_alloc_trace+0x53/0x250 [86220.127717] ? ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.127762] ipv6_add_addr+0xfe/0x6e0 [ipv6] [86220.127807] ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.127854] ? ipv6_create_tempaddr+0x24d/0x430 [ipv6] [86220.127903] addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.127950] ? addrconf_verify_rtnl+0x339/0x510 [ipv6] [86220.127998] addrconf_verify_work+0xe/0x20 [ipv6] [86220.128032] process_one_work+0x258/0x680 [86220.128063] worker_thread+0x35/0x3f0 [86220.128091] kthread+0x124/0x140 [86220.128117] ? process_one_work+0x680/0x680 [86220.128146] ? kthread_create_worker_on_cpu+0x40/0x40 [86220.128180] ? umh_complete+0x40/0x40 [86220.128207] ? call_usermodehelper_exec_async+0x12a/0x160 [86220.128243] ret_from_fork+0x4b/0x60 Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c76fe2d9 |
|
25-Jan-2018 |
David Ahern <dsahern@gmail.com> |
net: ipv6: send unsolicited NA after DAD Unsolicited IPv6 neighbor advertisements should be sent after DAD completes. Update ndisc_send_unsol_na to skip tentative, non-optimistic addresses and have those sent by addrconf_dad_completed after DAD. Fixes: 4a6e3c5def13c ("net: ipv6: send unsolicited NA on admin up") Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
96890d62 |
|
15-Jan-2018 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: delete /proc THIS_MODULE references /proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27c6fa73 |
|
06-Jan-2018 |
Ido Schimmel <idosch@mellanox.com> |
ipv6: Set nexthop flags upon carrier change Similar to IPv4, when the carrier of a netdev changes we should toggle the 'linkdown' flag on all the nexthops using it as their nexthop device. This will later allow us to test for the presence of this flag during route lookup and dump. Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") host and anycast routes used the loopback netdev as their nexthop device and thus were not marked with the 'linkdown' flag. The patch preserves this behavior and allows one to ping the local address even when the nexthop device does not have a carrier and the 'ignore_routes_with_linkdown' sysctl is set. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c981e28 |
|
06-Jan-2018 |
Ido Schimmel <idosch@mellanox.com> |
ipv6: Prepare to handle multiple netdev events To make IPv6 more in line with IPv4 we need to be able to respond differently to different netdev events. For example, when a netdev is unregistered all the routes using it as their nexthop device should be flushed, whereas when the netdev's carrier changes only the 'linkdown' flag should be toggled. Currently, this is not possible, as the function that traverses the routing tables is not aware of the triggering event. Propagate the triggering event down, so that it could be used in later patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2127d95a |
|
06-Jan-2018 |
Ido Schimmel <idosch@mellanox.com> |
ipv6: Clear nexthop flags upon netdev up Previous patch marked nexthops with the 'dead' and 'linkdown' flags. Clear these flags when the netdev comes back up. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a3fde2ad |
|
04-Dec-2017 |
Florian Westphal <fw@strlen.de> |
rtnetlink: ipv6: convert remaining users to rtnl_register_module convert remaining users of rtnl_register to rtnl_register_module and un-export rtnl_register. Requested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16feebcf |
|
02-Dec-2017 |
Florian Westphal <fw@strlen.de> |
rtnetlink: remove __rtnl_register This removes __rtnl_register and switches callers to either rtnl_register or rtnl_register_module. Also, rtnl_register() will now print an error if memory allocation failed rather than panic the kernel. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e99e88a9 |
|
16-Oct-2017 |
Kees Cook <keescook@chromium.org> |
treewide: setup_timer() -> timer_setup() This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
09400953 |
|
14-Nov-2017 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: set all.accept_dad to 0 by default With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag is also taken into account (default value is 1). If either global or per-interface flag is non-zero, DAD will be enabled on a given interface. This is not backward compatible: before those patches, the user could disable DAD just by setting the per-interface flag to 0. Now, the user instead needs to set both flags to 0 to actually disable DAD. Restore the previous behaviour by setting the default for the global 'accept_dad' flag to 0. This way, DAD is still enabled by default, as per-interface flags are set to 1 on device creation, but setting them to 0 is enough to disable DAD on a given interface. - Before 35e015e1f57a7 and a2d3f3e33853: global per-interface DAD enabled [default] 1 1 yes X 0 no X 1 yes - After 35e015e1f577 and a2d3f3e33853: global per-interface DAD enabled [default] 1 1 yes 0 0 no 0 1 yes 1 0 yes - After this fix: global per-interface DAD enabled 1 1 yes 0 0 no [default] 0 1 yes 1 0 yes Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real") CC: Stefano Brivio <sbrivio@redhat.com> CC: Matteo Croce <mcroce@redhat.com> CC: Erik Kline <ek@google.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2210d6b2 |
|
07-Nov-2017 |
Maciej Żenczykowski <maze@google.com> |
net: ipv6: sysctl to specify IPv6 ND traffic class Add a per-device sysctl to specify the default traffic class to use for kernel originated IPv6 Neighbour Discovery packets. Currently this includes: - Router Solicitation (ICMPv6 type 133) ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr() - Neighbour Solicitation (ICMPv6 type 135) ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr() - Neighbour Advertisement (ICMPv6 type 136) ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr() - Redirect (ICMPv6 type 137) ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr() and if the kernel ever gets around to generating RA's, it would presumably also include: - Router Advertisement (ICMPv6 type 134) (radvd daemon could pick up on the kernel setting and use it) Interface drivers may examine the Traffic Class value and translate the DiffServ Code Point into a link-layer appropriate traffic prioritization scheme. An example of mapping IETF DSCP values to IEEE 802.11 User Priority values can be found here: https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11 The expected primary use case is to properly prioritize ND over wifi. Testing: jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass 0 jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass -bash: echo: write error: Invalid argument jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass -bash: echo: write error: Invalid argument jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass 255 jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass 34 jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1 tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24) jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is jzem22.pgc, Flags [solicited] (based on original change written by Erik Kline, with minor changes) v2: fix 'suspicious rcu_dereference_check() usage' by explicitly grabbing the rcu_read_lock. Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fffcefe9 |
|
06-Nov-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: fix a lockdep splat Fixes a case where GFP_ATOMIC allocation must be used instead of GFP_KERNEL one. [ 54.891146] lock_acquire+0xb3/0x2f0 [ 54.891153] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891165] fs_reclaim_acquire.part.60+0x29/0x30 [ 54.891170] ? fs_reclaim_acquire.part.60+0x5/0x30 [ 54.891178] kmem_cache_alloc_trace+0x3f/0x500 [ 54.891186] ? cyc2ns_read_end+0x1e/0x30 [ 54.891196] ipv6_add_addr+0x15a/0xc30 [ 54.891217] ? ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891223] ipv6_create_tempaddr+0x2ea/0x5d0 [ 54.891238] ? manage_tempaddrs+0x195/0x220 [ 54.891249] ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891255] addrconf_prefix_rcv_add_addr+0x1c0/0x4f0 [ 54.891268] addrconf_prefix_rcv+0x2e5/0x9b0 [ 54.891279] ? neigh_update+0x446/0xb90 [ 54.891298] ? ndisc_router_discovery+0x5ab/0xf00 [ 54.891303] ndisc_router_discovery+0x5ab/0xf00 [ 54.891311] ? retint_kernel+0x2d/0x2d [ 54.891331] ndisc_rcv+0x1b6/0x270 [ 54.891340] icmpv6_rcv+0x6aa/0x9f0 [ 54.891345] ? ipv6_chk_mcast_addr+0x176/0x530 [ 54.891351] ? do_csum+0x17b/0x260 [ 54.891360] ip6_input_finish+0x194/0xb20 [ 54.891372] ip6_input+0x5b/0x2c0 [ 54.891380] ? ip6_rcv_finish+0x320/0x320 [ 54.891389] ip6_mc_input+0x15a/0x250 [ 54.891396] ipv6_rcv+0x772/0x1050 [ 54.891403] ? consume_skb+0xbe/0x2d0 [ 54.891412] ? ip6_make_skb+0x2a0/0x2a0 [ 54.891418] ? ip6_input+0x2c0/0x2c0 [ 54.891425] __netif_receive_skb_core+0xa0f/0x1600 [ 54.891436] ? process_backlog+0xac/0x400 [ 54.891441] process_backlog+0xfa/0x400 [ 54.891448] ? net_rx_action+0x145/0x1130 [ 54.891456] net_rx_action+0x310/0x1130 [ 54.891524] ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta] [ 54.891538] __do_softirq+0x140/0xaba [ 54.891553] irq_exit+0x10b/0x160 [ 54.891561] do_IRQ+0xbb/0x1b0 Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27c565ae |
|
04-Nov-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: remove IN6_ADDR_HSIZE from addrconf.h IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e669b869 |
|
30-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: increment ifp refcount before ipv6_del_addr() In the (unlikely) event fixup_permanent_addr() returns a failure, addrconf_permanent_addr() calls ipv6_del_addr() without the mandatory call to in6_ifa_hold(), leading to a refcount error, spotted by syzkaller : WARNING: CPU: 1 PID: 3142 at lib/refcount.c:227 refcount_dec+0x4c/0x50 lib/refcount.c:227 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 3142 Comm: ip Not tainted 4.14.0-rc4-next-20171009+ #33 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x41c kernel/panic.c:181 __warn+0x1c4/0x1e0 kernel/panic.c:544 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178 do_trap_no_signal arch/x86/kernel/traps.c:212 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:261 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311 invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905 RIP: 0010:refcount_dec+0x4c/0x50 lib/refcount.c:227 RSP: 0018:ffff8801ca49e680 EFLAGS: 00010286 RAX: 000000000000002c RBX: ffff8801d07cfcdc RCX: 0000000000000000 RDX: 000000000000002c RSI: 1ffff10039493c90 RDI: ffffed0039493cc4 RBP: ffff8801ca49e688 R08: ffff8801ca49dd70 R09: 0000000000000000 R10: ffff8801ca49df58 R11: 0000000000000000 R12: 1ffff10039493cd9 R13: ffff8801ca49e6e8 R14: ffff8801ca49e7e8 R15: ffff8801d07cfcdc __in6_ifa_put include/net/addrconf.h:369 [inline] ipv6_del_addr+0x42b/0xb60 net/ipv6/addrconf.c:1208 addrconf_permanent_addr net/ipv6/addrconf.c:3327 [inline] addrconf_notify+0x1c66/0x2190 net/ipv6/addrconf.c:3393 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x32/0x60 net/core/dev.c:1697 call_netdevice_notifiers net/core/dev.c:1715 [inline] __dev_notify_flags+0x15d/0x430 net/core/dev.c:6843 dev_change_flags+0xf5/0x140 net/core/dev.c:6879 do_setlink+0xa1b/0x38e0 net/core/rtnetlink.c:2113 rtnl_newlink+0xf0d/0x1a40 net/core/rtnetlink.c:2661 rtnetlink_rcv_msg+0x733/0x1090 net/core/rtnetlink.c:4301 netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2408 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4313 netlink_unicast_kernel net/netlink/af_netlink.c:1273 [inline] netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1299 netlink_sendmsg+0xa4a/0xe70 net/netlink/af_netlink.c:1862 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2049 __sys_sendmsg+0xe5/0x210 net/socket.c:2083 SYSC_sendmsg net/socket.c:2094 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2090 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x7fa9174d3320 RSP: 002b:00007ffe302ae9e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffe302b2ae0 RCX: 00007fa9174d3320 RDX: 0000000000000000 RSI: 00007ffe302aea20 RDI: 0000000000000016 RBP: 0000000000000082 R08: 0000000000000000 R09: 000000000000000f R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe302b32a0 R13: 0000000000000000 R14: 00007ffe302b2ab8 R15: 00007ffe302b32b8 Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da13c59b |
|
30-Oct-2017 |
Vishwanath Pai <vpai@akamai.com> |
net: display hw address of source machine during ipv6 DAD failure This patch updates the error messages displayed in kernel log to include hwaddress of the source machine that caused ipv6 duplicate address detection failures. Examples: a) When we receive a NA packet from another machine advertising our address: ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2001:db8:: on eth0! b) When we detect DAD failure during address assignment to an interface: IPv6: eth0: IPv6 duplicate address 2001:db8:: used by 34:ab:cd:56:11:e8 detected! v2: Changed %pI6 to %pI6c in ndisc_recv_na() Chaged the v6 address in the commit message to 2001:db8:: Suggested-by: Igor Lubashev <ilubashe@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e5f47ab |
|
23-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: do not block BH in ipv6_chk_home_addr() rcu_read_lock() is enough here, no need to block BH. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a5c1d98f |
|
23-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: do not block BH in /proc/net/if_inet6 handling Table is really RCU protected, no need to block BH Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
24f226da |
|
23-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: do not block BH in ipv6_get_ifaddr() rcu_read_lock() is enough here, no need to block BH. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
480318a0 |
|
23-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: do not block BH in ipv6_chk_addr_and_flags() rcu_read_lock() is enough here, as inet6_ifa_finish_destroy() uses kfree_rcu() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f27fb23 |
|
23-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: add per netns perturbation in inet6_addr_hash() Bring IPv6 in par with IPv4 : - Use net_hash_mix() to spread addresses a bit more. - Use 256 slots hash table instead of 16 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
752a9292 |
|
23-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: factorize inet6_addr_hash() call ipv6_add_addr_hash() can compute the hash value outside of locked section and pass it to ipv6_chk_same_addr(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56fc709b |
|
23-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: move ipv6_chk_same_addr() to avoid forward declaration ipv6_chk_same_addr() is only used by ipv6_add_addr_hash(), so moving it avoids a forward declaration. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de95e047 |
|
18-Oct-2017 |
David Ahern <dsahern@gmail.com> |
net: Add extack to validator_info structs used for address notifier Add extack to in_validator_info and in6_validator_info. Update the one user of each, ipvlan, to return an error message for failures. Only manual configuration of an address is plumbed in the IPv6 code path. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ff7883ea |
|
18-Oct-2017 |
David Ahern <dsahern@gmail.com> |
net: ipv6: Make inet6addr_validator a blocking notifier inet6addr_validator chain was added by commit 3ad7d2468f79f ("Ipvlan should return an error when an address is already in use") to allow address validation before changes are committed and to be able to fail the address change with an error back to the user. The address validation is not done for addresses received from router advertisements. Handling RAs in softirq context is the only reason for the notifier chain to be atomic versus blocking. Since the only current user, ipvlan, of the validator chain ignores softirq context, the notifier can be made blocking and simply not invoked for softirq path. The blocking option is needed by spectrum for example to validate resources for an adding an address to an interface. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3d9832e |
|
18-Oct-2017 |
David Ahern <dsahern@gmail.com> |
ipv6: addrconf: cleanup locking in ipv6_add_addr ipv6_add_addr is called in process context with rtnl lock held (e.g., manual config of an address) or during softirq processing (e.g., autoconf and address from a router advertisement). Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry and does not call unlock until exit, minus the call around the address validator notifier. Similarly, addrconf_hash_lock is taken after the validator notifier and held until exit. This forces the allocation of inet6_ifaddr to always be atomic. Refactor ipv6_add_addr as follows: 1. add an input boolean to discriminate the call path (process context or softirq). This new flag controls whether the alloc can be done with GFP_KERNEL or GFP_ATOMIC. 2. Move the rcu_read_lock_bh and unlock calls only around functions that do rcu updates. 3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should return an error when an address is already in use."). This was done presumably because rcu_read_unlock_bh needs to be called before calling the validator. Since rcu_read_lock is not needed before the validator runs revert the hold and put added by 3ad7d2468f79f and only do the hold when setting ifp->idev. 4. move duplicate address check and insertion of new address in the global address hash into a helper. The helper is called after an ifa is allocated and filled in. This allows the ifa for manually configured addresses to be done with GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held and hash table spinlock held. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c24675f8 |
|
11-Oct-2017 |
Florian Westphal <fw@strlen.de> |
ipv6: addrconf: don't use rtnl mutex in RTM_GETADDR Similar to the previous patch, use the device lookup functions that bump device refcount and flag this as DOIT_UNLOCKED to avoid rtnl mutex. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ea2607f |
|
11-Oct-2017 |
Florian Westphal <fw@strlen.de> |
ipv6: addrconf: don't use rtnl mutex in RTM_GETNETCONF Instead of relying on rtnl mutex bump device reference count. After this change, values reported can change in parallel, but thats not much different from current state, as anyone can change the settings right after rtnl_unlock (and before userspace processed reply). While at it, switch to GFP_KERNEL allocation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc429c8f |
|
07-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: avoid cache line dirtying in ipv6_dev_get_saddr() By extending the rcu section a bit, we can avoid these very expensive in6_ifa_put()/in6_ifa_hold() calls done in __ipv6_dev_get_saddr() and ipv6_dev_get_saddr() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f59c031e |
|
07-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: __ipv6_dev_get_saddr() rcu conversion Callers hold rcu_read_lock(), so we do not need the rcu_read_lock()/rcu_read_unlock() pair. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
24ba333b |
|
07-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: ipv6_chk_prefix() rcu conversion Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47e26941 |
|
07-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: ipv6_chk_custom_prefix() rcu conversion Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d9bf82c2 |
|
07-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: ipv6_count_addresses() rcu conversion Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8ef802aa |
|
07-Oct-2017 |
Eric Dumazet <edumazet@google.com> |
ipv6: prepare RCU lookups for idev->addr_list inet6_ifa_finish_destroy() already uses kfree_rcu() to free inet6_ifaddr structs. We need to use proper list additions/deletions in order to allow readers to use RCU instead of idev->lock rwlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a2d3f3e3 |
|
05-Oct-2017 |
Matteo Croce <mcroce@redhat.com> |
ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real Commit 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") was intended to affect accept_dad flag handling in such a way that DAD operation and mode on a given interface would be selected according to the maximum value of conf/{all,interface}/accept_dad. However, addrconf_dad_begin() checks for particular cases in which we need to skip DAD, and this check was modified in the wrong way. Namely, it was modified so that, if the accept_dad flag is 0 for the given interface *or* for all interfaces, DAD would be skipped. We have instead to skip DAD if accept_dad is 0 for the given interface *and* for all interfaces. Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Reported-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
66f5d6ce |
|
06-Oct-2017 |
Wei Wang <weiwan@google.com> |
ipv6: replace rwlock with rcu and spinlock in fib6_table With all the preparation work before, we are now ready to replace rwlock with rcu and spinlock in fib6_table. That means now all fib6_node in fib6_table are protected by rcu. And when freeing fib6_node, call_rcu() is used to wait for the rcu grace period before releasing the memory. When accessing fib6_node, corresponding rcu APIs need to be used. And all previous sessions protected by the write lock will now be protected by the spin lock per table. All previous sessions protected by read lock will now be protected by rcu_read_lock(). A couple of things to note here: 1. As part of the work of replacing rwlock with rcu, the linked list of fn->leaf now has to be rcu protected as well. So both fn->leaf and rt->dst.rt6_next are now __rcu tagged and corresponding rcu APIs are used when manipulating them. 2. For fn->rr_ptr, first of all, it also needs to be rcu protected now and is tagged with __rcu and rcu APIs are used in corresponding places. Secondly, fn->rr_ptr is changed in rt6_select() which is a reader thread. This makes the issue a bit complicated. We think a valid solution for it is to let rt6_select() grab the tb6_lock if it decides to change it. As it is not in the normal operation and only happens when there is no valid neighbor cache for the route, we think the performance impact should be low. 3. fib6_walk_continue() has to be called with tb6_lock held even in the route dumping related functions, e.g. inet6_dump_fib(), fib6_tables_dump() and ipv6_route_seq_ops. It is because fib6_walk_continue() makes modifications to the walker structure, and so are fib6_repair_tree() and fib6_del_route(). In order to do proper syncing between them, we need to let fib6_walk_continue() hold the lock. We may be able to do further improvement on the way we do the tree walk to get rid of the need for holding the spin lock. But not for now. 4. When fib6_del_route() removes a route from the tree, we no longer mark rt->dst.rt6_next to NULL to make simultaneous reader be able to further traverse the list with rcu. However, rt->dst.rt6_next is only valid within this same rcu period. No one should access it later. 5. All the operation of atomic_inc(rt->rt6i_ref) is changed to be performed before we publish this route (either by linking it to fn->leaf or insert it in the list pointed by fn->leaf) just to be safe because as soon as we publish the route, some read thread will be able to access it. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3843fe5 |
|
06-Oct-2017 |
Wei Wang <weiwan@google.com> |
ipv6: replace dst_hold() with dst_hold_safe() in routing code With rwlock, it is safe to call dst_hold() in the read thread because read thread is guaranteed to be separated from write thread. However, after we replace rwlock with rcu, it is no longer safe to use dst_hold(). A dst might already have been deleted but is waiting for the rcu grace period to pass before freeing the memory when a read thread is trying to do dst_hold(). This could potentially cause double free issue. So this commit replaces all dst_hold() with dst_hold_safe() in all read thread to avoid this double free issue. And in order to make the code more compact, a new function ip6_hold_safe() is introduced. It calls dst_hold_safe() first, and if that fails, it will either fall back to hold and return net->ipv6.ip6_null_entry or set rt to NULL according to the caller's need. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b760fcf |
|
06-Oct-2017 |
Wei Wang <weiwan@google.com> |
ipv6: hook up exception table to store dst cache This commit makes use of the exception hash table implementation to store dst caches created by pmtu discovery and ip redirect into the hash table under the rt_info and no longer inserts these routes into fib6 tree. This makes the fib6 tree only contain static configured routes and could now be protected by rcu instead of a rw lock. With this change, in the route lookup related functions, after finding the rt6_info with the longest prefix, we also need to search for the exception table before doing backtracking. In the route delete function, if the route being deleted is not a dst cache, deletion of this route also need to flush the whole hash table under it. If it is a dst cache, then only delete the cached dst in the hash table. Note: for fib6_walk_continue() function, w->root now is always pointing to a root node considering that fib6_prune_clones() is removed from the code. So we add a WARN_ON() msg to make sure w->root always points to a root node and also removed the update of w->root in fib6_repair_tree(). This is a prerequisite for later patch because we don't need to make w->root as rcu protected when replacing rwlock with RCU. Also, we remove all prune related variables as it is no longer used. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38fbeeee |
|
06-Oct-2017 |
Wei Wang <weiwan@google.com> |
ipv6: prepare fib6_locate() for exception table fib6_locate() is used to find the fib6_node according to the passed in prefix address key. It currently tries to find the fib6_node with the exact match of the passed in key. However, when we move cached routes into the exception table, fib6_locate() will fail to find the fib6_node for it as the cached routes will be stored in the exception table under the fib6_node with the longest prefix match of the cache's dst addr key. This commit adds a new parameter to let the caller specify if it needs exact match or longest prefix match. Right now, all callers still does exact match when calling fib6_locate(). It will be changed in later commit where exception table is hooked up to store cached routes. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c45121d |
|
04-Oct-2017 |
Florian Westphal <fw@strlen.de> |
rtnetlink: remove __rtnl_af_unregister switch the only caller to rtnl_af_unregister. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f372c7b |
|
25-Sep-2017 |
Mike Manning <mmanning@brocade.com> |
net: ipv6: send NS for DAD when link operationally up The NS for DAD are sent on admin up as long as a valid qdisc is found. A race condition exists by which these packets will not egress the interface if the operational state of the lower device is not yet up. The solution is to delay DAD until the link is operationally up according to RFC2863. Rather than only doing this, follow the existing code checks by deferring IPv6 device initialization altogether. The fix allows DAD on devices like tunnels that are controlled by userspace control plane. The fix has no impact on regular deployments, but means that there is no IPv6 connectivity until the port has been opened in the case of port-based network access control, which should be desirable. Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63a4e80b |
|
26-Sep-2017 |
Tobias Klauser <tklauser@distanz.ch> |
ipv6: Remove redundant unlikely() IS_ERR() already implies unlikely(), so it can be omitted. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35e015e1 |
|
12-Sep-2017 |
Matteo Croce <mcroce@redhat.com> |
ipv6: fix net.ipv6.conf.all interface DAD handlers Currently, writing into net.ipv6.conf.all.{accept_dad,use_optimistic,optimistic_dad} has no effect. Fix handling of these flags by: - using the maximum of global and per-interface values for the accept_dad flag. That is, if at least one of the two values is non-zero, enable DAD on the interface. If at least one value is set to 2, enable DAD and disable IPv6 operation on the interface if MAC-based link-local address was found - using the logical OR of global and per-interface values for the optimistic_dad flag. If at least one of them is set to one, optimistic duplicate address detection (RFC 4429) is enabled on the interface - using the logical OR of global and per-interface values for the use_optimistic flag. If at least one of them is set to one, optimistic addresses won't be marked as deprecated during source address selection on the interface. While at it, as we're modifying the prototype for ipv6_use_optimistic_addr(), drop inline, and let the compiler decide. Fixes: 7fd2561e4ebd ("net: ipv6: Add a sysctl to make optimistic addresses useful candidates") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6819a14e |
|
04-Sep-2017 |
Mike Manning <mmanning@brocade.com> |
net: ipv6: fix regression of no RTM_DELADDR sent after DAD failure Commit f784ad3d79e5 ("ipv6: do not send RTM_DELADDR for tentative addresses") incorrectly assumes that no RTM_NEWADDR are sent for addresses in tentative state, as this does happen for the standard IPv6 use-case of DAD failure, see the call to ipv6_ifa_notify() in addconf_dad_stop(). So as a result of this change, no RTM_DELADDR is sent after DAD failure for a link-local when strict DAD (accept_dad=2) is configured, or on the next admin down in other cases. The absence of this notification breaks backwards compatibility and causes problems after DAD failure if this notification was being relied on. The solution is to allow RTM_DELADDR to still be sent after DAD failure. Fixes: f784ad3d79e5 ("ipv6: do not send RTM_DELADDR for tentative addresses") Signed-off-by: Mike Manning <mmanning@brocade.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e587ea7 |
|
25-Aug-2017 |
Wei Wang <weiwan@google.com> |
ipv6: fix sparse warning on rt6i_node Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This generates a new sparse warning on rt->rt6i_node related code: net/ipv6/route.c:1394:30: error: incompatible types in comparison expression (different address spaces) ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison expression (different address spaces) This commit adds "__rcu" tag for rt6i_node and makes sure corresponding rcu API is used for it. After this fix, sparse no longer generates the above warning. Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4832c30d |
|
17-Aug-2017 |
David Ahern <dsahern@gmail.com> |
net: ipv6: put host and anycast routes on device with address One nagging difference between ipv4 and ipv6 is host routes for ipv6 addresses are installed using the loopback device or VRF / L3 Master device. e.g., 2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium Using the loopback device is convenient -- necessary for local tx, but has some nasty side effects, most notably setting the 'lo' device down causes all host routes for all local IPv6 address to be removed from the FIB and completely breaks IPv6 networking across all interfaces. This patch puts FIB entries for IPv6 routes against the device. This simplifies the routes in the FIB, for example by making dst->dev and rt6i_idev->dev the same (a future patch can look at removing the device reference taken for rt6i_idev for FIB entries). When copies are made on FIB lookups, the cloned route has dst->dev set to loopback (or the L3 master device). This is needed for the local Tx of packets to local addresses. With fib entries allocated against the real network device, the addrconf code that reinserts host routes on admin up of 'lo' is no longer needed. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b97bac64 |
|
09-Aug-2017 |
Florian Westphal <fw@strlen.de> |
rtnetlink: make rtnl_register accept a flags parameter This change allows us to later indicate to rtnetlink core that certain doit functions should be called without acquiring rtnl_mutex. This change should have no effect, we simply replace the last (now unused) calcit argument with the new flag. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc882fcf |
|
03-Aug-2017 |
Ido Schimmel <idosch@mellanox.com> |
ipv6: Regenerate host route according to node pointer upon interface up When an interface is brought back up, the kernel tries to restore the host routes tied to its permanent addresses. However, if the host route was removed from the FIB, then we need to reinsert it. This is done by releasing the current dst and allocating a new, so as to not reuse a dst with obsolete values. Since this function is called under RTNL and using the same explanation from the previous patch, we can test if the route is in the FIB by checking its node pointer instead of its reference count. Tested using the following script and Andrey's reproducer mentioned in commit 8048ced9beb2 ("net: ipv6: regenerate host route if moved to gc list") and linked below: $ ip link set dev lo up $ ip link add dummy1 type dummy $ ip -6 address add cafe::1/64 dev dummy1 $ ip link set dev lo down # cafe::1/128 is removed $ ip link set dev dummy1 up $ ip link set dev lo up The host route is correctly regenerated. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Link: http://lkml.kernel.org/r/CAAeHK+zSe82vc5gCRgr_EoUwiALPnWVdWJBPwJZBpbxYz=kGJw@mail.gmail.com Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9217d8c2 |
|
03-Aug-2017 |
Ido Schimmel <idosch@mellanox.com> |
ipv6: Regenerate host route according to node pointer upon loopback up When the loopback device is brought back up we need to check if the host route attached to the address is still in the FIB and regenerate one in case it's not. Host routes using the loopback device are always inserted into and removed from the FIB under RTNL (under which this function is called), so we can test their node pointer instead of the reference count in order to check if the route is in the FIB or not. Tested using the following script from Nicolas mentioned in commit a220445f9f43 ("ipv6: correctly add local routes when lo goes up"): $ ip link add dummy1 type dummy $ ip link set dummy1 up $ ip link set lo down ; ip link set lo up The host route is correctly regenerated. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
271201c0 |
|
04-Jul-2017 |
Reshetova, Elena <elena.reshetova@intel.com> |
net, ipv6: convert inet6_ifaddr.refcnt from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1be92460 |
|
04-Jul-2017 |
Reshetova, Elena <elena.reshetova@intel.com> |
net, ipv6: convert inet6_dev.refcnt from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec8add2a |
|
29-Jun-2017 |
Sabrina Dubroca <sd@queasysnail.net> |
ipv6: dad: don't remove dynamic addresses if link is down Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1): ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 In the same situation, this will succeed and not remove the address (2): ip addr add 1111::12/64 dev $DEV ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why: * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses. Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60abc0be |
|
21-Jun-2017 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv6: avoid unregistering inet6_dev for loopback The per netns loopback_dev->ip6_ptr is unregistered and set to NULL when its mtu is set to smaller than IPV6_MIN_MTU, this leads to that we could set rt->rt6i_idev NULL after a rt6_uncached_list_flush_dev() and then crash after another call. In this case we should just bring its inet6_dev down, rather than unregistering it, at least prior to commit 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") we always override the case for loopback. Thanks a lot to Andrey for finding a reliable reproducer. Fixes: 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad65a2f0 |
|
17-Jun-2017 |
Wei Wang <weiwan@google.com> |
ipv6: call dst_hold_safe() properly Similar as ipv4, ipv6 path also needs to call dst_hold_safe() when necessary to avoid double free issue on the dst. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8a894b2 |
|
15-Jun-2017 |
Xin Long <lucien.xin@gmail.com> |
ipv6: fix calling in6_ifa_hold incorrectly for dad work Now when starting the dad work in addrconf_mod_dad_work, if the dad work is idle and queued, it needs to hold ifa. The problem is there's one gap in [1], during which if the pending dad work is removed elsewhere. It will miss to hold ifa, but the dad word is still idea and queue. if (!delayed_work_pending(&ifp->dad_work)) in6_ifa_hold(ifp); <--------------[1] mod_delayed_work(addrconf_wq, &ifp->dad_work, delay); An use-after-free issue can be caused by this. Chen Wei found this issue when WARN_ON(!hlist_unhashed(&ifp->addr_lst)) in net6_ifa_finish_destroy was hit because of it. As Hannes' suggestion, this patch is to fix it by holding ifa first in addrconf_mod_dad_work, then calling mod_delayed_work and putting ifa if the dad_work is already in queue. Note that this patch did not choose to fix it with: if (!mod_delayed_work(delay)) in6_ifa_hold(ifp); As with it, when delay == 0, dad_work would be scheduled immediately, all addrconf_mod_dad_work(0) callings had to be moved under ifp->lock. Reported-by: Wei Chen <weichen@redhat.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ad7d246 |
|
08-Jun-2017 |
Krister Johansen <kjlx@templeofstupid.com> |
Ipvlan should return an error when an address is already in use. The ipvlan code already knows how to detect when a duplicate address is about to be assigned to an ipvlan device. However, that failure is not propogated outward and leads to a silent failure. Introduce a validation step at ip address creation time and allow device drivers to register to validate the incoming ip addresses. The ipvlan code is the first consumer. If it detects an address in use, we can return an error to the user before beginning to commit the new ifa in the networking code. This can be especially useful if it is necessary to provision many ipvlans in containers. The provisioning software (or operator) can use this to detect situations where an ip address is unexpectedly in use. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
333c4301 |
|
21-May-2017 |
David Ahern <dsahern@gmail.com> |
net: ipv6: Plumb extack through route add functions Plumb extack argument down to route add functions. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
66eb9f86 |
|
12-May-2017 |
Mahesh Bandewar <maheshb@google.com> |
ipv6: avoid dad-failures for addresses with NODAD Every address gets added with TENTATIVE flag even for the addresses with IFA_F_NODAD flag and dad-work is scheduled for them. During this DAD process we realize it's an address with NODAD and complete the process without sending any probe. However the TENTATIVE flags stays on the address for sometime enough to cause misinterpretation when we receive a NS. While processing NS, if the address has TENTATIVE flag, we mark it DADFAILED and endup with an address that was originally configured as NODAD with DADFAILED. We can't avoid scheduling dad_work for addresses with NODAD but we can avoid adding TENTATIVE flag to avoid this racy situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
242d3a49 |
|
08-May-2017 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf For each netns (except init_net), we initialize its null entry in 3 places: 1) The template itself, as we use kmemdup() 2) Code around dst_init_metrics() in ip6_route_net_init() 3) ip6_route_dev_notify(), which is supposed to initialize it after loopback registers Unfortunately the last one still happens in a wrong order because we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to net->loopback_dev's idev, thus we have to do that after we add idev to loopback. However, this notifier has priority == 0 same as ipv6_dev_notf, and ipv6_dev_notf is registered after ip6_route_dev_notifier so it is called actually after ip6_route_dev_notifier. This is similar to commit 2f460933f58e ("ipv6: initialize route null entry in addrconf_init()") which fixes init_net. Fix it by picking a smaller priority for ip6_route_dev_notifier. Also, we have to release the refcnt accordingly when unregistering loopback_dev because device exit functions are called before subsys exit functions. Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f460933 |
|
03-May-2017 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv6: initialize route null entry in addrconf_init() Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev since it is always NULL. This is clearly wrong, we have code to initialize it to loopback_dev, unfortunately the order is still not correct. loopback_dev is registered very early during boot, we lose a chance to re-initialize it in notifier. addrconf_init() is called after ip6_route_init(), which means we have no chance to correct it. Fix it by moving this initialization explicitly after ipv6_add_dev(init_net.loopback_dev) in addrconf_init(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d717134 |
|
02-May-2017 |
David Ahern <dsahern@gmail.com> |
net: ipv6: Do not duplicate DAD on link up Andrey reported a warning triggered by the rcu code: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 5911 at lib/debugobjects.c:289 debug_print_object+0x175/0x210 ODEBUG: activate active (active state 1) object type: rcu_head hint: (null) Modules linked in: CPU: 1 PID: 5911 Comm: a.out Not tainted 4.11.0-rc8+ #271 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x192/0x22d lib/dump_stack.c:52 __warn+0x19f/0x1e0 kernel/panic.c:549 warn_slowpath_fmt+0xe0/0x120 kernel/panic.c:564 debug_print_object+0x175/0x210 lib/debugobjects.c:286 debug_object_activate+0x574/0x7e0 lib/debugobjects.c:442 debug_rcu_head_queue kernel/rcu/rcu.h:75 __call_rcu.constprop.76+0xff/0x9c0 kernel/rcu/tree.c:3229 call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288 rt6_rcu_free net/ipv6/ip6_fib.c:158 rt6_release+0x1ea/0x290 net/ipv6/ip6_fib.c:188 fib6_del_route net/ipv6/ip6_fib.c:1461 fib6_del+0xa42/0xdc0 net/ipv6/ip6_fib.c:1500 __ip6_del_rt+0x100/0x160 net/ipv6/route.c:2174 ip6_del_rt+0x140/0x1b0 net/ipv6/route.c:2187 __ipv6_ifa_notify+0x269/0x780 net/ipv6/addrconf.c:5520 addrconf_ifdown+0xe60/0x1a20 net/ipv6/addrconf.c:3672 ... Andrey's reproducer program runs in a very tight loop, calling 'unshare -n' and then spawning 2 sets of 14 threads running random ioctl calls. The relevant networking sequence: 1. New network namespace created via unshare -n - ip6tnl0 device is created in down state 2. address added to ip6tnl0 - equivalent to ip -6 addr add dev ip6tnl0 fd00::bb/1 - DAD is started on the address and when it completes the host route is inserted into the FIB 3. ip6tnl0 is brought up - the new fixup_permanent_addr function restarts DAD on the address 4. exit namespace - teardown / cleanup sequence starts - once in a blue moon, lo teardown appears to happen BEFORE teardown of ip6tunl0 + down on 'lo' removes the host route from the FIB since the dst->dev for the route is loobback + host route added to rcu callback list * rcu callback has not run yet, so rt is NOT on the gc list so it has NOT been marked obsolete 5. in parallel to 4. worker_thread runs addrconf_dad_completed - DAD on the address on ip6tnl0 completes - calls ipv6_ifa_notify which inserts the host route All of that happens very quickly. The result is that a host route that has been deleted from the IPv6 FIB and added to the RCU list is re-inserted into the FIB. The exit namespace eventually gets to cleaning up ip6tnl0 which removes the host route from the FIB again, calls the rcu function for cleanup -- and triggers the double rcu trace. The root cause is duplicate DAD on the address -- steps 2 and 3. Arguably, DAD should not be started in step 2. The interface is in the down state, so it can not really send out requests for the address which makes starting DAD pointless. Since the second DAD was introduced by a recent change, seems appropriate to use it for the Fixes tag and have the fixup function only start DAD for addresses in the PREDAD state which occurs in addrconf_ifdown if the address is retained. Big thanks to Andrey for isolating a reliable reproducer for this problem. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsahern@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8048ced9 |
|
25-Apr-2017 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: regenerate host route if moved to gc list Taking down the loopback device wreaks havoc on IPv6 routing. By extension, taking down a VRF device wreaks havoc on its table. Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6 FIB code while running syzkaller fuzzer. The root cause is a dead dst that is on the garbage list gets reinserted into the IPv6 FIB. While on the gc (or perhaps when it gets added to the gc list) the dst->next is set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the out-of-bounds access. Andrey's reproducer was the key to getting to the bottom of this. With IPv6, host routes for an address have the dst->dev set to the loopback device. When the 'lo' device is taken down, rt6_ifdown initiates a walk of the fib evicting routes with the 'lo' device which means all host routes are removed. That process moves the dst which is attached to an inet6_ifaddr to the gc list and marks it as dead. The recent change to keep global IPv6 addresses added a new function, fixup_permanent_addr, that is called on admin up. That function restarts dad for an inet6_ifaddr and when it completes the host route attached to it is inserted into the fib. Since the route was marked dead and moved to the gc list, re-inserting the route causes the reported out-of-bounds accesses. If the device with the address is taken down or the address is removed, the WARN_ON in fib6_del is triggered. All of those faults are fixed by regenerating the host route if the existing one has been moved to the gc list, something that can be determined by checking if the rt6i_ref counter is 0. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c21ef3e3 |
|
16-Apr-2017 |
David Ahern <dsa@cumulusnetworks.com> |
net: rtnetlink: plumb extended ack to doit function Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse for doit functions that call it directly. This is the first step to using extended error reporting in rtnetlink. >From here individual subsystems can be updated to set netlink_ext_ack as needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fceb6435 |
|
12-Apr-2017 |
Johannes Berg <johannes.berg@intel.com> |
netlink: pass extended ACK struct to parsing functions Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9dae2e03 |
|
12-Mar-2017 |
Luiz Augusto von Dentz <luiz.von.dentz@intel.com> |
6lowpan: Fix IID format for Bluetooth According to RFC 7668 U/L bit shall not be used: https://wiki.tools.ietf.org/html/rfc7668#section-3.2.2 [Page 10]: In the figure, letter 'b' represents a bit from the Bluetooth device address, copied as is without any changes on any bit. This means that no bit in the IID indicates whether the underlying Bluetooth device address is public or random. |0 1|1 3|3 4|4 6| |0 5|6 1|2 7|8 3| +----------------+----------------+----------------+----------------+ |bbbbbbbbbbbbbbbb|bbbbbbbb11111111|11111110bbbbbbbb|bbbbbbbbbbbbbbbb| +----------------+----------------+----------------+----------------+ Because of this the code cannot figure out the address type from the IP address anymore thus it makes no sense to use peer_lookup_ba as it needs the peer address type. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
#
8a7a4b47 |
|
12-Mar-2017 |
Alexander Aring <aar@pengutronix.de> |
ipv6: addrconf: fix 48 bit 6lowpan autoconfiguration This patch adds support for 48 bit 6LoWPAN address length autoconfiguration which is the case for BTLE 6LoWPAN. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
#
a2d6cbb0 |
|
10-Apr-2017 |
Rabin Vincent <rabinv@axis.com> |
ipv6: Fix idev->addr_list corruption addrconf_ifdown() removes elements from the idev->addr_list without holding the idev->lock. If this happens while the loop in __ipv6_dev_get_saddr() is handling the same element, that function ends up in an infinite loop: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [test:1719] Call Trace: ipv6_get_saddr_eval+0x13c/0x3a0 __ipv6_dev_get_saddr+0xe4/0x1f0 ipv6_dev_get_saddr+0x1b4/0x204 ip6_dst_lookup_tail+0xcc/0x27c ip6_dst_lookup_flow+0x38/0x80 udpv6_sendmsg+0x708/0xba8 sock_sendmsg+0x18/0x30 SyS_sendto+0xb8/0xf8 syscall_common+0x34/0x58 Fixes: 6a923934c33 (Revert "ipv6: Revert optional address flusing on ifdown.") Signed-off-by: Rabin Vincent <rabinv@axis.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23452170 |
|
28-Mar-2017 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Add support for RTM_DELNETCONF Send RTM_DELNETCONF notifications when a device is deleted. The message only needs the device index, so modify inet6_netconf_fill_devconf to skip devconf references if it is NULL. Allows a userspace cache to remove entries as devices are deleted. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85b3daad |
|
28-Mar-2017 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Refactor inet6_netconf_notify_devconf to take event Refactor inet6_netconf_notify_devconf to take the event as an input arg. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bbea124b |
|
22-Mar-2017 |
Joel Scherpelz <jscherpelz@google.com> |
net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs. This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df789fe7 |
|
23-Feb-2017 |
David Forster <dforster@brocade.com> |
ipv6: Provide ipv6 version of "disable_policy" sysctl This provides equivalent functionality to the existing ipv4 "disable_policy" systcl. ie. Allows IPsec processing to be skipped on terminating packets on a per-interface basis. Signed-off-by: David Forster <dforster@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
174cd4b1 |
|
02-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
8c171d6c |
|
26-Feb-2017 |
Felix Jia <felix.jia@alliedtelesis.co.nz> |
net/ipv6: avoid possible dead locking on addr_gen_mode sysctl The addr_gen_mode variable can be accessed by both sysctl and netlink. Repleacd rtnl_lock() with rtnl_trylock() protect the sysctl operation to avoid the possbile dead lock.` Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a11a7f71 |
|
06-Feb-2017 |
Marcus Huewe <suse-tux@gmx.de> |
ipv6: addrconf: fix generation of new temporary addresses Under some circumstances it is possible that no new temporary addresses will be generated. For instance, addrconf_prefix_rcv_add_addr() indirectly calls ipv6_create_tempaddr(), which creates a tentative temporary address and starts dad. Next, addrconf_prefix_rcv_add_addr() indirectly calls addrconf_verify_rtnl(). Now, assume that the previously created temporary address has the least preferred lifetime among all existing addresses and is still tentative (that is, dad is still running). Hence, the next run of addrconf_verify_rtnl() is performed when the preferred lifetime of the temporary address ends. If dad succeeds before the next run, the temporary address becomes deprecated during the next run, but no new temporary address is generated. In order to fix this, schedule the next addrconf_verify_rtnl() run slightly before the temporary address becomes deprecated, if dad succeeded. Signed-off-by: Marcus Huewe <suse-tux@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a088d1d7 |
|
03-Feb-2017 |
Linus Lüssing <linus.luessing@c0d3.blue> |
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches When for instance a mobile Linux device roams from one access point to another with both APs sharing the same broadcast domain and a multicast snooping switch in between: 1) (c) <~~~> (AP1) <--[SSW]--> (AP2) 2) (AP1) <--[SSW]--> (AP2) <~~~> (c) Then currently IPv6 multicast packets will get lost for (c) until an MLD Querier sends its next query message. The packet loss occurs because upon roaming the Linux host so far stayed silent regarding MLD and the snooping switch will therefore be unaware of the multicast topology change for a while. This patch fixes this by always resending MLD reports when an interface change happens, for instance from NO-CARRIER to CARRIER state. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45ce0fd1 |
|
25-Jan-2017 |
Felix Jia <felix.jia@alliedtelesis.co.nz> |
net/ipv6: support more tunnel interfaces for EUI64 link-local generation Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d35a00b8 |
|
25-Jan-2017 |
Felix Jia <felix.jia@alliedtelesis.co.nz> |
net/ipv6: allow sysctl to change link-local address generation mode The address generation mode for IPv6 link-local can only be configured by netlink messages. This patch adds the ability to change the address generation mode via sysctl. v1 -> v2 Removed the rtnl lock and switch to use RCU lock to iterate through the netdev list. v2 -> v3 Removed the addrgenmode variable from the idev structure and use the systcl storage for the flag. Simplifed the logic for sysctl handling by removing the supported for all operation. Added support for more types of tunnel interfaces for link-local address generation. Based the patches from net-next. v3 -> v4 Removed unnecessary whitespace changes. Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03e4deff |
|
19-Jan-2017 |
Kefeng Wang <wangkefeng.wang@huawei.com> |
ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock Just like commit 4acd4945cd1e ("ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock"), it is unnecessary to make addrconf_disable_change() use RCU iteration over the netdev list, since it already holds the RTNL lock, or we may meet Illegal context switch in RCU read-side critical section. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f784ad3d |
|
04-Jan-2017 |
Mahesh Bandewar <maheshb@google.com> |
ipv6: do not send RTM_DELADDR for tentative addresses RTM_NEWADDR notification is sent when IFA_F_TENTATIVE is cleared from the address. So if the address is added and deleted before DAD probes completes, the RTM_DELADDR will be sent for which there was no RTM_NEWADDR causing asymmetry in notification. However if the same logic is used while sending RTM_DELADDR notification, this asymmetry can be avoided. Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
adc176c5 |
|
02-Dec-2016 |
Erik Nordmark <nordmark@arista.com> |
ipv6 addrconf: Implemented enhanced DAD (RFC7527) Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
764d3be6 |
|
22-Nov-2016 |
Paolo Abeni <pabeni@redhat.com> |
ipv6: bump genid when the IFA_F_TENTATIVE flag is clear When an ipv6 address has the tentative flag set, it can't be used as source for egress traffic, while the associated route, if any, can be looked up and even stored into some dst_cache. In the latter scenario, the source ipv6 address selected and stored in the cache is most probably wrong (e.g. with link-local scope) and the entity using the dst_cache will experience lack of ipv6 connectivity until said cache is cleared or invalidated. Overall this may cause lack of connectivity over most IPv6 tunnels (comprising geneve and vxlan), if the first egress packet reaches the tunnel before the DaD is completed for the used ipv6 address. This patch bumps a new genid after that the IFA_F_TENTATIVE flag is cleared, so that dst_cache will be invalidated on next lookup and ipv6 connectivity restored. Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device") Fixes: 468dfffcd762 ("geneve: add dst caching support") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bf355b8d |
|
08-Nov-2016 |
David Lebrun <david.lebrun@uclouvain.be> |
ipv6: sr: add core files for SR HMAC support This patch adds the necessary functions to compute and check the HMAC signature of an SR-enabled packet. Two HMAC algorithms are supported: hmac(sha1) and hmac(sha256). In order to avoid dynamic memory allocation for each HMAC computation, a per-cpu ring buffer is allocated for this purpose. A new per-interface sysctl called seg6_require_hmac is added, allowing a user-defined policy for processing HMAC-signed SR-enabled packets. A value of -1 means that the HMAC field will always be ignored. A value of 0 means that if an HMAC field is present, its validity will be enforced (the packet is dropped is the signature is incorrect). Finally, a value of 1 means that any SR-enabled packet that does not contain an HMAC signature or whose signature is incorrect will be dropped. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ababeba |
|
08-Nov-2016 |
David Lebrun <david.lebrun@uclouvain.be> |
ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header) Implement minimal support for processing of SR-enabled packets as described in https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02. This patch implements the following operations: - Intermediate segment endpoint: incrementation of active segment and rerouting. - Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH and routing of inner packet. - Cleanup flag support for SR-inlined packets: removal of SRH if we are the penultimate segment endpoint. A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled packets. Default is deny. This patch does not provide support for HMAC-signed packets. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7aa8e63f |
|
19-Oct-2016 |
Jiri Bohac <jbohac@suse.cz> |
ipv6: properly prevent temp_prefered_lft sysctl race The check for an underflow of tmp_prefered_lft is always false because tmp_prefered_lft is unsigned. The intention of the check was to guard against racing with an update of the temp_prefered_lft sysctl, potentially resulting in an underflow. As suggested by David Miller, the best way to prevent the race is by reading the sysctl variable using READ_ONCE. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76506a98 |
|
13-Oct-2016 |
Jiri Bohac <jbohac@suse.cz> |
IPv6: fix DESYNC_FACTOR The IPv6 temporary address generation uses a variable called DESYNC_FACTOR to prevent hosts updating the addresses at the same time. Quoting RFC 4941: ... The value DESYNC_FACTOR is a random value (different for each client) that ensures that clients don't synchronize with each other and generate new addresses at exactly the same time ... DESYNC_FACTOR is defined as: DESYNC_FACTOR -- A random value within the range 0 - MAX_DESYNC_FACTOR. It is computed once at system start (rather than each time it is used) and must never be greater than (TEMP_VALID_LIFETIME - REGEN_ADVANCE). First, I believe the RFC has a typo in it and meant to say: "and must never be greater than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE)" The reason is that at various places in the RFC, DESYNC_FACTOR is used in a calculation like (TEMP_PREFERRED_LIFETIME - DESYNC_FACTOR) or (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR). It needs to be smaller than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE) for the result of these calculations to be larger than zero. It's never used in a calculation together with TEMP_VALID_LIFETIME. I already submitted an errata to the rfc-editor: https://www.rfc-editor.org/errata_search.php?rfc=4941 The Linux implementation of DESYNC_FACTOR is very wrong: max_desync_factor is used in places DESYNC_FACTOR should be used. max_desync_factor is initialized to the RFC-recommended value for MAX_DESYNC_FACTOR (600) but the whole point is to get a _random_ value. And nothing ensures that the value used is not greater than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE), which leads to underflows. The effect can easily be observed when setting the temp_prefered_lft sysctl e.g. to 60. The preferred lifetime of the temporary addresses will be bogus. TEMP_PREFERRED_LIFETIME and REGEN_ADVANCE are not constants and can be influenced by these three sysctls: regen_max_retry, dad_transmits and temp_prefered_lft. Thus, the upper bound for desync_factor needs to be re-calculated each time a new address is generated and if desync_factor is larger than the new upper bound, a new random value needs to be re-generated. And since we already have max_desync_factor configurable per interface, we also need to calculate and store desync_factor per interface. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d6280da |
|
13-Oct-2016 |
Jiri Bohac <jbohac@suse.cz> |
IPv6: Drop the temporary address regen_timer The randomized interface identifier (rndid) was periodically updated from the regen_timer timer. Simplify the code by updating the rndid only when needed by ipv6_try_regen_rndid(). This makes the follow-up DESYNC_FACTOR fix much simpler. Also it fixes a reference counting error in this error path, where an in6_dev_put was missing: err = addrconf_sysctl_register(ndev); if (err) { ipv6_mc_destroy_dev(ndev); - del_timer(&ndev->regen_timer); snmp6_unregister_dev(ndev); goto err_release; Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a220445f |
|
12-Oct-2016 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: correctly add local routes when lo goes up The goal of the patch is to fix this scenario: ip link add dummy1 type dummy ip link set dummy1 up ip link set lo down ; ip link set lo up After that sequence, the local route to the link layer address of dummy1 is not there anymore. When the loopback is set down, all local routes are deleted by addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still exists, because the corresponding idev has a reference on it. After the rcu grace period, dst_rcu_free() is called, and thus ___dst_free(), which will set obsolete to DST_OBSOLETE_DEAD. In this case, init_loopback() is called before dst_rcu_free(), thus obsolete is still sets to something <= 0. So, the function doesn't add the route again. To avoid that race, let's check the rt6 refcnt instead. Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up") Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo") Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up") Reported-by: Francesco Santoro <francesco.santoro@6wind.com> Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com> CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Weilong Chen <chenweilong@huawei.com> CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb4a4c69 |
|
07-Oct-2016 |
Maciej Żenczykowski <maze@google.com> |
ipv6 addrconf: disallow rtr_solicits < -1 This disallows setting /proc/sys/net/ipv6/conf/*/router_solicitations to values below -1. -1 continues to mean an unlimited number of retransmits. Note: this depends on 'ipv6 addrconf: remove addrconf_sysctl_hop_limit()' Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb9e684e |
|
29-Sep-2016 |
Maciej Żenczykowski <maze@google.com> |
ipv6 addrconf: remove addrconf_sysctl_hop_limit() This is an effective no-op in terms of user observable behaviour. By preventing the overwrite of non-null extra1/extra2 fields in addrconf_sysctl() we can enable the use of proc_dointvec_minmax(). This allows us to eliminate the constant min/max (1..255) trampoline function that is addrconf_sysctl_hop_limit(). This is nice because it simplifies the code, and allows future sysctls with constant min/max limits to also not require trampolines. We still can't eliminate the trampoline for mtu because it isn't actually a constant (it depends on other tunables of the device) and thus requires at-write-time logic to enforce range. Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd11f074 |
|
28-Sep-2016 |
Maciej Żenczykowski <maze@google.com> |
ipv6 addrconf: implement RFC7559 router solicitation backoff This implements: https://tools.ietf.org/html/rfc7559 Backoff is performed according to RFC3315 section 14: https://tools.ietf.org/html/rfc3315#section-14 We allow setting /proc/sys/net/ipv6/conf/*/router_solicitations to a negative value meaning an unlimited number of retransmits, and we make this the new default (inline with the RFC). We also add a new setting: /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval defaulting to 1 hour (per RFC recommendation). Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aca05671 |
|
29-Sep-2016 |
Jia He <hejianet@gmail.com> |
ipv6: Remove useless parameter in __snmp6_fill_statsdev The parameter items(is always ICMP6_MIB_MAX) is useless for __snmp6_fill_statsdev Signed-off-by: Jia He <hejianet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
751eb6b6 |
|
05-Sep-2016 |
Wei Yongjun <weiyongjun1@huawei.com> |
ipv6: addrconf: fix dev refcont leak when DAD failed In general, when DAD detected IPv6 duplicate address, ifp->state will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a delayed work, the call tree should be like this: ndisc_recv_ns -> addrconf_dad_failure <- missing ifp put -> addrconf_mod_dad_work -> schedule addrconf_dad_work() -> addrconf_dad_stop() <- missing ifp hold before call it addrconf_dad_failure() called with ifp refcont holding but not put. addrconf_dad_work() call addrconf_dad_stop() without extra holding refcount. This will not cause any issue normally. But the race between addrconf_dad_failure() and addrconf_dad_work() may cause ifp refcount leak and netdevice can not be unregister, dmesg show the following messages: IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected! ... unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Cc: stable@vger.kernel.org Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29c994e3 |
|
30-Aug-2016 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
netconf: add a notif when settings are created All changes are notified, but the initial state was missing. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d26c638c |
|
30-Aug-2016 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: add missing netconf notif when 'all' is updated The 'default' value was not advertised. Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85b51b12 |
|
18-Aug-2016 |
Mike Manning <mmanning@brocade.com> |
net: ipv6: Remove addresses for failures with strict DAD If DAD fails with accept_dad set to 2, global addresses and host routes are incorrectly left in place. Even though disable_ipv6 is set, contrary to documentation, the addresses are not dynamically deleted from the interface. It is only on a subsequent link down/up that these are removed. The fix is not only to set the disable_ipv6 flag, but also to call addrconf_ifdown(), which is the action to carry out when disabling IPv6. This results in the addresses and routes being deleted immediately. The DAD failure for the LL addr is determined as before via netlink, or by the absence of the LL addr (which also previously would have had to be checked for in case of an intervening link down and up). As the call to addrconf_ifdown() requires an rtnl lock, the logic to disable IPv6 when DAD fails is moved to addrconf_dad_work(). Previous behavior: root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2 net.ipv6.conf.eth3.accept_dad = 2 root@vm1:/# ip -6 addr add 2000::10/64 dev eth3 root@vm1:/# ip link set up eth3 root@vm1:/# ip -6 addr show dev eth3 5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2000::10/64 scope global valid_lft forever preferred_lft forever inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed valid_lft forever preferred_lft forever root@vm1:/# ip -6 route show dev eth3 2000::/64 proto kernel metric 256 fe80::/64 proto kernel metric 256 root@vm1:/# ip link set down eth3 root@vm1:/# ip link set up eth3 root@vm1:/# ip -6 addr show dev eth3 root@vm1:/# ip -6 route show dev eth3 root@vm1:/# New behavior: root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2 net.ipv6.conf.eth3.accept_dad = 2 root@vm1:/# ip -6 addr add 2000::10/64 dev eth3 root@vm1:/# ip link set up eth3 root@vm1:/# ip -6 addr show dev eth3 root@vm1:/# ip -6 route show dev eth3 root@vm1:/# Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc561632 |
|
11-Aug-2016 |
Mike Manning <mmanning@brocade.com> |
net: ipv6: Do not keep IPv6 addresses when IPv6 is disabled If IPv6 is disabled when the option is set to keep IPv6 addresses on link down, userspace is unaware of this as there is no such indication via netlink. The solution is to remove the IPv6 addresses in this case, which results in netlink messages indicating removal of addresses in the usual manner. This fix also makes the behavior consistent with the case of having IPv6 disabled first, which stops IPv6 addresses from being added. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Mike Manning <mmanning@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c882219a |
|
28-Jul-2016 |
Wei Yongjun <weiyj.lk@gmail.com> |
net: ipv6: use list_move instead of list_del/list_add Using list_move() instead of list_del() + list_add(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea06f717 |
|
22-Jul-2016 |
Mike Manning <mmanning@brocade.com> |
net: ipv6: Always leave anycast and multicast groups on link down Default kernel behavior is to delete IPv6 addresses on link down, which entails deletion of the multicast and the subnet-router anycast addresses. These deletions do not happen with sysctl setting to keep global IPv6 addresses on link down, so every link down/up causes an increment of the anycast and multicast refcounts. These bogus refcounts may stop these addrs from being removed on subsequent calls to delete them. The solution is to leave the groups for the multicast and subnet anycast on link down for the callflow when global IPv6 addresses are kept. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Mike Manning <mmanning@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
927265bc |
|
07-Jul-2016 |
Eric Dumazet <edumazet@google.com> |
ipv6: do not abuse GFP_ATOMIC in inet6_netconf_notify_devconf() All inet6_netconf_notify_devconf() callers are in process context, so we can use GFP_KERNEL allocations if we take care of not holding a rwlock while not needed in ip6mr (we hold RTNL there) Fixes: d67b8c616b48 ("netconf: advertise mc_forwarding status") Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afbac601 |
|
16-Jun-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Address selection needs to consider L3 domains IPv6 version of 3f2fb9a834cb ("net: l3mdev: address selection should only consider devices in L3 domain") and the follow up commit, a17b693cdd876 ("net: l3mdev: prefer VRF master for source address selection"). That is, if outbound device is given then the address preference order is an address from that device, an address from the master device if it is enslaved, and then an address from a device in the same L3 domain. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc84b3c6 |
|
15-Jun-2016 |
Alexander Aring <aar@pengutronix.de> |
ipv6: export several functions This patch exports some neighbour discovery functions which can be used by 6lowpan neighbour discovery ops functionality then. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f997c55c |
|
15-Jun-2016 |
Alexander Aring <aar@pengutronix.de> |
ipv6: introduce neighbour discovery ops This patch introduces neighbour discovery ops callback structure. The idea is to separate the handling for 6LoWPAN into the 6lowpan module. These callback offers 6lowpan different handling, such as 802.15.4 short address handling or RFC6775 (Neighbor Discovery Optimization for IPv6 over 6LoWPANs). Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f672235 |
|
15-Jun-2016 |
Alexander Aring <aar@pengutronix.de> |
addrconf: put prefix address add in an own function This patch moves the functionality to add a RA PIO prefix generated address in an own function. This move prepares to add a hook for adding a second address for a second link-layer address. E.g. short address for 802.15.4 6LoWPAN. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2ad3ed59 |
|
15-Jun-2016 |
Alexander Aring <aar@pengutronix.de> |
6lowpan: add 802.15.4 short addr slaac This patch adds the autoconfiguration if a valid 802.15.4 short address is available for 802.15.4 6LoWPAN interfaces. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba46ee4c |
|
13-Jun-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Do not add multicast route for l3 master devices L3 master devices are virtual devices similar to the loopback device. Link local and multicast routes for these devices do not make sense. The ipv6 addrconf code already skips adding a linklocal address; do the same for the mcast route. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38bd10c4 |
|
21-Apr-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Delete host routes on an ifdown It was a simple idea -- save IPv6 configured addresses on a link down so that IPv6 behaves similar to IPv4. As always the devil is in the details and the IPv6 stack as too many behavioral differences from IPv4 making the simple idea more complicated than it needs to be. The current implementation for keeping IPv6 addresses can panic or spit out a warning in one of many paths: 1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in rt6_fill_node while handling a route dump request. 2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in fib6_del 3. Panic in fib6_purge_rt because rt6i_ref count is not 1. The root cause of all these is references related to the host route for an address that is retained. So, this patch deletes the host route every time the ifdown loop runs. Since the host route is deleted and will be re-generated an up there is no longer a need for the l3mdev fix up. On the 'admin up' side move addrconf_permanent_addr into the NETDEV_UP event handling so that it runs only once versus on UP and CHANGE events. All of the current panics and warnings appear to be related to addresses on the loopback device, but given the catastrophic nature when a bug is triggered this patch takes the conservative approach and evicts all host routes rather than trying to determine when it can be re-used and when it can not. That can be a later optimizaton if desired. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6a923934 |
|
26-Apr-2016 |
David S. Miller <davem@davemloft.net> |
Revert "ipv6: Revert optional address flusing on ifdown." This reverts commit 841645b5f2dfceac69b78fcd0c9050868d41ea61. Ok, this puts the feature back. I've decided to apply David A.'s bug fix and run with that rather than make everyone wait another whole release for this feature. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
841645b5 |
|
25-Apr-2016 |
David S. Miller <davem@davemloft.net> |
ipv6: Revert optional address flusing on ifdown. This reverts the following three commits: 70af921db6f8835f4b11c65731116560adb00c14 799977d9aafbf0ca0b9c39b04cbfb16db71302c9 f1705ec197e705b79ea40fe7a2cc5acfa1d3bfac The feature was ill conceived, has terrible semantics, and has added nothing but regressions to the already fragile ipv6 stack. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5df1f77f |
|
18-Apr-2016 |
Konstantin Khlebnikov <koct9i@gmail.com> |
net/ipv6/addrconf: fix sysctl table indentation Separated from previous patch for readability. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
607ea7cd |
|
18-Apr-2016 |
Konstantin Khlebnikov <koct9i@gmail.com> |
net/ipv6/addrconf: simplify sysctl registration Struct ctl_table_header holds pointer to sysctl table which could be used for freeing it after unregistration. IPv4 sysctls already use that. Remove redundant NULL assignment: ndev allocated using kzalloc. This also saves some bytes: sysctl table could be shorter than DEVCONF_MAX+1 if some options are disable in config. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
70af921d |
|
08-Apr-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Do not keep linklocal and loopback addresses f1705ec197e7 added the option to retain user configured addresses on an admin down. A comment to one of the later revisions suggested using the IFA_F_PERMANENT flag rather than adding a user_managed boolean to the ifaddr struct. A side effect of this change is that link local and loopback addresses are also retained which is not part of the objective of f1705ec197e7. Add check to drop those addresses. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47e27d5e |
|
08-Apr-2016 |
Daniel Borkmann <daniel@iogearbox.net> |
ipv6, token: allow for clearing the current device token The original tokenized iid support implemented via f53adae4eae5 ("net: ipv6: add tokenized interface identifier support") didn't allow for clearing a device token as it was intended that this addressing mode was the only one active for globally scoped IPv6 addresses. Later we relaxed that restriction via 617fe29d45bd ("net: ipv6: only invalidate previously tokenized addresses"), and we should also allow for clearing tokens as there's no good reason why it shouldn't be allowed. Fixes: 617fe29d45bd ("net: ipv6: only invalidate previously tokenized addresses") Reported-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f7f34ea |
|
07-Apr-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: vrf: Fix dev refcnt leak due to IPv6 prefix route ifupdown2 found a kernel bug with IPv6 routes and movement from the main table to the VRF table. Sequence of events: Create the interface and add addresses: ip link add dev eth4.105 link eth4 type vlan id 105 ip addr add dev eth4.105 8.105.105.10/24 ip -6 addr add dev eth4.105 2008:105:105::10/64 At this point IPv6 has inserted a prefix route in the main table even though the interface is 'down'. From there the VRF device is created: ip link add dev vrf105 type vrf table 105 ip addr add dev vrf105 9.9.105.10/32 ip -6 addr add dev vrf105 2000:9:105::10/128 ip link set vrf105 up Then the interface is enslaved, while still in the 'down' state: ip link set dev eth4.105 master vrf105 Since the device is down the VRF driver cycling the device does not send the NETDEV_UP and NETDEV_DOWN but rather the NETDEV_CHANGE event which does not flush the routes inserted prior. When the link is brought up ip link set dev eth4.105 up the prefix route is added in the VRF table, but does not remove the route from the main table. Fix by handling the NETDEV_CHANGEUPPER event similar what was implemented for IPv4 in 7f49e7a38b77 ("net: Flush local routes when device changes vrf association") Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
136ba622 |
|
10-Mar-2016 |
Zhang Shengju <zhangshengju@cmss.chinamobile.com> |
netconf: add macro to represent all attributes This patch adds macro NETCONFA_ALL to represent all type of netconf attributes for IPv4 and IPv6. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
799977d9 |
|
02-Mar-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Fix refcnt on host routes Andrew and Ying Huang's test robot both reported usage count problems that trace back to the 'keep address on ifdown' patch. >From Andrew: We execute CRIU test on linux-next. On the current linux-next kernel they hangs on creating a network namespace. The kernel log contains many massages like this: [ 1036.122108] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 1046.165156] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 1056.210287] unregister_netdevice: waiting for lo to become free. Usage count = 2 I tried to revert this patch and the bug disappeared. Here is a set of commands to reproduce this bug: [root@linux-next-test linux-next]# uname -a Linux linux-next-test 4.5.0-rc6-next-20160301+ #3 SMP Wed Mar 2 17:32:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux [root@linux-next-test ~]# unshare -n [root@linux-next-test ~]# ip link set up dev lo [root@linux-next-test ~]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever [root@linux-next-test ~]# logout [root@linux-next-test ~]# unshare -n ----- The problem is a change made to RTM_DELADDR case in __ipv6_ifa_notify that was added in an early version of the offending patch and is no longer needed. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Cc: Andrey Wagin <avagin@gmail.com> Cc: Ying Huang <ying.huang@linux.intel.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f25a111 |
|
27-Feb-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6/l3mdev: Move host route on saved address if necessary Commit f1705ec197e70 allows IPv6 addresses to be retained on a link down. The address can have a cached host route which can point to the wrong FIB table if the L3 enslavement is changed (e.g., route can point to local table instead of VRF table if device is added to an L3 domain). On link up check the table of the cached host route against the FIB table associated with the device and correct if needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1705ec1 |
|
24-Feb-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: ipv6: Make address flushing on ifdown optional Currently, all ipv6 addresses are flushed when the interface is configured down, including global, static addresses: $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 << nothing; all addresses have been flushed>> Add a new sysctl to make this behavior optional. The new setting defaults to flush all addresses to maintain backwards compatibility. When the set global addresses with no expire times are not flushed on an admin down. The sysctl is per-interface or system-wide for all interfaces $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1 or $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 Will keep addresses on eth1 on an admin down. $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000 inet6 2100:1::2/120 scope global valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link valid_lft forever preferred_lft forever $ ip link set dev eth1 down $ ip -6 addr show dev eth1 3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000 inet6 2100:1::2/120 scope global tentative valid_lft forever preferred_lft forever inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative valid_lft forever preferred_lft forever Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a97eb33f |
|
16-Feb-2016 |
Anton Protopopov <a.s.protopopov@gmail.com> |
rtnl: RTM_GETNETCONF: fix wrong return value An error response from a RTM_GETNETCONF request can return the positive error value EINVAL in the struct nlmsgerr that can mislead userspace. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a02bf89 |
|
04-Feb-2016 |
Johannes Berg <johannes.berg@intel.com> |
ipv6: add option to drop unsolicited neighbor advertisements In certain 802.11 wireless deployments, there will be NA proxies that use knowledge of the network to correctly answer requests. To prevent unsolicitd advertisements on the shared medium from being a problem, on such deployments wireless needs to drop them. Enable this by providing an option called "drop_unsolicited_na". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
abbc3043 |
|
04-Feb-2016 |
Johannes Berg <johannes.berg@intel.com> |
ipv6: add option to drop unicast encapsulated in L2 multicast In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv6 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations. Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16186a82 |
|
01-Feb-2016 |
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> |
ipv6: addrconf: Fix recursive spin lock call A rcu stall with the following backtrace was seen on a system with forwarding, optimistic_dad and use_optimistic set. To reproduce, set these flags and allow ipv6 autoconf. This occurs because the device write_lock is acquired while already holding the read_lock. Back trace below - INFO: rcu_preempt self-detected stall on CPU { 1} (t=2100 jiffies g=3992 c=3991 q=4471) <6> Task dump for CPU 1: <2> kworker/1:0 R running task 12168 15 2 0x00000002 <2> Workqueue: ipv6_addrconf addrconf_dad_work <6> Call trace: <2> [<ffffffc000084da8>] el1_irq+0x68/0xdc <2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30 <2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4 <2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4 <2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c <2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70 <2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334 <2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc <2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418 <2> [<ffffffc0000bb40c>] kthread+0xe0/0xec v2: do addrconf_dad_kick inside read lock and then acquire write lock for ipv6_ifa_notify as suggested by Eric Fixes: 7fd2561e4ebdd ("net: ipv6: Add a sysctl to make optimistic addresses useful candidates") Cc: Eric Dumazet <edumazet@google.com> Cc: Erik Kline <ek@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d171f39 |
|
08-Jan-2016 |
Lubomir Rintel <lkundrak@v3.sk> |
ipv6: always add flag an address that failed DAD with DADFAILED The userspace needs to know why is the address being removed so that it can perhaps obtain a new address. Without the DADFAILED flag it's impossible to distinguish removal of a temporary and tentative address due to DAD failure from other reasons (device removed, manual address removal). Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5449a5ca |
|
21-Dec-2015 |
WANG Cong <xiyou.wangcong@gmail.com> |
addrconf: always initialize sysctl table data When sysctl performs restrict writes, it allows to write from a middle position of a sysctl file, which requires us to initialize the table data before calling proc_dostring() for the write case. Fixes: 3d1bec99320d ("ipv6: introduce secret_stable to ipv6_devconf") Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc9da6cc |
|
16-Dec-2015 |
Bjørn Mork <bjorn@mork.no> |
ipv6: addrconf: use stable address generator for ARPHRD_NONE Add a new address generator mode, using the stable address generator with an automatically generated secret. This is intended as a default address generator mode for device types with no EUI64 implementation. The new generator is used for ARPHRD_NONE interfaces initially, adding default IPv6 autoconf support to e.g. tun interfaces. If the addrgenmode is set to 'random', either by default or manually, and no stable secret is available, then a random secret is used as input for the stable-privacy address generator. The secret can be read and modified like manually configured secrets, using the proc interface. Modifying the secret will change the addrgen mode to 'stable-privacy' to indicate that it operates on a known secret. Existing behaviour of the 'stable-privacy' mode is kept unchanged. If a known secret is available when the device is created, then the mode will default to 'stable-privacy' as before. The mode can be manually set to 'random' but it will behave exactly like 'stable-privacy' in this case. The secret will not change. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b29c696 |
|
15-Dec-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: automatically enable stable privacy mode if stable_secret set Bjørn reported that while we switch all interfaces to privacy stable mode when setting the secret, we don't set this mode for new interfaces. This does not make sense, so change this behaviour. Fixes: 622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf") Reported-by: Bjørn Mork <bjorn@mork.no> Cc: Bjørn Mork <bjorn@mork.no> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5241c2d7 |
|
14-Dec-2015 |
Alexander Aring <alex.aring@gmail.com> |
ipv6: addrconf: drop ieee802154 specific things This patch removes ARPHRD_IEEE802154 from addrconf handling. In the earlier days of 802.15.4 6LoWPAN, the interface type was ARPHRD_IEEE802154 which introduced several issues, because 802.15.4 interfaces used the same type. Since commit 965e613d299c ("ieee802154: 6lowpan: fix ARPHRD to ARPHRD_6LOWPAN") we use ARPHRD_6LOWPAN for 6LoWPAN interfaces. This patch will remove ARPHRD_IEEE802154 which is currently deadcode, because ARPHRD_IEEE802154 doesn't reach the minimum 1280 MTU of IPv6. Also we use 6LoWPAN EUI64 specific defines instead using link-layer constanst from 802.15.4 link-layer header. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a1ec461 |
|
04-Dec-2015 |
Bjørn Mork <bjorn@mork.no> |
ipv6: keep existing flags when setting IFA_F_OPTIMISTIC Commit 64236f3f3d74 ("ipv6: introduce IFA_F_STABLE_PRIVACY flag") failed to update the setting of the IFA_F_OPTIMISTIC flag, causing the IFA_F_STABLE_PRIVACY flag to be lost if IFA_F_OPTIMISTIC is set. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Fixes: 64236f3f3d74 ("ipv6: introduce IFA_F_STABLE_PRIVACY flag") Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ef0952c |
|
03-Dec-2015 |
Andrew Lunn <andrew@lunn.ch> |
ipv6: Only act upon NETDEV_*_TYPE_CHANGE if we have ipv6 addresses An interface changing type may not have IPv6 addresses. Don't call the address configuration type change in this case. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d6df198d |
|
01-Dec-2015 |
Phil Sutter <phil@nwl.cc> |
net: ipv6: restrict hop_limit sysctl setting to range [1; 255] Setting a value bigger than 255 resulted in using only the lower eight bits of that value as it is assigned to the u8 header field. To avoid this unexpected result, reject such values. Setting a value of zero is technically possible, but hosts receiving such a packet have to treat it like hop_limit was set to one, according to RFC2460. Therefore I don't see a use-case for that. Setting a route's hop_limit to zero in iproute2 means to use the sysctl default, which is not the case here: Setting e.g. net.conf.eth0.hop_limit=0 will not make the kernel use net.conf.all.hop_limit for outgoing packets on eth0. To avoid these kinds of confusion, reject zero. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
304d888b |
|
27-Nov-2015 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" This reverts commit ab450605b35caa768ca33e86db9403229bf42be4. In IPv6, we cannot inherit the dst of the original dst. ndisc packets are IPv6 packets and may take another route than the original packet. This patch breaks the following scenario: a packet comes from eth0 and is forwarded through vxlan1. The encapsulated packet triggers an NS which cannot be sent because of the wrong route. CC: Jiri Benc <jbenc@redhat.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2a189f9e |
|
04-Nov-2015 |
Sabrina Dubroca <sd@queasysnail.net> |
ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up the dev_snmp6 entry that we have already registered for this device. Call snmp6_unregister_dev in this case. Fixes: a317a2f19da7d ("ipv6: fail early when creating netdev named all or default") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7b0b1d2 |
|
26-Oct-2015 |
Alexander Duyck <aduyck@mirantis.com> |
ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTU This change makes it so that we reinitialize the interface if the MTU is increased back above IPV6_MIN_MTU and the interface is up. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1974ed0 |
|
19-Oct-2015 |
Arad, Ronen <ronen.arad@intel.com> |
netlink: Rightsize IFLA_AF_SPEC size calculation if_nlmsg_size() overestimates the minimum allocation size of netlink dump request (when called from rtnl_calcit()) or the size of the message (when called from rtnl_getlink()). This is because ext_filter_mask is not supported by rtnl_link_get_af_size() and rtnl_link_get_size(). The over-estimation is significant when at least one netdev has many VLANs configured (8 bytes for each configured VLAN). This patch-set "rightsizes" the protocol specific attribute size calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and adding this a argument to get_link_af_size op in rtnl_af_ops. Bridge module already used filtering aware sizing for notifications. br_get_link_af_size_filtered() is consistent with the modified get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops. br_get_link_af_size() becomes unused and thus removed. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca254490 |
|
12-Oct-2015 |
David Ahern <dsa@cumulusnetworks.com> |
net: Add VRF support to IPv6 stack As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded table ids with possibly device specific ones and manipulating the oif in the flowi6. The flow flags are used to skip oif compare in nexthop lookups if the device is enslaved to a VRF via the L3 master device. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d9e4ce65 |
|
08-Oct-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: gre: setup default multicast routes over PtP links GRE point-to-point interfaces should also support ipv6 multicast. Setting up default multicast routes on interface creation was forgotten. Add it. Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=103231> Cc: Julien Muchembled <jm@jmuchemb.eu> Cc: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dumazet <ndumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38cf595b |
|
22-Sep-2015 |
Jiri Benc <jbenc@redhat.com> |
ipv6: remove unused neigh parameter from ndisc functions Since commit 12fd84f4383b1 ("ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers."), the neigh parameter of ndisc_send_na and ndisc_send_ns is unused. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5566fd7 |
|
11-Sep-2015 |
Sowmini Varadhan <sowmini.varadhan@oracle.com> |
rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats Many commonly used functions like getifaddrs() invoke RTM_GETLINK to dump the interface information, and do not need the the AF_INET6 statististics that are always returned by default from rtnl_fill_ifinfo(). Computing the statistics can be an expensive operation that impacts scaling, so it is desirable to avoid this if the information is not needed. This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that can be passed with netlink_request() to avoid statistics computation for the ifinfo path. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e3d5be7 |
|
15-Sep-2015 |
Martin KaFai Lau <kafai@fb.com> |
ipv6: Avoid double dst_free It is a prep work to get dst freeing from fib tree undergo a rcu grace period. The following is a common paradigm: if (ip6_del_rt(rt)) dst_free(rt) which means, if rt cannot be deleted from the fib tree, dst_free(rt) now. 1. We don't know the ip6_del_rt(rt) failure is because it was not managed by fib tree (e.g. DST_NOCACHE) or it had already been removed from the fib tree. 2. If rt had been managed by the fib tree, ip6_del_rt(rt) failure means dst_free(rt) has been called already. A second dst_free(rt) is not always obviously safe. The rt may have been destroyed already. 3. If rt is a DST_NOCACHE, dst_free(rt) should not be called. 4. It is a stopper to make dst freeing from fib tree undergo a rcu grace period. This patch is to use a DST_NOCACHE flag to indicate a rt is not managed by the fib tree. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a3a77372 |
|
29-Aug-2015 |
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> |
net: Optimize snmp stat aggregation by walking all the percpu data at once Docker container creation linearly increased from around 1.6 sec to 7.5 sec (at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field. reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks through per cpu data of an item (iteratively for around 36 items). idea: This patch tries to aggregate the statistics by going through all the items of each cpu sequentially which is reducing cache misses. Docker creation got faster by more than 2x after the patch. Result: Before After Docker creation time 6.836s 3.25s cache miss 2.7% 1.41% perf before: 50.73% docker [kernel.kallsyms] [k] snmp_fold_field 9.07% swapper [kernel.kallsyms] [k] snooze_loop 3.49% docker [kernel.kallsyms] [k] veth_stats_one 2.85% swapper [kernel.kallsyms] [k] _raw_spin_lock perf after: 10.57% docker docker [.] scanblock 8.37% swapper [kernel.kallsyms] [k] snooze_loop 6.91% docker [kernel.kallsyms] [k] snmp_get_cpu_field 6.67% docker [kernel.kallsyms] [k] veth_stats_one changes/ideas suggested: Using buffer in stack (Eric), Usage of memset (David), Using memcpy in place of unaligned_put (Joe). Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
399e6f95 |
|
30-Jul-2015 |
Matan Barak <matanb@mellanox.com> |
net/ipv6: Export addrconf_ifid_eui48 For loopback purposes, RoCE devices should have a default GID in the port GID table, even when the interface is down. In order to do so, we use the IPv6 link local address which would have been genenrated for the related Ethernet netdevice when it goes up as a default GID. addrconf_ifid_eui48 is used to gernerate this address, export it. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ab450605 |
|
20-Aug-2015 |
Jiri Benc <jbenc@redhat.com> |
ipv6: ndisc: inherit metadata dst when creating ndisc requests If output device wants to see the dst, inherit the dst of the original skb in the ndisc request. This is an IPv6 counterpart of commit 0accfc268f4d ("arp: Inherit metadata dst when creating ARP requests"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0344338b |
|
13-Aug-2015 |
Andy Gospodarek <gospo@cumulusnetworks.com> |
net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo This is useful information to include in ipv6 netlink messages that report interface information. IFLA_OPERSTATE is already included in ipv4 messages, but missing for ipv6. This closes that gap. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35103d11 |
|
13-Aug-2015 |
Andy Gospodarek <gospo@cumulusnetworks.com> |
net: ipv6 sysctl option to ignore routes when nexthop link is down Like the ipv4 patch with a similar title, this adds a sysctl to allow the user to change routing behavior based on whether or not the interface associated with the nexthop was an up or down link. The default setting preserves the current behavior, but anyone that enables it will notice that nexthops on down interfaces will no longer be selected: net.ipv6.conf.all.ignore_routes_with_linkdown = 0 net.ipv6.conf.default.ignore_routes_with_linkdown = 0 net.ipv6.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, not only will link status be reported to userspace, but an indication that a nexthop is dead and will not be used is also reported. 1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium 1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium 7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium 8000::/8 dev p8p1 proto kernel metric 256 pref medium 9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium 9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium fe80::/64 dev p8p1 proto kernel metric 256 pref medium This also adds devconf support and notification when sysctl values change. v2: drop use of rt6i_nhflags since it is not needed right now Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8013d1d7 |
|
30-Jul-2015 |
Hangbin Liu <liuhangbin@gmail.com> |
net/ipv6: add sysctl option accept_ra_min_hop_limit Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface") disabled accept hop limit from RA if it is smaller than the current hop limit for security stuff. But this behavior kind of break the RFC definition. RFC 4861, 6.3.4. Processing Received Router Advertisements A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time, and Retrans Timer) may contain a value denoting that it is unspecified. In such cases, the parameter should be ignored and the host should continue using whatever value it is already using. If the received Cur Hop Limit value is non-zero, the host SHOULD set its CurHopLimit variable to the received value. So add sysctl option accept_ra_min_hop_limit to let user choose the minimum hop limit value they can accept from RA. And set default to 1 to meet RFC standards. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3985e8a3 |
|
22-Jul-2015 |
Erik Kline <ek@google.com> |
ipv6: sysctl to restrict candidate source addresses Per RFC 6724, section 4, "Candidate Source Addresses": It is RECOMMENDED that the candidate source addresses be the set of unicast addresses assigned to the interface that will be used to send to the destination (the "outgoing" interface). Add a sysctl to enable this behaviour. Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c15df306 |
|
16-Jul-2015 |
YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> |
ipv6: Remove unused arguments for __ipv6_dev_get_saddr(). Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0b8da1e |
|
13-Jul-2015 |
YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> |
ipv6: Fix finding best source address in ipv6_dev_get_saddr(). Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") did not properly update best source address available. Plus, it introduced possible NULL pointer dereference. Bug was reported by Erik Kline <ek@google.com>. Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>. Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9131f3de |
|
10-Jul-2015 |
YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> |
ipv6: Do not iterate over all interfaces when finding source address on specific interface. If outgoing interface is specified and the candidate address is restricted to the outgoing interface, it is enough to iterate over that given interface only. Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f56a01f |
|
28-Apr-2015 |
Martin KaFai Lau <kafai@fb.com> |
ipv6: Consider RTF_CACHE when searching the fib6 tree It is a prep work for the later bug-fix patch which will stop /128 route from disappearing after pmtu update. The later bug-fix patch will allow a /128 route and its RTF_CACHE clone both exist at the same fib6_node. To do this, we need to prepare the existing fib6 tree search to expect RTF_CACHE for /128 route. Note that the fn->leaf is sorted by rt6i_metric. Hence, RTF_CACHE (if there is any) is always at the front. This property leads to the following: 1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which the caller is used to ask for deleting clone or non-clone. The rtm_to_fib6_config() should also check the RTM_F_CLONED and then set RTF_CACHE accordingly so that: - 'ip -6 r del...' will make ip6_route_del() to delete a route and all its clones. Note that its clones is flushed by fib6_del() - 'ip -6 r flush table cache' will make ip6_route_del() to only delete clone(s). 2. Exclude RTF_CACHE from addrconf_get_prefix_route() which should not configure on a cloned route. 3. No change is need for rt6_device_match() since it currently could return a RTF_CACHE clone route, so the later bug-fix patch will not affect it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a54acb3a |
|
02-Apr-2015 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
dev: introduce dev_get_iflink() The goal of this patch is to prepare the removal of the iflink field. It introduces a new ndo function, which will be implemented by virtual interfaces. There is no functional change into this patch. All readers of iflink field now call dev_get_iflink(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
930345ea |
|
29-Mar-2015 |
Jiri Benc <jbenc@redhat.com> |
netlink: implement nla_put_in_addr and nla_put_in6_addr IP addresses are often stored in netlink attributes. Add generic functions to do that. For nla_put_in_addr, it would be nicer to pass struct in_addr but this is not used universally throughout the kernel, in way too many places __be32 is used to store IPv4 address. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63159f29 |
|
29-Mar-2015 |
Ian Morris <ipm@chirality.org.uk> |
ipv6: coding style: comparison for equality with NULL The ipv6 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ff40217e |
|
24-Mar-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: fix sparse warnings in privacy stable addresses generation Those warnings reported by sparse endianness check (via kbuild test robot) are harmless, nevertheless fix them up and make the code a little bit easier to read. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf") Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1855b7c3 |
|
23-Mar-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: introduce idgen_delay and idgen_retries knobs This is specified by RFC 7217. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f40ef77 |
|
23-Mar-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: do retries on stable privacy addresses If a DAD conflict is detected, we want to retry privacy stable address generation up to idgen_retries (= 3) times with a delay of idgen_delay (= 1 second). Add the logic to addrconf_dad_failure. By design, we don't clean up dad failed permanent addresses. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e8e676d |
|
23-Mar-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: collapse state_lock and lock Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64236f3f |
|
23-Mar-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: introduce IFA_F_STABLE_PRIVACY flag We need to mark appropriate addresses so we can do retries in case their DAD failed. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
622c81d5 |
|
23-Mar-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: generation of stable privacy addresses for link-local and autoconf This patch implements the stable privacy address generation for link-local and autoconf addresses as specified in RFC7217. RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key) is the RID (random identifier). As the hash function F we chose one round of sha1. Prefix will be either the link-local prefix or the router advertised one. As Net_Iface we use the MAC address of the device. DAD_Counter and secret_key are implemented as specified. We don't use Network_ID, as it couples the code too closely to other subsystems. It is specified as optional in the RFC. As Net_Iface we only use the MAC address: we simply have no stable identifier in the kernel we could possibly use: because this code might run very early, we cannot depend on names, as they might be changed by user space early on during the boot process. A new address generation mode is introduced, IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to none or eui64 address configuration mode although the stable_secret is already set. We refuse writes to ipv6/conf/all/stable_secret but only allow ipv6/conf/default/stable_secret and the interface specific file to be written to. The default stable_secret is used as the parameter for the namespace, the interface specific can overwrite the secret, e.g. when switching a network configuration from one system to another while inheriting the secret. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d1bec99 |
|
23-Mar-2015 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: introduce secret_stable to ipv6_devconf This patch implements the procfs logic for the stable_address knob: The secret is formatted as an ipv6 address and will be stored per interface and per namespace. We track initialized flag and return EIO errors until the secret is set. We don't inherit the secret to newly created namespaces. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
54ff9ef3 |
|
18-Mar-2015 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop} in favor of their inner __ ones, which doesn't grab rtnl. As these functions need to operate on a locked socket, we can't be grabbing rtnl by then. It's too late and doing so causes reversed locking. So this patch: - move rtnl handling to callers instead while already fixing some reversed locking situations, like on vxlan and ipvs code. - renames __ ones to not have the __ mark: __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop} Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93a714d6 |
|
25-Feb-2015 |
Madhu Challa <challa@noironetworks.com> |
multicast: Extend ip address command to enable multicast group join/leave on Joining multicast group on ethernet level via "ip maddr" command would not work if we have an Ethernet switch that does igmp snooping since the switch would not replicate multicast packets on ports that did not have IGMP reports for the multicast addresses. Linux vxlan interfaces created via "ip link add vxlan" have the group option that enables then to do the required join. By extending ip address command with option "autojoin" we can get similar functionality for openvswitch vxlan interfaces as well as other tunneling mechanisms that need to receive multicast traffic. The kernel code is structured similar to how the vxlan driver does a group join / leave. example: ip address add 224.1.1.10/24 dev eth5 autojoin ip address del 224.1.1.10/24 dev eth5 Signed-off-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77751427 |
|
23-Feb-2015 |
Marcelo Leitner <mleitner@redhat.com> |
ipv6: addrconf: validate new MTU before applying it Currently we don't check if the new MTU is valid or not and this allows one to configure a smaller than minimum allowed by RFCs or even bigger than interface own MTU, which is a problem as it may lead to packet drops. If you have a daemon like NetworkManager running, this may be exploited by remote attackers by forging RA packets with an invalid MTU, possibly leading to a DoS. (NetworkManager currently only validates for values too small, but not for too big ones.) The fix is just to make sure the new value is valid. That is, between IPV6_MIN_MTU and interface's MTU. Note that similar check is already performed at ndisc_router_discovery(), for when kernel itself parses the RA. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11b1f828 |
|
05-Feb-2015 |
Daniel Borkmann <daniel@iogearbox.net> |
ipv6: addrconf: add missing validate_link_af handler We still need a validate_link_af() handler with an appropriate nla policy, similarly as we have in IPv4 case, otherwise size validations are not being done properly in that case. Fixes: f53adae4eae5 ("net: ipv6: add tokenized interface identifier support") Fixes: bc91b0f07ada ("ipv6: addrconf: implement address generation modes") Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c58da4c6 |
|
04-Feb-2015 |
Erik Kline <ek@google.com> |
net: ipv6: allow explicitly choosing optimistic addresses RFC 4429 ("Optimistic DAD") states that optimistic addresses should be treated as deprecated addresses. From section 2.1: Unless noted otherwise, components of the IPv6 protocol stack should treat addresses in the Optimistic state equivalently to those in the Deprecated state, indicating that the address is available for use but should not be used if another suitable address is available. Optimistic addresses are indeed avoided when other addresses are available (i.e. at source address selection time), but they have not heretofore been available for things like explicit bind() and sendmsg() with struct in6_pktinfo, etc. This change makes optimistic addresses treated more like deprecated addresses than tentative ones. Signed-off-by: Erik Kline <ek@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
207895fd3 |
|
28-Jan-2015 |
Daniel Borkmann <daniel@iogearbox.net> |
net: mark some potential candidates __read_mostly They are all either written once or extremly rarely (e.g. from init code), so we can move them to the .data..read_mostly section. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2943f14 |
|
20-Jan-2015 |
Harout Hedeshian <harouth@codeaurora.org> |
net: ipv6: Add sysctl entry to disable MTU updates from RA The kernel forcefully applies MTU values received in router advertisements provided the new MTU is less than the current. This behavior is undesirable when the user space is managing the MTU. Instead a sysctl flag 'accept_ra_mtu' is introduced such that the user space can control whether or not RA provided MTU updates should be applied. The default behavior is unchanged; user space must explicitly set this flag to 0 for RA MTUs to be ignored. Signed-off-by: Harout Hedeshian <harouth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b46a644 |
|
18-Jan-2015 |
David S. Miller <davem@davemloft.net> |
netlink: Fix bugs in nlmsg_end() conversions. Commit 053c095a82cf ("netlink: make nlmsg_end() and genlmsg_end() void") didn't catch all of the cases where callers were breaking out on the return value being equal to zero, which they no longer should when zero means success. Fix all such cases. Reported-by: Marcel Holtmann <marcel@holtmann.org> Reported-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
053c095a |
|
16-Jan-2015 |
Johannes Berg <johannes.berg@intel.com> |
netlink: make nlmsg_end() and genlmsg_end() void Contrary to common expectations for an "int" return, these functions return only a positive value -- if used correctly they cannot even return 0 because the message header will necessarily be in the skb. This makes the very common pattern of if (genlmsg_end(...) < 0) { ... } be a whole bunch of dead code. Many places also simply do return nlmsg_end(...); and the caller is expected to deal with it. This also commonly (at least for me) causes errors, because it is very common to write if (my_function(...)) /* error condition */ and if my_function() does "return nlmsg_end()" this is of course wrong. Additionally, there's not a single place in the kernel that actually needs the message length returned, and if anyone needs it later then it'll be very easy to just use skb->len there. Remove this, and make the functions void. This removes a bunch of dead code as described above. The patch adds lines because I did - return nlmsg_end(...); + nlmsg_end(...); + return 0; I could have preserved all the function's return values by returning skb->len, but instead I've audited all the places calling the affected functions and found that none cared. A few places actually compared the return value with <= 0 in dump functionality, but that could just be changed to < 0 with no change in behaviour, so I opted for the more efficient version. One instance of the error I've made numerous times now is also present in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't check for <0 or <=0 and thus broke out of the loop every single time. I've preserved this since it will (I think) have caused the messages to userspace to be formatted differently with just a single message for every SKB returned to userspace. It's possible that this isn't needed for the tools that actually use this, but I don't even know what they are so couldn't test that changing this behaviour would be acceptable. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
73cf0e92 |
|
25-Nov-2014 |
zhuyj <zyjzyj2000@gmail.com> |
ipv6: Remove unnecessary test The "init_net" test in function addrconf_exit_net is introduced in commit 44a6bd29 [Create ipv6 devconf-s for namespaces] to avoid freeing init_net. In commit c900a800 [ipv6: fix bad free of addrconf_init_net], function addrconf_init_net will allocate memory for every net regardless of init_net. In this case, it is unnecessary to make "init_net" test. CC: Hong Zhiguo <honkiko@gmail.com> CC: Octavian Purdila <opurdila@ixiacom.com> CC: Pavel Emelyanov <xemul@openvz.org> CC: Cong Wang <cwang@twopensource.com> Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5d08d71 |
|
23-Nov-2014 |
Ian Morris <ipm@chirality.org.uk> |
ipv6: coding style improvements (remove assignment in if statements) This change has no functional impact and simply addresses some coding style issues detected by checkpatch. Specifically this change adjusts "if" statements which also include the assignment of a variable. No changes to the resultant object files result as determined by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba7a46f1 |
|
11-Nov-2014 |
Joe Perches <joe@perches.com> |
net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited Use the more common dynamic_debug capable net_dbg_ratelimited and remove the LIMIT_NETDEBUG macro. All messages are still ratelimited. Some KERN_<LEVEL> uses are changed to KERN_DEBUG. This may have some negative impact on messages that were emitted at KERN_INFO that are not not enabled at all unless DEBUG is defined or dynamic_debug is enabled. Even so, these messages are now _not_ emitted by default. This also eliminates the use of the net_msg_warn sysctl "/proc/sys/net/core/warnings". For backward compatibility, the sysctl is not removed, but it has no function. The extern declaration of net_msg_warn is removed from sock.h and made static in net/core/sysctl_net_core.c Miscellanea: o Update the sysctl documentation o Remove the embedded uses of pr_fmt o Coalesce format fragments o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7fd2561e |
|
28-Oct-2014 |
Erik Kline <ek@google.com> |
net: ipv6: Add a sysctl to make optimistic addresses useful candidates Add a sysctl that causes an interface's optimistic addresses to be considered equivalent to other non-deprecated addresses for source address selection purposes. Preferred addresses will still take precedence over optimistic addresses, subject to other ranking in the source address selection algorithm. This is useful where different interfaces are connected to different networks from different ISPs (e.g., a cell network and a home wifi network). The current behaviour complies with RFC 3484/6724, and it makes sense if the host has only one interface, or has multiple interfaces on the same network (same or cooperating administrative domain(s), but not in the multiple distinct networks case. For example, if a mobile device has an IPv6 address on an LTE network and then connects to IPv6-enabled wifi, while the wifi IPv6 address is undergoing DAD, IPv6 connections will try use the wifi default route with the LTE IPv6 address, and will get stuck until they time out. Also, because optimistic nodes can receive frames, issue an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC flag appropriately set). A second RTM_NEWADDR is sent if DAD completes (the address flags have changed), otherwise an RTM_DELADDR is sent. Also: add an entry in ip-sysctl.txt for optimistic_dad. Signed-off-by: Erik Kline <ek@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2ed64a9 |
|
27-Oct-2014 |
Lubomir Rintel <lkundrak@v3.sk> |
ipv6: notify userspace when we added or changed an ipv6 token NetworkManager might want to know that it changed when the router advertisement arrives. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9451a304 |
|
27-Oct-2014 |
Fabian Frederick <fabf@skynet.be> |
ipv6: replace min/casting by min_t Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
705f1c86 |
|
27-Sep-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: remove rt6i_genid Eric Dumazet noticed that all no-nonexthop or no-gateway routes which are already marked DST_HOST (e.g. input routes routes) will always be invalidated during sk_dst_check. Thus per-socket dst caching absolutely had no effect and early demuxing had no effect. Thus this patch removes rt6i_genid: fn_sernum already gets modified during add operations, so we only must ensure we mutate fn_sernum during ipv6 address remove operations. This is a fairly cost extensive operations, but address removal should not happen that often. Also our mtu update functions do the same and we heard no complains so far. xfrm policy changes also cause a call into fib6_flush_trees. Also plug a hole in rt6_info (no cacheline changes). I verified via tracing that this change has effect. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ce62a84 |
|
11-Sep-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv6: exit early in addrconf_notify() if IPv6 is disabled If IPv6 is explicitly disabled before the interface comes up, it makes no sense to continue when it comes up, even just print a message. (I am not sure about other cases though, so I prefer not to touch) Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
013b4d90 |
|
11-Sep-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv6: clean up ipv6_dev_ac_inc() Make it accept inet6_dev, and rename it to __ipv6_dev_ac_inc() to reflect this change. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
381f4dca |
|
10-Sep-2014 |
Sabrina Dubroca <sd@queasysnail.net> |
ipv6: clean up anycast when an interface is destroyed If we try to rmmod the driver for an interface while sockets with setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up and we get stuck on: unregister_netdevice: waiting for ens3 to become free. Usage count = 1 If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no problem. We need to perform a cleanup similar to the one for multicast in addrconf_ifdown(how == 1). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7478dfc |
|
03-Sep-2014 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: use addrconf_get_prefix_route() to remove peer addr addrconf_get_prefix_route() ensures to get the right route in the right table. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f24062b0 |
|
03-Sep-2014 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: fix a refcnt leak with peer addr There is no reason to take a refcnt before deleting the peer address route. It's done some lines below for the local prefix route because inet6_ifa_finish_destroy() will release it at the end. For the peer address route, we want to free it right now. This bug has been introduced by commit caeaba79009c ("ipv6: add support of peer address"). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a9ed4a29 |
|
02-Sep-2014 |
Sabrina Dubroca <sd@queasysnail.net> |
ipv6: fix rtnl locking in setsockopt for anycast and multicast Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST triggers the assertion in addrconf_join_solict()/addrconf_leave_solict() ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before calling ipv6_dev_mc_inc/dec. This patch moves ASSERT_RTNL() up a level in the call stack. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reported-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67ba4152 |
|
24-Aug-2014 |
Ian Morris <ipm@chirality.org.uk> |
ipv6: White-space cleansing : Line Layouts This patch makes no changes to the logic of the code but simply addresses coding style issues as detected by checkpatch. Both objdump and diff -w show no differences. A number of items are addressed in this patch: * Multiple spaces converted to tabs * Spaces before tabs removed. * Spaces in pointer typing cleansed (char *)foo etc. * Remove space after sizeof * Ensure spacing around comparators such as if statements. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a317a2f1 |
|
25-Jul-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv6: fail early when creating netdev named all or default We create a proc dir for each network device, this will cause conflicts when the devices have name "all" or "default". Rather than emitting an ugly kernel warning, we could just fail earlier by checking the device name. Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc91b0f0 |
|
11-Jul-2014 |
Jiri Pirko <jiri@resnulli.us> |
ipv6: addrconf: implement address generation modes This patch introduces a possibility for userspace to set various (so far two) modes of generating addresses. This is useful for example for NetworkManager because it can set the mode to NONE and take care of link local addresses itself. That allow it to have the interface up, monitoring carrier but still don't have any addresses on it. One more use-case by Dan Williams: <quote> WWAN devices often have their LL address provided by the firmware of the device, which sometimes refuses to respond to incorrect LL addresses when doing DHCPv6 or IPv6 ND. The kernel cannot generate the correct LL address for two reasons: 1) WWAN pseudo-ethernet interfaces often construct a fake MAC address, or read a meaningless MAC address from the firmware. Thus the EUI64 and the IPv6LL address the kernel assigns will be wrong. The real LL address is often retrieved from the firmware with AT or proprietary commands. 2) WWAN PPP interfaces receive their LL address from IPV6CP, not from kernel assignments. Only after IPV6CP has completed do we know the LL address of the PPP interface and its peer. But the kernel has already assigned an incorrect LL address to the interface. So being able to suppress the kernel LL address generation and assign the one retrieved from the firmware is less complicated and more robust. </quote> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d9333196 |
|
25-Jun-2014 |
Ben Greear <greearb@candelatech.com> |
ipv6: Allow accepting RA from local IP addresses. This can be used in virtual networking applications, and may have other uses as well. The option is disabled by default. A specific use case is setting up virtual routers, bridges, and hosts on a single OS without the use of network namespaces or virtual machines. With proper use of ip rules, routing tables, veth interface pairs and/or other virtual interfaces, and applications that can bind to interfaces and/or IP addresses, it is possibly to create one or more virtual routers with multiple hosts attached. The host interfaces can act as IPv6 systems, with radvd running on the ports in the virtual routers. With the option provided in this patch enabled, those hosts can now properly obtain IPv6 addresses from the radvd. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
698365fa |
|
05-May-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
net: clean up snmp stats code commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%) reduced snmp array size to 1, so technically it doesn't have to be an array any more. What's more, after the following commit: commit 933393f58fef9963eac61db8093689544e29a600 Date: Thu Dec 22 11:58:51 2011 -0600 percpu: Remove irqsafe_cpu_xxx variants We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after almost 3 years, no one complains. So, just convert the array to a single pointer and remove snmp_mib_init() and snmp_mib_free() as well. Cc: Christoph Lameter <cl@linux.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07c8e35a |
|
02-May-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv6: remove unused function ipv6_inherit_linklocal() It is no longer used after commit e837735ec406a347756e (ip6_tunnel: ensure to always have a link local address). Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6046d5b4 |
|
20-Apr-2014 |
Heiner Kallweit <heiner.kallweit@web.de> |
ipv6: support IFA_F_MANAGETEMPADDR for address deletion too Userspace applications can use IFA_F_MANAGETEMPADDR with RTM_NEWADDR already to indicate that the kernel should take care of temporary address management. This patch adds related functionality to RTM_DELADDR. By setting IFA_F_MANAGETEMPADDR a userspace application can indicate that the kernel should delete all related temporary addresses as well. A corresponding patch for the "ip addr del" command has been applied to iproute2 already. Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c15b1cca |
|
27-Mar-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: move DAD and addrconf_verify processing to workqueue addrconf_join_solict and addrconf_join_anycast may cause actions which need rtnl locked, especially on first address creation. A new DAD state is introduced which defers processing of the initial DAD processing into a workqueue. To get rtnl lock we need to push the code paths which depend on those calls up to workqueues, specifically addrconf_verify and the DAD processing. (v2) addrconf_dad_failure needs to be queued up to the workqueue, too. This patch introduces a new DAD state and stop the DAD processing in the workqueue (this is because of the possible ipv6_del_addr processing which removes the solicited multicast address from the device). addrconf_verify_lock is removed, too. After the transition it is not needed any more. As we are not processing in bottom half anymore we need to be a bit more careful about disabling bottom half out when we lock spin_locks which are also used in bh. Relevant backtrace: [ 541.030090] RTNL: assertion failed at net/core/dev.c (4496) [ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1 [ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8 [ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18 [ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000 [ 541.031152] Call Trace: [ 541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17 [ 541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180 [ 541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0 [ 541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0 [ 541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90 [ 541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6] [ 541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0 [ 541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6] [ 541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6] [ 541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6] [ 541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6] [ 541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6] [ 541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6] [ 541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6] [ 541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50 [ 541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6] [ 541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] [ 541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90 [ 541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] Hunks and backtrace stolen from a patch by Stephen Hemminger. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ecab6701 |
|
12-Mar-2014 |
Heiner Kallweit <heiner.kallweit@web.de> |
ipv6: Avoid unnecessary temporary addresses being generated tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore age needs to be added to the condition. Age calculation in ipv6_create_tempaddr is different from the one in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS. This can cause age in ipv6_create_tempaddr to be less than the one in addrconf_verify and therefore unnecessary temporary address to be generated. Use age calculation as in addrconf_modify to avoid this. Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
08b44656 |
|
17-Feb-2014 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
gre: add link local route when local addr is any This bug was reported by Steinar H. Gunderson and was introduced by commit f7cb8886335d ("sit/gre6: don't try to add the same route two times"). root@morgental:~# ip tunnel add foo mode gre remote 1.2.3.4 ttl 64 root@morgental:~# ip link set foo up mtu 1468 root@morgental:~# ip -6 route show dev foo fe80::/64 proto kernel metric 256 but after the above commit, no such route shows up. There is no link local route because dev->dev_addr is 0 (because local ipv4 address is 0), hence no link local address is configured. In this scenario, the link local address is added manually: 'ip -6 addr add fe80::1 dev foo' and because prefix is /128, no link local route is added by the kernel. Even if the right things to do is to add the link local address with a /64 prefix, we need to restore the previous behavior to avoid breaking userpace. Reported-by: Steinar H. Gunderson <sesse@samfundet.no> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
33d99113 |
|
24-Jan-2014 |
Gao feng <gaofeng@cn.fujitsu.com> |
ipv6: reallocate addrconf router for ipv6 address when lo device up commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" allocates addrconf router for ipv6 address when lo device up. but commit a881ae1f625c599b460cc8f8a7fcb1c438f699ad "ipv6:don't call addrconf_dst_alloc again when enable lo" breaks this behavior. Since the addrconf router is moved to the garbage list when lo device down, we should release this router and rellocate a new one for ipv6 address when lo device up. This patch solves bug 67951 on bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=67951 change from v1: use ip6_rt_put to repleace ip6_del_rt, thanks Hannes! change code style, suggested by Sergei. CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
602582ca |
|
19-Jan-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: optimize link local address search ipv6_link_dev_addr sorts newly added addresses by scope in ifp->addr_list. Smaller scope addresses are added to the tail of the list. Use this fact to iterate in reverse over addr_list and break out as soon as a higher scoped one showes up, so we can spare some cycles on machines with lot's of addresses. The ordering of the addresses is not relevant and we are more likely to get the eui64 generated address with this change anyway. Suggested-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11ffff75 |
|
16-Jan-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: simplify detection of first operational link-local address on interface In commit 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") I build the detection of the first operational link-local address much to complex. Additionally this code now has a race condition. Replace it with a much simpler variant, which just scans the address list when duplicate address detection completes, to check if this is the first valid link local address and send RS and MLD reports then. Fixes: 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") Reported-by: Jiri Pirko <jiri@resnulli.us> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b84efec |
|
15-Jan-2014 |
Thomas Haller <thaller@redhat.com> |
ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE Refactor the deletion/update of prefix routes when removing an address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address present with this flag, to not cleanup the route. Instead, assume that userspace is taking care of this route. Also perform the same cleanup, when userspace changes an existing address to add NOPREFIXROUTE (to an address that didn't have this flag). This is done because when the address was added, a prefix route was created for it. Since the user now wants to handle this route by himself, we cleanup this route. This cleanup of the route is not totally robust. There is no guarantee, that the route we are about to delete was really the one added by the kernel. This behavior does not change by the patch, and in practice it should work just fine. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
761aac73 |
|
15-Jan-2014 |
Thomas Haller <thaller@redhat.com> |
ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes When adding/modifying an IPv6 address, the userspace application needs a way to suppress adding a prefix route. This is for example relevant together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf generated addresses, but depending on on-link, no route for the prefix should be added. Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db9c7c39 |
|
12-Jan-2014 |
stephen hemminger <stephen@networkplumber.org> |
ipv6: addrconf spelling fixes Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63862b5b |
|
11-Jan-2014 |
Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> |
net: replace macros net_random and net_srandom with direct calls to prandom This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07edd741 |
|
08-Jan-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIME In the past the IFA_PERMANENT flag indicated, that the valid and preferred lifetime where ignored. Since change fad8da3e085ddf ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") we honour at least the preferred lifetime on those addresses. As such the valid lifetime gets recalculated and updated to 0. If loopback address is added manually this problem does not occur. Also if NetworkManager manages IPv6, those addresses will get added via inet6_rtm_newaddr and thus will have a correct lifetime, too. Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com> Reported-by: Damien Wyart <damien.wyart@gmail.com> Fixes: fad8da3e085ddf ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity") Cc: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88ad3149 |
|
06-Jan-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: don't install anycast address for /128 addresses on routers It does not make sense to create an anycast address for an /128-prefix. Suppress it. As 32019e651c6fce ("ipv6: Do not leave router anycast address for /127 prefixes.") shows we also may not leave them, because we could accidentally remove an anycast address the user has allocated or got added via another prefix. Cc: François-Xavier Le Bail <fx.lebail@yahoo.com> Cc: Thomas Haller <thaller@redhat.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fad8da3e |
|
30-Dec-2013 |
Yasushi Asano <yasushi.asano@jp.fujitsu.com> |
ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity Fixed a problem with setting the lifetime of an IPv6 address. When setting preferred_lft to a value not zero or infinity, while valid_lft is infinity(0xffffffff) preferred lifetime is set to forever and does not update. Therefore preferred lifetime never becomes deprecated. valid lifetime and preferred lifetime should be set independently, even if valid lifetime is infinity, preferred lifetime must expire correctly (meaning it must eventually become deprecated) Signed-off-by: Yasushi Asano <yasushi.asano@jp.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3678a9d8 |
|
30-Dec-2013 |
stephen hemminger <stephen@networkplumber.org> |
netlink: cleanup rntl_af_register The function __rtnl_af_register is never called outside this code, and the return value is always 0. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46306b49 |
|
22-Dec-2013 |
Li RongQing <roy.qing.li@gmail.com> |
ipv6: unneccessary to get address prefix in addrconf_get_prefix_route Since addrconf_get_prefix_route inputs the address prefix to fib6_locate, which does not uses the data which is out of the prefix_len length, so do not need to use ipv6_addr_prefix to get address prefix. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c92d5491 |
|
17-Dec-2013 |
stephen hemminger <stephen@networkplumber.org> |
netconf: add support for IPv6 proxy_ndp Need to be able to see changes to proxy NDP status on a per interface basis via netlink (analog to proxy_arp). Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e74bccb8 |
|
11-Dec-2013 |
Jukka Rissanen <jukka.rissanen@linux.intel.com> |
ipv6: Add checks for 6LOWPAN ARP type Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
#
971a351c |
|
10-Dec-2013 |
Jiri Pirko <jiri@resnulli.us> |
ipv6 addrconf: revert /proc/net/if_inet6 ifa_flag format Turned out that applications like ifconfig do not handle the change. So revert ifa_flag format back to 2-letter hex value. Introduced by: commit 479840ffdbe4242e8a25349218c8e0859223aa35 "ipv6 addrconf: extend ifa_flags to u32" Reported-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Tested-by: FLorent Fourcot <florent.fourcot@enst-bretagne.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bba24896 |
|
07-Dec-2013 |
Jiri Pirko <jiri@resnulli.us> |
neigh: ipv6: respect default values set before an address is assigned to device Make the behaviour similar to ipv4. This will allow user to set sysctl default neigh param values and these values will be respected even by devices registered before (that ones what do not have address set yet). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
73af614a |
|
07-Dec-2013 |
Jiri Pirko <jiri@resnulli.us> |
neigh: use tbl->family to distinguish ipv4 from ipv6 Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f9248e5 |
|
07-Dec-2013 |
Jiri Pirko <jiri@resnulli.us> |
neigh: convert parms to an array This patch converts the neigh param members to an array. This allows easier manipulation which will be needed later on to provide better management of default values. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
53bd6749 |
|
06-Dec-2013 |
Jiri Pirko <jiri@resnulli.us> |
ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses Creating an address with this flag set will result in kernel taking care of temporary addresses in the same way as if the address was created by kernel itself (after RA receive). This allows userspace applications implementing the autoconfiguration (NetworkManager for example) to implement ipv6 addresses privacy. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
479840ff |
|
06-Dec-2013 |
Jiri Pirko <jiri@resnulli.us> |
ipv6 addrconf: extend ifa_flags to u32 There is no more space in u8 ifa_flags. So do what davem suffested and add another netlink attr called IFA_FLAGS for carry more flags. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Thomas Haller <thaller@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
57ec0afe |
|
02-Dec-2013 |
François-Xavier Le Bail <fx.lebail@yahoo.com> |
ipv6: fix third arg of anycast_dst_alloc(), must be bool. Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7cb8886 |
|
14-Nov-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
sit/gre6: don't try to add the same route two times addrconf_add_linklocal() already adds the link local route, so there is no reason to add it before calling this function. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f0e2acfa |
|
14-Nov-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
sit: link local routes are missing When a link local address was added to a sit interface, the corresponding route was not configured. This breaks routing protocols that use the link local address, like OSPFv3. To ease the code reading, I remove sit_route_add(), which only adds v4 mapped routes, and add this kind of route directly in sit_add_v4_addrs(). Thus link local and v4 mapped routes are configured in the same place. Reported-by: Li Hongjun <hongjun.li@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
929c9cf3 |
|
14-Nov-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
sit: fix prefix length of ll and v4mapped addresses When the local IPv4 endpoint is wilcard (0.0.0.0), the prefix length is correctly set, ie 64 if the address is a link local one or 96 if the address is a v4 mapped one. But when the local endpoint is specified, the prefix length is set to 128 for both kind of address. This patch fix this wrong prefix length. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
827da44c |
|
07-Oct-2013 |
John Stultz <john.stultz@linaro.org> |
net: Explicitly initialize u64_stats_sync structures for lockdep In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5d9efa7e |
|
28-Oct-2013 |
David S. Miller <davem@davemloft.net> |
ipv6: Remove privacy config option. The code for privacy extentions is very mature, and making it configurable only gives marginal memory/code savings in exchange for obfuscation and hard to read code via CPP ifdef'ery. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c9d55d5b |
|
25-Sep-2013 |
Paul Marks <pmarks@google.com> |
ipv6: Fix preferred_lft not updating in some cases Consider the scenario where an IPv6 router is advertising a fixed preferred_lft of 1800 seconds, while the valid_lft begins at 3600 seconds and counts down in realtime. A client should reset its preferred_lft to 1800 every time the RA is received, but a bug is causing Linux to ignore the update. The core problem is here: if (prefered_lft != ifp->prefered_lft) { Note that ifp->prefered_lft is an offset, so it doesn't decrease over time. Thus, the comparison is always (1800 != 1800), which fails to trigger an update. The most direct solution would be to compute a "stored_prefered_lft", and use that value in the comparison. But I think that trying to filter out unnecessary updates here is a premature optimization. In order for the filter to apply, both of these would need to hold: - The advertised valid_lft and preferred_lft are both declining in real time. - No clock skew exists between the router & client. So in this patch, I've set "update_lft = 1" unconditionally, which allows the surrounding code to be greatly simplified. Signed-off-by: Paul Marks <pmarks@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7df37ff3 |
|
23-Sep-2013 |
Catalin\(ux\) M. BOIE <catab@embedromix.ro> |
IPv6 NAT: Do not drop DNATed 6to4/6rd packets When a router is doing DNAT for 6to4/6rd packets the latest anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for 6to4 and 6rd") will drop them because the IPv6 address embedded does not match the IPv4 destination. This patch will allow them to pass by testing if we have an address that matches on 6to4/6rd interface. I have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR. Also, log the dropped packets (with rate limit). Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13c7bf08 |
|
30-Aug-2013 |
Petr Holasek <pholasek@redhat.com> |
ipv6: ipv6_create_tempaddr cleanup This two-liner removes max_addresses variable which is now unecessary related to patch [ipv6: remove max_addresses check from ipv6_create_tempaddr]. Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f39dc102 |
|
30-Aug-2013 |
Cong Wang <amwang@redhat.com> |
ipv6: move in6_dev_finish_destroy() into core kernel in6_dev_put() will be needed by vxlan module, so is in6_dev_finish_destroy(). Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
caf92bc4 |
|
30-Aug-2013 |
Cong Wang <amwang@redhat.com> |
ipv6: do not call ndisc_send_rs() with write lock Because vxlan module will call ip6_dst_lookup() in TX path, which will hold write lock. So we have to release this write lock before calling ndisc_send_rs(), otherwise could deadlock. Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
034dfc5d |
|
30-Aug-2013 |
Cong Wang <amwang@redhat.com> |
ipv6: export in6addr_loopback to modules It is needed by vxlan module. Noticed by Mike. Cc: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b800c3b9 |
|
26-Aug-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: drop fragmented ndisc packets by default (RFC 6980) This patch implements RFC6980: Drop fragmented ndisc packets by default. If a fragmented ndisc packet is received the user is informed that it is possible to disable the check. Cc: Fernando Gont <fernando@gont.com.ar> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e837735e |
|
19-Aug-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ip6_tunnel: ensure to always have a link local address When an Xin6 tunnel is set up, we check other netdevices to inherit the link- local address. If none is available, the interface will not have any link-local address. RFC4862 expects that each interface has a link local address. Now than this kind of tunnels supports x-netns, it's easy to fall in this case (by creating the tunnel in a netns where ethernet interfaces stand and then moving it to a other netns where no ethernet interface is available). RFC4291, Appendix A suggests two methods: the first is the one currently implemented, the second is to generate a unique identifier, so that we can always generate the link-local address. Let's use eth_random_addr() to generate this interface indentifier. I remove completly the previous method, hence for the whole life of the interface, the link-local address remains the same (previously, it depends on which ethernet interfaces were up when the tunnel interface was set up). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7eaa48a4 |
|
21-Aug-2013 |
David S. Miller <davem@davemloft.net> |
Revert "ipv6: fix checkpatch errors in net/ipv6/addrconf.c" This reverts commit df8372ca747f6da9e8590775721d9363c1dfc87e. These changes are buggy and make unintended semantic changes to ip6_tnl_add_linklocal(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df8372ca |
|
16-Aug-2013 |
dingtianhong <dingtianhong@huawei.com> |
ipv6: fix checkpatch errors in net/ipv6/addrconf.c ERROR: code indent should use tabs where possible: fix 2. ERROR: do not use assignment in if condition: fix 5. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba3542e1 |
|
16-Aug-2013 |
dingtianhong <dingtianhong@huawei.com> |
ipv6: convert the uses of ADBG and remove the superfluous parentheses Just follow the Joe Perches's opinion, it is a better way to fix the style errors. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b08a8f1 |
|
16-Aug-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: remove max_addresses check from ipv6_create_tempaddr Because of the max_addresses check attackers were able to disable privacy extensions on an interface by creating enough autoconfigured addresses: <http://seclists.org/oss-sec/2012/q4/292> But the check is not actually needed: max_addresses protects the kernel to install too many ipv6 addresses on an interface and guards addrconf_prefix_rcv to install further addresses as soon as this limit is reached. We only generate temporary addresses in direct response of a new address showing up. As soon as we filled up the maximum number of addresses of an interface, we stop installing more addresses and thus also stop generating more temp addresses. Even if the attacker tries to generate a lot of temporary addresses by announcing a prefix and removing it again (lifetime == 0) we won't install more temp addresses, because the temporary addresses do count to the maximum number of addresses, thus we would stop installing new autoconfigured addresses when the limit is reached. This patch fixes CVE-2013-0343 (but other layer-2 attacks are still possible). Thanks to Ding Tianhong to bring this topic up again. Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: George Kargiotakis <kargig@void.gr> Cc: P J P <ppandit@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc4eba58 |
|
13-Aug-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: make unsolicited report intervals configurable for mld Commit cab70040dfd95ee32144f02fade64f0cb94f31a0 ("net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3") and 2690048c01f32bf45d1c1e1ab3079bc10ad2aea7 ("net: igmp: Allow user-space configuration of igmp unsolicited report interval") by William Manley made igmp unsolicited report intervals configurable per interface and corrected the interval of unsolicited igmpv3 report messages resendings to 1s. Same needs to be done for IPv6: MLDv1 (RFC2710 7.10.): 10 seconds MLDv2 (RFC3810 9.11.): 1 second Both intervals are configurable via new procfs knobs mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval. (also added .force_mld_version to ipv6_devconf_dflt to bring structs in line without semantic changes) v2: a) Joined documentation update for IPv4 and IPv6 MLD/IGMP unsolicited_report_interval procfs knobs. b) incorporate stylistic feedback from William Manley v3: a) add new DEVCONF_* values to the end of the enum (thanks to David Miller) Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: William Manley <william.manley@youview.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
439677d7 |
|
01-Aug-2013 |
fan.du <fan.du@windriver.com> |
ipv6: bump genid when delete/add address Server Client 2001:1::803/64 <-> 2001:1::805/64 2001:2::804/64 <-> 2001:2::806/64 Server side fib binary tree looks like this: (2001:/64) / / ffff88002103c380 / \ (2) / \ (2001::803/128) ffff880037ac07c0 / \ / \ (3) ffff880037ac0640 (2001::806/128) / \ (1) / \ (2001::804/128) (2001::805/128) Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3) destinate to 2001::806 with source address as 2001::804/64. That's because 2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is where the substantial difference between same prefix configuration and different prefix configuration :) So packet are still transmitted out to 2001::806 with source address as 2001::804/64. So bump genid will clear rt in (3), and up layer protocol will eventually find the right one for themselves. This problem arised from the discussion in here: http://marc.info/?l=linux-netdev&m=137404469219410&w=4 Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a226b2c |
|
01-Aug-2013 |
Jiri Benc <jbenc@redhat.com> |
ipv6: prevent race between address creation and removal There's a race in IPv6 automatic addess assignment. The address is created with zero lifetime when it's added to various address lists. Before it gets assigned the correct lifetime, there's a window where a new address may be configured. This causes the semi-initiated address to be deleted in addrconf_verify. This was discovered as a reference leak caused by concurrent run of __ipv6_ifa_notify for both RTM_NEWADDR and RTM_DELADDR with the same address. Fix this by setting the lifetime before the address is added to inet6_addr_lst. A few notes: 1. In addrconf_prefix_rcv, by setting update_lft to zero, the if (update_lft) { ... } condition is no longer executed for newly created addresses. This is okay, as the ifp fields are set in ipv6_add_addr now and ipv6_ifa_notify is called (and has been called) through addrconf_dad_start. 2. The removal of the whole block under ifp->lock in inet6_addr_add is okay, too, as tstamp is initialized to jiffies in ipv6_add_addr. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f8f5298 |
|
01-Aug-2013 |
Jiri Pirko <jiri@resnulli.us> |
ipv6: move peer_addr init into ipv6_add_addr() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8965779d |
|
29-Jun-2013 |
Amerigo Wang <amwang@redhat.com> |
ipv6,mcast: always hold idev->lock before mca_lock dingtianhong reported the following deadlock detected by lockdep: ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.24.05-0.1-default #1 Not tainted ------------------------------------------------------- ksoftirqd/0/3 is trying to acquire lock: (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 but task is already holding lock: (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mc->mca_lock){+.+...}: [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60 [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120 [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60 [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80 [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0 [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80 [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20 [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60 [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80 [<ffffffff813d9360>] dev_change_flags+0x40/0x70 [<ffffffff813ea627>] do_setlink+0x237/0x8a0 [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600 [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310 [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0 [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40 [<ffffffff81403e20>] netlink_unicast+0x140/0x180 [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380 [<ffffffff813c4252>] sock_sendmsg+0x112/0x130 [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460 [<ffffffff813c5544>] sys_sendmsg+0x44/0x70 [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b -> #0 (&ndev->lock){+.+...}: [<ffffffff810a798e>] check_prev_add+0x3de/0x440 [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f6c82>] rt_read_lock+0x42/0x60 [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 [<ffffffff8149b036>] mld_newpack+0xb6/0x160 [<ffffffff8149b18b>] add_grhead+0xab/0xc0 [<ffffffff8149d03b>] add_grec+0x3ab/0x460 [<ffffffff8149d14a>] mld_send_report+0x5a/0x150 [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0 [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0 [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0 [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0 [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0 [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210 [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0 [<ffffffff8106c7de>] kthread+0xae/0xc0 [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10 actually we can just hold idev->lock before taking pmc->mca_lock, and avoid taking idev->lock again when iterating idev->addr_list, since the upper callers of mld_newpack() already take read_lock_bh(&idev->lock). Reported-by: dingtianhong <dingtianhong@huawei.com> Cc: dingtianhong <dingtianhong@huawei.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Chen Weilong <chenweilong@huawei.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b173ee48 |
|
26-Jun-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: resend MLD report if a link-local address completes DAD RFC3590/RFC3810 specifies we should resend MLD reports as soon as a valid link-local address is available. We now use the valid_ll_addr_cnt to check if it is necessary to resend a new report. Changes since Flavio Leitner's version: a) adapt for valid_ll_addr_cnt b) resend first reports directly in the path and just arm the timer for mc_qrv-1 resends. Reported-by: Flavio Leitner <fleitner@redhat.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ec047eb |
|
26-Jun-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: introduce per-interface counter for dad-completed ipv6 addresses To reduce the number of unnecessary router solicitations, MLDv2 and IGMPv3 messages we need to track the number of valid (as in non-optimistic, no-dad-failed and non-tentative) link-local addresses. Therefore, this patch implements a valid_ll_addr_cnt in struct inet6_dev. We now only emit router solicitations if the first link-local address finishes duplicate address detection. The changes for MLDv2 and IGMPv3 are in a follow-up patch. While there, also simplify one if statement(one minor nit I made in one of my previous patches): if (!...) do(); else return; <<into>> if (...) return; do(); Cc: Flavio Leitner <fbl@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Stevens <dlstevens@us.ibm.com> Suggested-by: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77ecaace |
|
25-Jun-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: rearm router solicitaion timer when setting new tokenized address When a new tokenized address gets installed we send out just one router solicition. We should send out `rtr_solicits' in case one router advertisment got lost. So, rearm the timer as we do in addrconf_dad_complete. Cc: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b9651d7 |
|
24-Jun-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: remove old token ipv6 address as soon as possible If the tokenized ip address is re-set on an interface we depend on the arrival of a new router advertisment to call addrconf_verify to clean up the old address (which valid_lft is now set to 0). Old addresses can linger around for a longer time if e.g. the source of router advertisments vanishes. So, call addrconf_verify immediately after setting the new tokenized address to get rid of the old tokenized addresses. Cc: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc848292 |
|
24-Jun-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: check return value of ipv6_get_lladdr We should check the return value of ipv6_get_lladdr in inet6_set_iftoken. A possible situation, which could leave ll_addr unassigned is, when the user removed her link-local address but a global scoped address was already set. In this case the interface would still be IF_READY and not dead. In that case the RS source address is some value from the stack. v2: Daniel Borkmann noted a small indent inconstancy; no semantic changes. Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
876fd05d |
|
23-Jun-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: don't disable interface if last ipv6 address is removed The reason behind this change is that as soon as we delete the last ipv6 address of an interface we also lose the /proc/sys/net/ipv6/conf/<interface> directory. This seems to be a usability problem for me. I don't see any reason why we should shutdown ipv6 on that interface in such cases. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7b1bfce |
|
23-Jun-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: split duplicate address detection and router solicitation timer This patch splits the timers for duplicate address detection and router solicitations apart. The router solicitations timer goes into inet6_dev and the dad timer stays in inet6_ifaddr. The reason behind this patch is to reduce the number of unneeded router solicitations send out by the host if additional link-local addresses are created. Currently we send out RS for every link-local address on an interface. If the RS timer fires we pick a source address with ipv6_get_lladdr. This change could hurt people adding additional link-local addresses and specifying these addresses in the radvd clients section because we no longer guarantee that we use every ll address as source address in router solicitations. Cc: Flavio Leitner <fleitner@redhat.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b33698e2 |
|
20-Jun-2013 |
Cong Wang <amwang@redhat.com> |
ipv6: remove a useless pr_info() in addrconf_gre_config() This is debug info, should at least be pr_debug(), but given that this code is in upstream for two years, there is no need to keep this debugging printk any more, so just remove it. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a881ae1f |
|
15-Jun-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
ipv6: don't call addrconf_dst_alloc again when enable lo If we disable all of the net interfaces, and enable un-lo interface before lo interface, we already allocated the addrconf dst in ipv6_add_addr. So we shouldn't allocate it again when we enable lo interface. Otherwise the message below will be triggered. unregister_netdevice: waiting for sit1 to become free. Usage count = 1 This problem is introduced by commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe2c6338 |
|
12-Jun-2013 |
Joe Perches <joe@perches.com> |
net: Convert uses of typedef ctl_table to struct ctl_table Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
534c8779 |
|
02-Jun-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
ipv6: assign rt6_info to inet6_ifaddr in init_loopback Commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" forgot to assign rt6_info to the inet6_ifaddr. When disable the net device, the rt6_info which allocated in init_loopback will not be destroied in __ipv6_ifa_notify. This will trigger the waring message below [23527.916091] unregister_netdevice: waiting for tap0 to become free. Usage count = 1 Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75538c2b |
|
28-May-2013 |
Cong Wang <amwang@redhat.com> |
net: always pass struct netdev_notifier_info to netdevice notifiers commit 351638e7deeed2ec8ce451b53d3 (net: pass info struct via netdevice notifier) breaks booting of my KVM guest, this is due to we still forget to pass struct netdev_notifier_info in several places. This patch completes it. Cc: Jiri Pirko <jiri@resnulli.us> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
351638e7 |
|
27-May-2013 |
Jiri Pirko <jiri@resnulli.us> |
net: pass info struct via netdevice notifier So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2a7851bf |
|
16-May-2013 |
Florian Westphal <fw@strlen.de> |
netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6 Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812: [ ip6tables -m addrtype ] When I tried to use in the nat/PREROUTING it messes up the routing cache even if the rule didn't matched at all. [..] If I remove the --limit-iface-in from the non-working scenario, so just use the -m addrtype --dst-type LOCAL it works! This happens when LOCAL type matching is requested with --limit-iface-in, and the default ipv6 route is via the interface the packet we test arrived on. Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation creates an unwanted cached entry, and the packet won't make it to the real/expected destination. Silently ignoring --limit-iface-in makes the routing work but it breaks rule matching (--dst-type LOCAL with limit-iface-in is supposed to only match if the dst address is configured on the incoming interface; without --limit-iface-in it will match if the address is reachable via lo). The test should call ipv6_chk_addr() instead. However, this would add a link-time dependency on ipv6. There are two possible solutions: 1) Revert the commit that moved ipt_addrtype to xt_addrtype, and put ipv6 specific code into ip6t_addrtype. 2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions. While the former might seem preferable, Pablo pointed out that there are more xt modules with link-time dependeny issues regarding ipv6, so lets go for 2). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
#
88924753 |
|
21-May-2013 |
Cong Wang <xiyou.wangcong@gmail.com> |
ipv6: use ipv6_addr_scope() helper ipv6_addr_type(&addr)&IPV6_ADDR_SCOPE_MASK could be replaced by ipv6_addr_scope(), which is slightly faster. Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7996c799 |
|
21-May-2013 |
Cong Wang <xiyou.wangcong@gmail.com> |
ipv6: use ipv6_addr_any() helper ipv6_addr_any() is a faster way to determine if an addr is ipv6 any addr, no need to compute the addr type. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
caeaba79 |
|
16-May-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: add support of peer address This patch adds the support of peer address for IPv6. For example, it is possible to specify the remote end of a 6inY tunnel. This was already possible in IPv4: ip addr add ip1 peer ip2 dev dev1 The peer address is specified with IFA_ADDRESS and the local address with IFA_LOCAL (like explained in include/uapi/linux/if_addr.h). Note that the API is not changed, because before this patch, it was not possible to specify two different addresses in IFA_LOCAL and IFA_REMOTE. There is a small change for the dump: if the peer is different from ::, IFA_ADDRESS will contain the peer address instead of the local address. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f88c91dd |
|
14-Apr-2013 |
Cong Wang <amwang@redhat.com> |
ipv6: statically link register_inet6addr_notifier() Tomas reported the following build error: net/built-in.o: In function `ieee80211_unregister_hw': (.text+0x10f0e1): undefined reference to `unregister_inet6addr_notifier' net/built-in.o: In function `ieee80211_register_hw': (.text+0x10f610): undefined reference to `register_inet6addr_notifier' make: *** [vmlinux] Error 1 when built IPv6 as a module. So we have to statically link these symbols. Reported-by: Tomas Melin <tomas.melin@iki.fi> Cc: Tomas Melin <tomas.melin@iki.fi> Cc: "David S. Miller" <davem@davemloft.net> Cc: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
617fe29d |
|
08-Apr-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: ipv6: only invalidate previously tokenized addresses Instead of invalidating all IPv6 addresses with global scope when one decides to use IPv6 tokens, we should only invalidate previous tokens and leave the rest intact until they expire eventually (or are intact forever). For doing this less greedy approach, we're adding a bool at the end of inet6_ifaddr structure instead, for two reasons: i) per-inet6_ifaddr flag space is already used up, making it wider might not be a good idea, since ii) also we do not necessarily need to export this information into user space. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc403832 |
|
08-Apr-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: ipv6: also allow token to be set when device not ready When we set the iftoken in inet6_set_iftoken(), we return -EINVAL when the device does not have flag IF_READY. This is however not necessary and rather an artificial usability barrier, since we simply can set the token despite that, and in case the device is ready, we just send out our rs, otherwise ifup et al. will do this for us anyway. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
914faa14 |
|
08-Apr-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: ipv6: minor: use in6addr_any in token init Since we check for !ipv6_addr_any(&in6_dev->token) in addrconf_prefix_rcv(), make the token initialization on device setup more intuitive by using in6addr_any as an initializer. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f53adae4 |
|
07-Apr-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: ipv6: add tokenized interface identifier support This patch adds support for IPv6 tokenized IIDs, that allow for administrators to assign well-known host-part addresses to nodes whilst still obtaining global network prefix from Router Advertisements. It is currently in draft status. The primary target for such support is server platforms where addresses are usually manually configured, rather than using DHCPv6 or SLAAC. By using tokenised identifiers, hosts can still determine their network prefix by use of SLAAC, but more readily be automatically renumbered should their network prefix change. [...] The disadvantage with static addresses is that they are likely to require manual editing should the network prefix in use change. If instead there were a method to only manually configure the static identifier part of the IPv6 address, then the address could be automatically updated when a new prefix was introduced, as described in [RFC4192] for example. In such cases a DNS server might be configured with such a tokenised interface identifier of ::53, and SLAAC would use the token in constructing the interface address, using the advertised prefix. [...] http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02 The implementation is partially based on top of Mark K. Thompson's proof of concept. However, it uses the Netlink interface for configuration resp. data retrival, so that it can be easily extended in future. Successfully tested by myself. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25fb6ca4 |
|
02-Apr-2013 |
Balakumaran Kannan <kumaran.4353@gmail.com> |
net IPv6 : Fix broken IPv6 routing table after loopback down-up IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo) interface. After down-up, routes of other interface's IPv6 addresses through 'lo' are lost. IPv6 addresses assigned to all interfaces are routed through 'lo' for internal communication. Once 'lo' is down, those routing entries are removed from routing table. But those removed entries are not being re-created properly when 'lo' is brought up. So IPv6 addresses of other interfaces becomes unreachable from the same machine. Also this breaks communication with other machines because of NDISC packet processing failure. This patch fixes this issue by reading all interface's IPv6 addresses and adding them to IPv6 routing table while bringing up 'lo'. ==Testing== Before applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ After applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb6bf355 |
|
25-Mar-2013 |
YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> |
firewire net, ipv6: IPv6 over Firewire (RFC3146) support. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a79ca223 |
|
25-Mar-2013 |
Hong Zhiguo <honkiko@gmail.com> |
ipv6: fix bad free of addrconf_init_net Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63998ac2 |
|
22-Mar-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: provide addr and netconf dump consistency info This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with dev_base_seq to check if a change occurs during a netlink dump. If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message after the dump was interrupted. Note that only dump of unicast addresses is checked (multicast and anycast are not checked). Reported-by: Junwei Zhang <junwei.zhang@6wind.com> Reported-by: Hongjun Li <hongjun.li@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
661d2967 |
|
21-Mar-2013 |
Thomas Graf <tgraf@suug.ch> |
rtnetlink: Remove passing of attributes into rtnl_doit functions With decnet converted, we can finally get rid of rta_buf and its computations around it. It also gets rid of the minimal header length verification since all message handlers do that explicitly anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a674200 |
|
05-Mar-2013 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
netconf: add the handler to dump entries It's useful to be able to get the initial state of all entries. The patch adds the support for IPv4 and IPv6. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b67bfe0d |
|
27-Feb-2013 |
Sasha Levin <sasha.levin@oracle.com> |
hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ece31ffd |
|
17-Feb-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
net: proc: change proc_net_remove to remove_proc_entry proc_net_remove is only used to remove proc entries that under /proc/net,it's not a general function for removing proc entries of netns. if we want to remove some proc entries which under /proc/net/stat/, we still need to call remove_proc_entry. this patch use remove_proc_entry to replace proc_net_remove. we can remove proc_net_remove after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d4beaa66 |
|
17-Feb-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
net: proc: change proc_net_fops_create to proc_create Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create. It looks a little chaos.this patch changes all of proc_net_fops_create to proc_create. we can remove proc_net_fops_create after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c5e8933 |
|
09-Feb-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: by default join ff01::1 and in case of forwarding ff01::2 and ff05:2 Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e98a36e |
|
28-Jan-2013 |
YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> |
ipv6 addrconf: Fix interface identifiers of 802.15.4 devices. The "Universal/Local" (U/L) bit must be complmented according to RFC4944 and RFC2464. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c766d64 |
|
24-Jan-2013 |
Jiri Pirko <jiri@resnulli.us> |
ipv4: introduce address lifetime There are some usecase when lifetime of ipv4 addresses might be helpful. For example: 1) initramfs networkmanager uses a DHCP daemon to learn network configuration parameters 2) initramfs networkmanager addresses, routes and DNS configuration 3) initramfs networkmanager is requested to stop 4) initramfs networkmanager stops all daemons including dhclient 5) there are addresses and routes configured but no daemon running. If the system doesn't start networkmanager for some reason, addresses and routes will be used forever, which violates RFC 2131. This patch is essentially a backport of ivp6 address lifetime mechanism for ipv4 addresses. Current "ip" tool supports this without any patch (since it does not distinguish between ipv4 and ipv6 addresses in this perspective. Also, this should be back-compatible with all current netlink users. Reported-by: Pavel Šimerda <psimerda@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f0d2ba0 |
|
21-Jan-2013 |
YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> |
ipv6: Use IS_ERR_OR_NULL(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21caa662 |
|
09-Jan-2013 |
Romain Kuntz <r.kuntz@ipflavors.com> |
ipv6: use addrconf_get_prefix_route for prefix route lookup [v2] Replace ip6_route_lookup() with addrconf_get_prefix_route() when looking up for a prefix route. This ensures that the connected prefix is looked up in the main table, and avoids the selection of other matching routes located in different tables as well as blackhole or prohibited entries. In addition, this fixes an Opps introduced by commit 64c6d08e (ipv6: del unreachable route when an addr is deleted on lo), that would occur when a blackhole or prohibited entry is selected by ip6_route_lookup(). Such entries have a NULL rt6i_table argument, which is accessed by __ip6_del_rt() when trying to lock rt6i_table->tb6_lock. The function addrconf_is_prefix_route() is not used anymore and is removed. [v2] Minor indentation cleanup and log updates. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85da53bf |
|
09-Jan-2013 |
Romain Kuntz <r.kuntz@ipflavors.com> |
ipv6: fix the noflags test in addrconf_get_prefix_route The tests on the flags in addrconf_get_prefix_route() does no make much sense: the 'noflags' parameter contains the set of flags that must not match with the route flags, so the test must be done against 'noflags', and not against 'flags'. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd779028 |
|
17-Dec-2012 |
Cong Ding <dinggnu@gmail.com> |
ipv6: addrconf.c: remove unnecessary "if" the value of err is always negative if it goes to errout, so we don't need to check the value of err. Signed-off-by: Cong Ding <dinggnu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1afce95 |
|
04-Dec-2012 |
David S. Miller <davem@davemloft.net> |
ipv6: Protect ->mc_forwarding access with CONFIG_IPV6_MROUTE Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d67b8c61 |
|
03-Dec-2012 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
netconf: advertise mc_forwarding status This patch advertise the MC_FORWARDING status for IPv4 and IPv6. This field is readonly, only multicast engine in the kernel updates it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ba2add3 |
|
01-Dec-2012 |
Shmulik Ladkani <shmulik.ladkani@gmail.com> |
ipv6: Make 'addrconf_rs_timer' send Router Solicitations (and re-arm itself) if Router Advertisements are accepted As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], Router Solicitations are sent whenever kernel accepts Router Advertisements on the interface. However, this logic isn't reflected in 'addrconf_rs_timer'. The timer fails to issue subsequent RS messages (and fails to re-arm itself) if forwarding is enabled and the special hybrid mode is enabled (accept_ra=2). Fix the condition determining whether next RS should be sent, by using 'ipv6_accept_ra()'. Reported-by: Ami Koren <amikoren@yahoo.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aeaf6e9d |
|
30-Nov-2012 |
Shmulik Ladkani <shmulik.ladkani@gmail.com> |
ipv6: unify logic evaluating inet6_dev's accept_ra property As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the logic determining whether to send Router Solicitations is identical to the logic determining whether kernel accepts Router Advertisements. However the condition itself is repeated in several code locations. Unify it by introducing 'ipv6_accept_ra()' accessor. Also, simplify the condition expression, making it more readable. No semantic change. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b51642f6 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Enable a userns root rtnl calls that are safe for unprivilged users - Only allow moving network devices to network namespaces you have CAP_NET_ADMIN privileges over. - Enable creating/deleting/modifying interfaces - Enable adding/deleting addresses - Enable adding/setting/deleting neighbour entries - Enable adding/removing routes - Enable adding/removing fib rules - Enable setting the forwarding state - Enable adding/removing ipv6 address labels - Enable setting bridge parameter Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c027aab4 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Enable some sysctls that are safe for the userns root - Enable the per device ipv4 sysctls: net/ipv4/conf/<if>/forwarding net/ipv4/conf/<if>/mc_forwarding net/ipv4/conf/<if>/accept_redirects net/ipv4/conf/<if>/secure_redirects net/ipv4/conf/<if>/shared_media net/ipv4/conf/<if>/rp_filter net/ipv4/conf/<if>/send_redirects net/ipv4/conf/<if>/accept_source_route net/ipv4/conf/<if>/accept_local net/ipv4/conf/<if>/src_valid_mark net/ipv4/conf/<if>/proxy_arp net/ipv4/conf/<if>/medium_id net/ipv4/conf/<if>/bootp_relay net/ipv4/conf/<if>/log_martians net/ipv4/conf/<if>/tag net/ipv4/conf/<if>/arp_filter net/ipv4/conf/<if>/arp_announce net/ipv4/conf/<if>/arp_ignore net/ipv4/conf/<if>/arp_accept net/ipv4/conf/<if>/arp_notify net/ipv4/conf/<if>/proxy_arp_pvlan net/ipv4/conf/<if>/disable_xfrm net/ipv4/conf/<if>/disable_policy net/ipv4/conf/<if>/force_igmp_version net/ipv4/conf/<if>/promote_secondaries net/ipv4/conf/<if>/route_localnet - Enable the global ipv4 sysctl: net/ipv4/ip_forward - Enable the per device ipv6 sysctls: net/ipv6/conf/<if>/forwarding net/ipv6/conf/<if>/hop_limit net/ipv6/conf/<if>/mtu net/ipv6/conf/<if>/accept_ra net/ipv6/conf/<if>/accept_redirects net/ipv6/conf/<if>/autoconf net/ipv6/conf/<if>/dad_transmits net/ipv6/conf/<if>/router_solicitations net/ipv6/conf/<if>/router_solicitation_interval net/ipv6/conf/<if>/router_solicitation_delay net/ipv6/conf/<if>/force_mld_version net/ipv6/conf/<if>/use_tempaddr net/ipv6/conf/<if>/temp_valid_lft net/ipv6/conf/<if>/temp_prefered_lft net/ipv6/conf/<if>/regen_max_retry net/ipv6/conf/<if>/max_desync_factor net/ipv6/conf/<if>/max_addresses net/ipv6/conf/<if>/accept_ra_defrtr net/ipv6/conf/<if>/accept_ra_pinfo net/ipv6/conf/<if>/accept_ra_rtr_pref net/ipv6/conf/<if>/router_probe_interval net/ipv6/conf/<if>/accept_ra_rt_info_max_plen net/ipv6/conf/<if>/proxy_ndp net/ipv6/conf/<if>/accept_source_route net/ipv6/conf/<if>/optimistic_dad net/ipv6/conf/<if>/mc_forwarding net/ipv6/conf/<if>/disable_ipv6 net/ipv6/conf/<if>/accept_dad net/ipv6/conf/<if>/force_tllao - Enable the global ipv6 sysctls: net/ipv6/bindv6only net/ipv6/icmp/ratelimit Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
af31f412 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Allow userns root to control ipv6 Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow the SIOCSIFADDR ioctl to add ipv6 addresses. Allow the SIOCDIFADDR ioctl to delete ipv6 addresses. Allow the SIOCADDRT ioctl to add ipv6 routes. Allow the SIOCDELRT ioctl to delete ipv6 routes. Allow creation of ipv6 raw sockets. Allow setting the IPV6_JOIN_ANYCAST socket option. Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR socket option. Allow setting the IPV6_TRANSPARENT socket option. Allow setting the IPV6_HOPOPTS socket option. Allow setting the IPV6_RTHDRDSTOPTS socket option. Allow setting the IPV6_DSTOPTS socket option. Allow setting the IPV6_IPSEC_POLICY socket option. Allow setting the IPV6_XFRM_POLICY socket option. Allow sending packets with the IPV6_2292HOPOPTS control message. Allow sending packets with the IPV6_2292DSTOPTS control message. Allow sending packets with the IPV6_RTHDRDSTOPTS control message. Allow setting the multicast routing socket options on non multicast routing sockets. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for setting up, changing and deleting tunnels over ipv6. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for setting up, changing and deleting ipv6 over ipv4 tunnels. Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding, deleting, and changing the potential router list for ISATAP tunnels. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfc47ef8 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Push capable(CAP_NET_ADMIN) into the rtnl methods - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check to ns_capable(net->user-ns, CAP_NET_ADMIN). Allowing unprivileged users to make netlink calls to modify their local network namespace. - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so that calls that are not safe for unprivileged users are still protected. Later patches will remove the extra capable calls from methods that are safe for unprivilged users. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
464dc801 |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Don't export sysctls to unprivileged users In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5cb04436 |
|
06-Nov-2012 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: add knob to send unsolicited ND on link-layer address change This patch introduces a new knob ndisc_notify. If enabled, the kernel will transmit an unsolicited neighbour advertisement on link-layer address change to update the neighbour tables of the corresponding hosts more quickly. This is the equivalent to arp_notify in ipv4 world. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94e187c0 |
|
28-Oct-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: introduce ip6_rt_put() As suggested by Eric, we could introduce a helper function for ipv6 too, to avoid checking if rt is NULL before dst_release(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a940835 |
|
28-Oct-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: remove a useless NULL check In dev_forward_change(), it is useless to check if idev->dev is NULL, it is always non-NULL here. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07a93626 |
|
29-Oct-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: use IS_ENABLED() #if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76f8f6cb |
|
25-Oct-2012 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
rtnl/ipv6: add support of RTM_GETNETCONF This message allows to get the devconf for an interface. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3a1bfb1 |
|
25-Oct-2012 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
rtnl/ipv6: use netconf msg to advertise forwarding status Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f0d3c27 |
|
16-Oct-2012 |
Eric Dumazet <edumazet@google.com> |
ipv6: addrconf: fix /proc/net/if_inet6 Commit 1d5783030a1 (ipv6/addrconf: speedup /proc/net/if_inet6 filling) added bugs hiding some devices from if_inet6 and breaking applications. "ip -6 addr" could still display all IPv6 addresses, while "ifconfig -a" couldnt. One way to reproduce the bug is by starting in a shell : unshare -n /bin/bash ifconfig lo up And in original net namespace, lo device disappeared from if_inet6 Reported-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com> Tested-by: Jan Hinnerk Stosch <janhinnerk.stosch@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mihai Maruseac <mihai.maruseac@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62b54dd9 |
|
01-Oct-2012 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: don't add link local route when there is no link local address When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route to fe80::/64 is added in the main table: unreachable fe80::/64 dev lo proto kernel metric 256 error -101 This route does not match any prefix (no fe80:: address on lo). In fact, addrconf_dev_config() will not add link local address because this function filters interfaces by type. If the link local address is added manually, the route to the link local prefix will be automatically added by addrconf_add_linklocal(). Note also, that this route is not deleted when the address is removed. After looking at the code, it seems that addrconf_add_lroute() is redundant with addrconf_add_linklocal(), because this function will add the link local route when the link local address is configured. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64c6d08e |
|
25-Sep-2012 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: del unreachable route when an addr is deleted on lo When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), two routes are added: - one in the local table: local 2002::1 via :: dev lo proto none metric 0 - one the in main table (for the prefix): unreachable 2002::1 dev lo proto kernel metric 256 error -101 When the address is deleted, the route inserted in the main table remains because we use rt6_lookup(), which returns NULL when dst->error is set, which is the case here! Thus, it is better to use ip6_route_lookup() to avoid this kind of filter. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5744dd9b |
|
11-Sep-2012 |
Li RongQing <roy.qing.li@gmail.com> |
ipv6: replace write lock with read lock when get route info geting route info does not write rt->rt6i_table, so replace write lock with read lock Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91b4b04f |
|
10-Sep-2012 |
YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> |
ipv6: Compare addresses only bits up to the prefix length (RFC6724). Compare bits up to the source address's prefix length only to allows DNS load balancing to continue to be used as a tie breaker. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
15e47304 |
|
07-Sep-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
netlink: Rename pid to portid to avoid confusion It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb7e0575 |
|
29-Aug-2012 |
Sorin Dumitru <sdumitru@ixiacom.com> |
ipv6: remove some deadcode __ipv6_regen_rndid no longer returns anything other than 0 so there's no point in verifying what it returns Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3f644fc |
|
26-Aug-2012 |
Patrick McHardy <kaber@trash.net> |
netfilter: ip6tables: add MASQUERADE target Signed-off-by: Patrick McHardy <kaber@trash.net>
|
#
748e2d93 |
|
22-Aug-2012 |
Eric Dumazet <edumazet@google.com> |
net: reinstate rtnl in call_netdevice_notifiers() Eric Biederman pointed out that not holding RTNL while calling call_netdevice_notifiers() was racy. This patch is a direct transcription his feedback against commit 0115e8e30d6fc (net: remove delay at device dismantle) Thanks Eric ! Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0115e8e3 |
|
22-Aug-2012 |
Eric Dumazet <edumazet@google.com> |
net: remove delay at device dismantle I noticed extra one second delay in device dismantle, tracked down to a call to dst_dev_event() while some call_rcu() are still in RCU queues. These call_rcu() were posted by rt_free(struct rtable *rt) calls. We then wait a little (but one second) in netdev_wait_allrefs() before kicking again NETDEV_UNREGISTER. As the call_rcu() are now completed, dst_dev_event() can do the needed device swap on busy dst. To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called after a rcu_barrier(), but outside of RTNL lock. Use NETDEV_UNREGISTER_FINAL with care ! Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after IP cache removal. With help from Gao feng Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4acd4945 |
|
14-Aug-2012 |
Ben Hutchings <bhutchings@solarflare.com> |
ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock Cong Wang reports that lockdep detected suspicious RCU usage while enabling IPV6 forwarding: [ 1123.310275] =============================== [ 1123.442202] [ INFO: suspicious RCU usage. ] [ 1123.558207] 3.6.0-rc1+ #109 Not tainted [ 1123.665204] ------------------------------- [ 1123.768254] include/linux/rcupdate.h:430 Illegal context switch in RCU read-side critical section! [ 1123.992320] [ 1123.992320] other info that might help us debug this: [ 1123.992320] [ 1124.307382] [ 1124.307382] rcu_scheduler_active = 1, debug_locks = 0 [ 1124.522220] 2 locks held by sysctl/5710: [ 1124.648364] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81768498>] rtnl_trylock+0x15/0x17 [ 1124.882211] #1: (rcu_read_lock){.+.+.+}, at: [<ffffffff81871df8>] rcu_lock_acquire+0x0/0x29 [ 1125.085209] [ 1125.085209] stack backtrace: [ 1125.332213] Pid: 5710, comm: sysctl Not tainted 3.6.0-rc1+ #109 [ 1125.441291] Call Trace: [ 1125.545281] [<ffffffff8109d915>] lockdep_rcu_suspicious+0x109/0x112 [ 1125.667212] [<ffffffff8107c240>] rcu_preempt_sleep_check+0x45/0x47 [ 1125.781838] [<ffffffff8107c260>] __might_sleep+0x1e/0x19b [...] [ 1127.445223] [<ffffffff81757ac5>] call_netdevice_notifiers+0x4a/0x4f [...] [ 1127.772188] [<ffffffff8175e125>] dev_disable_lro+0x32/0x6b [ 1127.885174] [<ffffffff81872d26>] dev_forward_change+0x30/0xcb [ 1128.013214] [<ffffffff818738c4>] addrconf_forward_change+0x85/0xc5 [...] addrconf_forward_change() uses RCU iteration over the netdev list, which is unnecessary since it already holds the RTNL lock. We also cannot reasonably require netdevice notifier functions not to sleep. Reported-by: Cong Wang <amwang@redhat.com> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ddbe5032 |
|
18-Jul-2012 |
Eric Dumazet <edumazet@google.com> |
ipv6: add ipv6_addr_hash() helper Introduce ipv6_addr_hash() helper doing a XOR on all bits of an IPv6 address, with an optimized x86_64 version. Use it in flow dissector, as suggested by Andrew McGregor, to reduce hash collision probabilities in fq_codel (and other users of flow dissector) Use it in ip6_tunnel.c and use more bit shuffling, as suggested by David Laight, as existing hash was ignoring most of them. Use it in sunrpc and use more bit shuffling, using hash_32(). Use it in net/ipv6/addrconf.c, using hash_32() as well. As a cleanup, use it in net/ipv4/tcp_metrics.c Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew McGregor <andrewmcgr@gmail.com> Cc: Dave Taht <dave.taht@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91df42be |
|
15-May-2012 |
Joe Perches <joe@perches.com> |
net: ipv4 and ipv6: Convert printk(KERN_DEBUG to pr_debug Use the current debugging style and enable dynamic_debug. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3213831 |
|
15-May-2012 |
Joe Perches <joe@perches.com> |
net: ipv6: Standardize prefixes for message logging Add #define pr_fmt(fmt) as appropriate. Add "IPv6: " to appropriate files. Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG). Standardize on "%s: " not "%s(): " when emitting __func__. Use "%s: ", __func__ instead of embedding function name. Coalesce formats, align arguments. ADDRCONF output is now prefixed with "IPv6: " Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
211ed865 |
|
10-May-2012 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
net: delete all instances of special processing for token ring We are going to delete the Token ring support. This removes any special processing in the core networking for token ring, (aside from net/tr.c itself), leaving the drivers and remaining tokenring support present but inert. The mass removal of the drivers and net/tr.c will be in a separate commit, so that the history of these files that we still care about won't have the giant deletion tied into their history. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
e87cc472 |
|
13-May-2012 |
Joe Perches <joe@perches.com> |
net: Convert net_ratelimit uses to net_<level>_ratelimited Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06a4c1c5 |
|
09-May-2012 |
alex.bluesman.smirnov@gmail.com <alex.bluesman.smirnov@gmail.com> |
6lowpan: IPv6 link local address According to the RFC4944 (Transmission of IPv6 Packets over IEEE 802.15.4 Networks), chapter 7: The IPv6 link-local address [RFC4291] for an IEEE 802.15.4 interface is formed by appending the Interface Identifier, as defined above, to the prefix FE80::/64. 10 bits 54 bits 64 bits +----------+-----------------------+----------------------------+ |1111111010| (zeros) | Interface Identifier | +----------+-----------------------+----------------------------+ This patch adds IPv6 address generation support for the 6lowpan interfaces. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6105e293 |
|
19-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net ipv6: Convert addrconf to use register_net_sysctl Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register instead we can use a small temporary buffer on the stack. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf22f9a2 |
|
14-Apr-2012 |
David S. Miller <davem@davemloft.net> |
ipv6: Remove unused argument to addrconf_dad_start(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1716a961 |
|
05-Apr-2012 |
Gao feng <gaofeng@cn.fujitsu.com> |
ipv6: fix problem with expired dst cache If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet. this dst cache will not check expire because it has no RTF_EXPIRES flag. So this dst cache will always be used until the dst gc run. Change the struct dst_entry,add a union contains new pointer from and expires. When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use. we can use this field to point to where the dst cache copy from. The dst.from is only used in IPV6. rt6_check_expired check if rt6_info.dst.from is expired. ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. ip6_dst_destroy release the ort. Add some functions to operate the RTF_EXPIRES flag and expires(from) together. and change the code to use these new adding functions. Changes from v5: modify ip6_route_add and ndisc_router_discovery to use new adding functions. Only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e5e8f30 |
|
01-Apr-2012 |
Eldad Zack <eldad@fogrefinery.com> |
net/ipv6/addrconf.c: Checkpatch cleanups net/ipv6/addrconf.c:340: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable net/ipv6/addrconf.c:342: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:444: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:1337: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable net/ipv6/addrconf.c:1526: ERROR: "(foo*)" should be "(foo *)" net/ipv6/addrconf.c:1671: ERROR: open brace '{' following function declarations go on the next line net/ipv6/addrconf.c:1914: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:2368: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:2370: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:2416: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:2437: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:2573: ERROR: "foo * bar" should be "foo *bar" net/ipv6/addrconf.c:3797: ERROR: "foo* bar" should be "foo *bar" Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c78679e8 |
|
01-Apr-2012 |
David S. Miller <davem@davemloft.net> |
ipv6: Stop using NLA_PUT*(). These macros contain a hidden goto, and are thus extremely error prone and make code hard to audit. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b2aaede |
|
07-Mar-2012 |
Li Wei <lw@cn.fujitsu.com> |
ipv6: Fix Smatch warning. With commit d6ddef9e641d(IPv6: Fix not join all-router mcast group when forwarding set.) I check 'dev' after it's dereference that leads to a Smatch complaint: net/ipv6/addrconf.c:438 ipv6_add_dev() warn: variable dereferenced before check 'dev' (see line 432) net/ipv6/addrconf.c 431 /* protected by rtnl_lock */ 432 rcu_assign_pointer(dev->ip6_ptr, ndev); ^^^^^^^^^^^^ Old dereference. 433 434 /* Join all-node multicast group */ 435 ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes); 436 437 /* Join all-router multicast group if forwarding is set */ 438 if (ndev->cnf.forwarding && dev && (dev->flags & IFF_MULTICAST)) ^^^ Remove the check to avoid the complaint as 'dev' can't be NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d6ddef9e |
|
05-Mar-2012 |
Li Wei <lw@cn.fujitsu.com> |
IPv6: Fix not join all-router mcast group when forwarding set. When forwarding was set and a new net device is register, we need add this device to the all-router mcast group. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
013d97e9 |
|
16-Jan-2012 |
Francesco Ruggeri <fruggeri@aristanetworks.com> |
net: race condition in ipv6 forwarding and disable_ipv6 parameters There is a race condition in addrconf_sysctl_forward() and addrconf_sysctl_disable(). These functions change idev->cnf.forwarding (resp. idev->cnf.disable_ipv6) and then try to grab the rtnl lock before performing any actions. If that fails they restore the original value and restart the syscall. This creates race conditions if ipv6 code tries to access these parameters, or if multiple instances try to do the same operation. As an example of the former, if __ipv6_ifa_notify() finds a 0 in idev->cnf.forwarding when invoked by addrconf_ifdown() it may not free anycast addresses, ultimately resulting in the net_device not being freed. This patch reads the user parameters into a temporary location and only writes the actual parameters when the rtnl lock is acquired. Tested in 2.6.38.8. Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf778b00 |
|
11-Jan-2012 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: reintroduce missing rcu_assign_pointer() calls commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER) did a lot of incorrect changes, since it did a complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y). We miss needed barriers, even on x86, when y is not NULL. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d578303 |
|
03-Jan-2012 |
Mihai Maruseac <mihai.maruseac@gmail.com> |
ipv6/addrconf: speedup /proc/net/if_inet6 filling This ensures a linear behaviour when filling /proc/net/if_inet6 thus making ifconfig run really fast on IPv6 only addresses. In fact, with this patch and the IPv4 one sent a while ago, ifconfig will run in linear time regardless of address type. IPv4 related patch: f04565ddf52e401880f8ba51de0dff8ba51c99fd dev: use name hash for dev_seq_ops ... Some statistics (running ifconfig > /dev/null on a different setup): iface count / IPv6 no-patch time / IPv6 patched time / IPv4 time ---------------------------------------------------------------- 6250 | 0.23 s | 0.13 s | 0.11 s 12500 | 0.62 s | 0.28 s | 0.22 s 25000 | 2.91 s | 0.57 s | 0.46 s 50000 | 11.37 s | 1.21 s | 0.94 s 128000 | 86.78 s | 3.05 s | 2.54 s Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com> Cc: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6bff995 |
|
04-Jan-2012 |
Neil Horman <nhorman@tuxdriver.com> |
ipv6: Check RA for sllao when configuring optimistic ipv6 address (v2) Recently Dave noticed that a test we did in ipv6_add_addr to see if we next hop route for the interface we're adding an addres to was wrong (see commit 7ffbcecbeed91e5874e9a1cfc4c0cbb07dac3069). for one, it never triggers, and two, it was completely wrong to begin with. This test was meant to cover this section of RFC 4429: 3.3 Modifications to RFC 2462 Stateless Address Autoconfiguration * (modifies section 5.5) A host MAY choose to configure a new address as an Optimistic Address. A host that does not know the SLLAO of its router SHOULD NOT configure a new address as Optimistic. A router SHOULD NOT configure an Optimistic Address. This patch should bring us into proper compliance with the above clause. Since we only add a SLAAC address after we've received a RA which may or may not contain a source link layer address option, we can pass a pointer to that option to addrconf_prefix_rcv (which may be null if the option is not present), and only set the optimistic flag if the option was found in the RA. Change notes: (v2) modified the new parameter to addrconf_prefix_rcv to be a bool rather than a pointer to make its use more clear as per request from davem. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1918542 |
|
28-Dec-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Kill rt6i_dev and rt6i_expires defines. It just obscures that the netdevice pointer and the expires value are implemented in the dst_entry sub-object of the ipv6 route. And it makes grepping for dst_entry member uses much harder too. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7ffbcecb |
|
27-Dec-2011 |
David Miller <davem@davemloft.net> |
ipv6: Remove optimistic DAD flag test in ipv6_add_addr() The route we have here is for the address being added to the interface, ie. for input packet processing. Therefore using that route to determine whether an output nexthop gateway is known and resolved doesn't make any sense. So, simply remove this test, it never triggered anyways. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-By: Neil Horman <nhorman@tuxdriver.com>
|
#
4af04aba |
|
06-Dec-2011 |
Li Wei <lw@cn.fujitsu.com> |
ipv6: Fix for adding multicast route for loopback device automatically. There is no obvious reason to add a default multicast route for loopback devices, otherwise there would be a route entry whose dst.error set to -ENETUNREACH that would blocking all multicast packets. ==================== [ more detailed explanation ] The problem is that the resulting routing table depends on the sequence of interface's initialization and in some situation, that would block all muticast packets. Suppose there are two interfaces on my computer (lo and eth0), if we initailize 'lo' before 'eth0', the resuting routing table(for multicast) would be # ip -6 route show | grep ff00:: unreachable ff00::/8 dev lo metric 256 error -101 ff00::/8 dev eth0 metric 256 When sending multicasting packets, routing subsystem will return the first route entry which with a error set to -101(ENETUNREACH). I know the kernel will set the default ipv6 address for 'lo' when it is up and won't set the default multicast route for it, but there is no reason to stop 'init' program from setting address for 'lo', and that is exactly what systemd did. I am sure there is something wrong with kernel or systemd, currently I preferred kernel caused this problem. ==================== Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f031519 |
|
06-Dec-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Make third arg to anycast_dst_alloc() bool. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27217455 |
|
02-Dec-2011 |
David Miller <davem@davemloft.net> |
net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
|
#
4e3fd7a0 |
|
20-Nov-2011 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc3b2d7f |
|
15-Jul-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
14ef37b6 |
|
25-Oct-2011 |
Andreas Hofmeister <andi@collax.com> |
ipv6: fix route lookup in addrconf_prefix_rcv() The route lookup to find a previously auto-configured route for a prefixes used to use rt6_lookup(), with the prefix from the RA used as an address. However, that kind of lookup ignores routing tables, the prefix length and route flags, so when there were other matching routes, even in different tables and/or with a different prefix length, the wrong route would be manipulated. Now, a new function "addrconf_get_prefix_route()" is used for the route lookup, which searches in RT6_TABLE_PREFIX and takes the prefix-length and route flags into account. Signed-off-by: Andreas Hofmeister <andi@collax.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8603e33d |
|
20-Sep-2011 |
Roy Li <rongqing.li@windriver.com> |
ipv6: fix a possible double free When calling snmp6_alloc_dev fails, the snmp6 relevant memory are freed by snmp6_alloc_dev. Calling in6_dev_finish_destroy will free these memory twice. Double free will lead that undefined behavior occurs. Signed-off-by: Roy Li <rongqing.li@windriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
026359bc |
|
28-Aug-2011 |
Tore Anderson <tore@fud.no> |
ipv6: Send ICMPv6 RSes only when RAs are accepted This patch improves the logic determining when to send ICMPv6 Router Solicitations, so that they are 1) always sent when the kernel is accepting Router Advertisements, and 2) never sent when the kernel is not accepting RAs. In other words, the operational setting of the "accept_ra" sysctl is used. The change also makes the special "Hybrid Router" forwarding mode ("forwarding" sysctl set to 2) operate exactly the same as the standard Router mode (forwarding=1). The only difference between the two was that RSes was being sent in the Hybrid Router mode only. The sysctl documentation describing the special Hybrid Router mode has therefore been removed. Rationale for the change: Currently, the value of forwarding sysctl is the only thing determining whether or not to send RSes. If it has the value 0 or 2, they are sent, otherwise they are not. This leads to inconsistent behaviour in the following cases: * accept_ra=0, forwarding=0 * accept_ra=0, forwarding=2 * accept_ra=1, forwarding=2 * accept_ra=2, forwarding=1 In the first three cases, the kernel will send RSes, even though it will not accept any RAs received in reply. In the last case, it will not send any RSes, even though it will accept and process any RAs received. (Most routers will send unsolicited RAs periodically, so suppressing RSes in the last case will merely delay auto-configuration, not prevent it.) Also, it is my opinion that having the forwarding sysctl control RS sending behaviour (completely independent of whether RAs are being accepted or not) is simply not what most users would intuitively expect to be the case. Signed-off-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2c31e32 |
|
29-Jul-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: fix NULL dereferences in check_peer_redir() Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a9b3cd7f |
|
01-Aug-2011 |
Stephen Hemminger <shemminger@vyatta.com> |
rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76f793e3 |
|
26-Jul-2011 |
Lorenzo Colitti <lorenzo@google.com> |
ipv6: updates to privacy addresses per RFC 4941. Update the code to handle some of the differences between RFC 3041 and RFC 4941, which obsoletes it. Also a couple of janitorial fixes. - Allow router advertisements to increase the lifetime of temporary addresses. This was not allowed by RFC 3041, but is specified by RFC 4941. It is useful when RA lifetimes are lower than TEMP_{VALID,PREFERRED}_LIFETIME: in this case, the previous code would delete or deprecate addresses prematurely. - Change the default of MAX_RETRY to 3 per RFC 4941. - Add a comment to clarify that the preferred and valid lifetimes in inet6_ifaddr are relative to the timestamp. - Shorten lines to 80 characters in a couple of places. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32019e65 |
|
24-Jul-2011 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6: Do not leave router anycast address for /127 prefixes. Original commit 2bda8a0c8af... "Disable router anycast address for /127 prefixes" says: | No need for matching code in addrconf_leave_anycast() as it | will silently ignore any attempt to leave an unknown anycast | address. After analysis, because 1) we may add two or more prefixes on the same interface, or 2)user may have manually joined that anycast, we may hit chances to have anycast address which as if we had generated one by /127 prefix and we should not leave from subnet- router anycast address unconditionally. CC: Bjørn Mork <bjorn@mork.no> CC: Brian Haley <brian.haley@hp.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69cce1d1 |
|
18-Jul-2011 |
David S. Miller <davem@davemloft.net> |
net: Abstract dst->neighbour accesses behind helpers. dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9cbb7ecb |
|
17-Jul-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Get rid of rt6i_nexthop macro. It just makes it harder to see 1) what the code is doing and 2) grep for all users of dst{->,.}neighbour Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2bda8a0c |
|
05-Jul-2011 |
Bjørn Mork <bjorn@mork.no> |
Disable router anycast address for /127 prefixes RFC 6164 requires that routers MUST disable Subnet-Router anycast for the prefix when /127 prefixes are used. No need for matching code in addrconf_leave_anycast() as it will silently ignore any attempt to leave an unknown anycast address. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7ac8679 |
|
09-Jun-2011 |
Greg Rose <gregory.v.rose@intel.com> |
rtnetlink: Compute and store minimum ifinfo dump size The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
aee80b54 |
|
08-Jun-2011 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: generate link local address for GRE tunnel Use same logic as SIT tunnel to handle link local address for GRE tunnel. OSPFv3 requires link-local address to function. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be281e55 |
|
18-May-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: reduce per device ICMP mib sizes ipv6 has per device ICMP SNMP counters, taking too much space because they use percpu storage. needed size per device is : (512+4)*sizeof(long)*number_of_possible_cpus*2 On a 32bit kernel, 16 possible cpus, this wastes more than 64kbytes of memory per ipv6 enabled network device, taken in vmalloc pool. Since ICMP messages are rare, just use shared counters (atomic_long_t) Per network space ICMP counters are still using percpu memory, we might also convert them to shared counters in a future patch. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5785985 |
|
15-Mar-2011 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
net,rcu: convert call_rcu(inet6_ifa_finish_destroy_rcu) to kfree_rcu() The rcu callback inet6_ifa_finish_destroy_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(inet6_ifa_finish_destroy_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
38f57d1a |
|
15-Mar-2011 |
Lai Jiangshan <laijs@cn.fujitsu.com> |
net,rcu: convert call_rcu(in6_dev_finish_destroy_rcu) to kfree_rcu() The rcu callback in6_dev_finish_destroy_rcu() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(in6_dev_finish_destroy_rcu). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
ff538818 |
|
30-Apr-2011 |
Lucian Adrian Grijincu <lucian.grijincu@gmail.com> |
sysctl: net: call unregister_net_sysctl_table where needed ctl_table_headers registered with register_net_sysctl_table should have been unregistered with the equivalent unregister_net_sysctl_table Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b71d1d42 |
|
21-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3968a85 |
|
13-Apr-2011 |
Daniel Walter <sahne@0x90.at> |
ipv6: RTA_PREFSRC support for ipv6 route source address selection [ipv6] Add support for RTA_PREFSRC This patch allows a user to select the preferred source address for a specific IPv6-Route. It can be set via a netlink message setting RTA_PREFSRC to a valid IPv6 address which must be up on the device the route will be bound to. Signed-off-by: Daniel Walter <dwalter@barracuda.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
96d796a3 |
|
24-Feb-2011 |
Hagen Paul Pfeifer <hagen@jauu.net> |
ipv6: hash is calculated but not used afterwards hash is declared and assigned but not used anymore. ipv6_addr_hash() exhibit no side-effects. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
73a8bd74 |
|
24-Jan-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Revert 'administrative down' address handling changes. This reverts the following set of commits: d1ed113f1669390da9898da3beddcc058d938587 ("ipv6: remove duplicate neigh_ifdown") 29ba5fed1bbd09c2cba890798c8f9eaab251401d ("ipv6: don't flush routes when setting loopback down") 9d82ca98f71fd686ef2f3017c5e3e6a4871b6e46 ("ipv6: fix missing in6_ifa_put in addrconf") 2de795707294972f6c34bae9de713e502c431296 ("ipv6: addrconf: don't remove address state on ifdown if the address is being kept") 8595805aafc8b077e01804c9a3668e9aa3510e89 ("IPv6: only notify protocols if address is compeletely gone") 27bdb2abcc5edb3526e25407b74bf17d1872c329 ("IPv6: keep tentative addresses in hash table") 93fa159abe50d3c55c7f83622d3f5c09b6e06f4b ("IPv6: keep route for tentative address") 8f37ada5b5f6bfb4d251a7f510f249cb855b77b3 ("IPv6: fix race between cleanup and add/delete address") 84e8b803f1e16f3a2b8b80f80a63fa2f2f8a9be6 ("IPv6: addrconf notify when address is unavailable") dc2b99f71ef477a31020511876ab4403fb7c4420 ("IPv6: keep permanent addresses on admin down") because the core semantic change to ipv6 address handling on ifdown has broken some things, in particular "disable_ipv6" sysctl handling. Stephen has made several attempts to get things back in working order, but nothing has restored disable_ipv6 fully yet. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2fdc1c80 |
|
17-Jan-2011 |
Romain Francoise <romain@orebokech.com> |
ipv6: Silence privacy extensions initialization When a network namespace is created (via CLONE_NEWNET), the loopback interface is automatically added to the new namespace, triggering a printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set. This is problematic for applications which use CLONE_NEWNET as part of a sandbox, like Chromium's suid sandbox or recent versions of vsftpd. On a busy machine, it can lead to thousands of useless "lo: Disabled Privacy Extensions" messages appearing in dmesg. It's easy enough to check the status of privacy extensions via the use_tempaddr sysctl, so just removing the printk seems like the most sensible solution. Signed-off-by: Romain Francoise <romain@orebokech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1ed113f |
|
16-Dec-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: remove duplicate neigh_ifdown When device is being set to down, neigh_ifdown was being called twice. Once from addrconf notifier and once from ndisc notifier. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29ba5fed |
|
16-Dec-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: don't flush routes when setting loopback down When loopback device is being brought down, then keep the route table entries because they are special. The entries in the local table for linklocal routes and ::1 address should not be purged. This is a sub optimal solution to the problem and should be replaced by a better fix in future. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f75a104 |
|
07-Dec-2010 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
ipv6: fix nl group when advertising a new link New idev are advertised with NL group RTNLGRP_IPV6_IFADDR, but should use RTNLGRP_IPV6_IFINFO. Bug was introduced by commit 8d7a76c9. Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf7afbfe |
|
21-Nov-2010 |
Thomas Graf <tgraf@infradead.org> |
rtnl: make link af-specific updates atomic As David pointed out correctly, updates to af-specific attributes are currently not atomic. If multiple changes are requested and one of them fails, previous updates may have been applied already leaving the link behind in a undefined state. This patch splits the function parse_link_af() into two functions validate_link_af() and set_link_at(). validate_link_af() is placed to validate_linkmsg() check for errors as early as possible before any changes to the link have been made. set_link_af() is called to commit the changes later. This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. Also, instead of silently ignoring unknown address families and config blocks for address families which did not register a set function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are returned to avoid comitting 4 out of 5 update requests without notifying the user. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88b2a9a3 |
|
15-Nov-2010 |
John Fastabend <john.r.fastabend@intel.com> |
ipv6: fix missing in6_ifa_put in addrconf Fix ref count bug introduced by commit 2de795707294972f6c34bae9de713e502c431296 Author: Lorenzo Colitti <lorenzo@google.com> Date: Wed Oct 27 18:16:49 2010 +0000 ipv6: addrconf: don't remove address state on ifdown if the address is being kept Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr refcnt correctly with in6_ifa_put(). Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
18a31e1e |
|
16-Nov-2010 |
Thomas Graf <tgraf@infradead.org> |
ipv6: Expose reachable and retrans timer values as msecs Expose reachable and retrans timer values in msecs instead of jiffies. Both timer values are already exposed as msecs in the neighbour table netlink interface. The creation timestamp format with increased precision is kept but cleaned up. Signed-off-by: Thomas Graf <tgraf@infradead.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93908d19 |
|
16-Nov-2010 |
Thomas Graf <tgraf@infradead.org> |
ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies IFLA_PROTINFO exposes timer related per device settings in jiffies. Change it to expose these values in msecs like the sysctl interface does. I did not find any users of IFLA_PROTINFO which rely on any of these values and even if there are, they are likely already broken because there is no way for them to reliably convert such a value to another time format. Signed-off-by: Thomas Graf <tgraf@infradead.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b382b191 |
|
15-Nov-2010 |
Thomas Graf <tgraf@infradead.org> |
ipv6: AF_INET6 link address family IPv6 already exposes some address family data via netlink in the IFLA_PROTINFO attribute if RTM_GETLINK request is sent with the address family set to AF_INET6. We take over this format and reuse all the code. Signed-off-by: Thomas Graf <tgraf@infradead.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d82ca98 |
|
15-Nov-2010 |
John Fastabend <john.r.fastabend@intel.com> |
ipv6: fix missing in6_ifa_put in addrconf Fix ref count bug introduced by commit 2de795707294972f6c34bae9de713e502c431296 Author: Lorenzo Colitti <lorenzo@google.com> Date: Wed Oct 27 18:16:49 2010 +0000 ipv6: addrconf: don't remove address state on ifdown if the address is being kept Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr refcnt correctly with in6_ifa_put(). Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2de79570 |
|
27-Oct-2010 |
Lorenzo Colitti <lorenzo@google.com> |
ipv6: addrconf: don't remove address state on ifdown if the address is being kept Currently, addrconf_ifdown does not delete statically configured IPv6 addresses when the interface is brought down. The intent is that when the interface comes back up the address will be usable again. However, this doesn't actually work, because the system stops listening on the corresponding solicited-node multicast address, so the address cannot respond to neighbor solicitations and thus receive traffic. Also, the code notifies the rest of the system that the address is being deleted (e.g, RTM_DELADDR), even though it is not. Fix it so that none of this state is updated if the address is being kept on the interface. Tested: Added a statically configured IPv6 address to an interface, started ping, brought link down, brought link up again. When link came up ping kept on going and "ip -6 maddr" showed that the host was still subscribed to there Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
853dc2e0 |
|
24-Oct-2010 |
Ursula Braun <ursula.braun@de.ibm.com> |
ipv6: fix refcnt problem related to POSTDAD state After running this bonding setup script modprobe bonding miimon=100 mode=0 max_bonds=1 ifconfig bond0 10.1.1.1/16 ifenslave bond0 eth1 ifenslave bond0 eth3 on s390 with qeth-driven slaves, modprobe -r fails with this message unregister_netdevice: waiting for bond0 to become free. Usage count = 1 due to twice detection of duplicate address. Problem is caused by a missing decrease of ifp->refcnt in addrconf_dad_failure. An extra call of in6_ifa_put(ifp) solves it. Problem has been introduced with commit f2344a131bccdbfc5338e17fa71a807dee7944fa. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a876b0e |
|
27-Sep-2010 |
Glenn Wurster <gwurster@scs.carleton.ca> |
IPv6: Temp addresses are immediately deleted. There is a bug in the interaction between ipv6_create_tempaddr and addrconf_verify. Because ipv6_create_tempaddr uses the cstamp and tstamp from the public address in creating a private address, if we have not received a router advertisement in a while, tstamp + temp_valid_lft might be < now. If this happens, the new address is created inside ipv6_create_tempaddr, then the loop within addrconf_verify starts again and the address is immediately deleted. We are left with no temporary addresses on the interface, and no more will be created until the public IP address is updated. To avoid this, set the expiry time to be the minimum of the time left on the public address or the config option PLUS the current age of the public interface. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aed65501 |
|
27-Sep-2010 |
Glenn Wurster <gwurster@scs.carleton.ca> |
IPv6: Create temporary address if none exists. If privacy extentions are enabled, but no current temporary address exists, then create one when we get a router advertisement. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c61393ea |
|
04-Oct-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: make __ipv6_isatap_ifid static Another exported symbol only used in one file Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2cc6d2bf |
|
24-Sep-2010 |
Neil Horman <nhorman@tuxdriver.com> |
ipv6: add a missing unregister_pernet_subsys call Clean up a missing exit path in the ipv6 module init routines. In addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys for the ipv6_addr_label_ops structure. But if module loading fails, or if the ipv6 module is removed, there is no corresponding unregister_pernet_subsys call, which leaves a now-bogus address on the pernet_list, leading to oopses in subsequent registrations. This patch cleans up both the failed load path and the unload path. Tested by myself with good results. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/net/addrconf.h | 1 + net/ipv6/addrconf.c | 11 ++++++++--- net/ipv6/addrlabel.c | 5 +++++ 3 files changed, 14 insertions(+), 3 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a02cec21 |
|
22-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3bccac2 |
|
02-Sep-2010 |
Thomas Graf <tgraf@infradead.org> |
ipv6: add special mode forwarding=2 to send RS while configured as router Similar to accepting router advertisement, the IPv6 stack does not send router solicitations if forwarding is enabled. This patch enables this behavior to be overruled by setting forwarding to the special value 2. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64e724f6 |
|
20-Jul-2010 |
Brian Haley <brian.haley@hp.com> |
ipv6: Don't add routes to ipv6 disabled interfaces. If the interface has IPv6 disabled, don't add a multicast or link-local route since we won't be adding a link-local address. Reported-by: Mahesh Kelkar <maheshkelkar@gmail.com> Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ce3c183 |
|
30-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
snmp: 64bit ipstats_mib for all arches /proc/net/snmp and /proc/net/netstat expose SNMP counters. Width of these counters is either 32 or 64 bits, depending on the size of "unsigned long" in kernel. This means user program parsing these files must already be prepared to deal with 64bit values, regardless of user program being 32 or 64 bit. This patch introduces 64bit snmp values for IPSTAT mib, where some counters can wrap pretty fast if they are 32bit wide. # netstat -s|egrep "InOctets|OutOctets" InOctets: 244068329096 OutOctets: 244069348848 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
784e2710 |
|
26-Jun-2010 |
Ben Hutchings <ben@decadent.org.uk> |
ipv6: Use interface max_desync_factor instead of static default max_desync_factor can be configured per-interface, but nothing is using the value. Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f56619fc |
|
26-Jun-2010 |
Ben Hutchings <ben@decadent.org.uk> |
ipv6: Clamp reported valid_lft to a minimum of 0 Since addresses are only revalidated every 2 minutes, the reported valid_lft can underflow shortly before the address is deleted. Clamp it to a minimum of 0, as for prefered_lft. Reported-by: Piotr Lewandowski <piotr.lewandowski@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1823e4c8 |
|
22-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
snmp: add align parameter to snmp_mib_init() In preparation for 64bit snmp counters for some mibs, add an 'align' parameter to snmp_mib_init(), instead of assuming mibs only contain 'unsigned long' fields. Callers can use __alignof__(type) to provide correct alignment. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d8d1f30b |
|
11-Jun-2010 |
Changli Gao <xiaosuo@gmail.com> |
net-next: remove useless union keyword remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
622ccdf1 |
|
18-May-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv6: Never schedule DAD timer on dead address This patch ensures that all places that schedule the DAD timer look at the address state in a safe manner before scheduling the timer. This ensures that we don't end up with pending timers after deleting an address. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2344a13 |
|
18-May-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv6: Use POSTDAD state This patch makes use of the new POSTDAD state. This prevents a race between DAD completion and failure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c5ff6a6 |
|
18-May-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv6: Use state_lock to protect ifa state This patch makes use of the new state_lock to synchronise between updates to the ifa state. This fixes the issue where a remotely triggered address deletion (through DAD failure) coincides with a local administrative address deletion, causing certain actions to be performed twice incorrectly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e9d3e084 |
|
18-May-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv6: Replace inet6_ifaddr->dead with state This patch replaces the boolean dead flag on inet6_ifaddr with a state enum. This allows us to roll back changes when deleting an address according to whether DAD has completed or not. This patch only adds the state field and does not change the logic. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eedf042a |
|
17-May-2010 |
Stephen Hemminger <shemminger@vyatta.com> |
ipv6: fix the bug of address check The duplicate address check code got broken in the conversion to hlist (2.6.35). The earlier patch did not fix the case where two addresses match same hash value. Use two exit paths, rather than depending on state of loop variables (from macro). Based on earlier fix by Shan Wei. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f70ecca |
|
03-May-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: rcu fixes Add hlist_for_each_entry_rcu_bh() and hlist_for_each_entry_continue_rcu_bh() macros, and use them in ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdeps warnings. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0eae88f3 |
|
20-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: Fix various endianness glitches Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8595805a |
|
11-Apr-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: only notify protocols if address is compeletely gone The notifier for address down should only be called if address is completely gone, not just being marked as tentative on link transistion. The code in net-next would case bonding/sctp/s390 to see address disappear on link down, but they would never see it reappear on link up. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1f84c63 |
|
11-Apr-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: additional ref count for hash list unnecessary Since an address in hash list has to already have a ref count, no additional ref count is needed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27bdb2ab |
|
11-Apr-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: keep tentative addresses in hash table When link goes down, want address to be preserved but in a tentative state, therefore it has to stay in hash list. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93fa159a |
|
11-Apr-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: keep route for tentative address Recent changes preserve IPv6 address when link goes down (good). But would cause address to point to dead dst entry (bad). The simplest fix is to just not delete route if address is being held for later use. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
4b97efdf |
|
26-Mar-2010 |
Patrick McHardy <kaber@trash.net> |
net: fix netlink address dumping in IPv4/IPv6 When a dump is interrupted at the last device in a hash chain and then continued, "idx" won't get incremented past s_idx, so s_ip_idx is not reset when moving on to the next device. This means of all following devices only the last n - s_ip_idx addresses are dumped. Tested-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Patrick McHardy <kaber@trash.net>
|
#
b79d1d54 |
|
25-Mar-2010 |
David S. Miller <davem@davemloft.net> |
ipv6: Fix result generation in ipv6_get_ifaddr(). Finishing naturally from hlist_for_each_entry(x, ...) does not result in 'x' being NULL. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b54c9b98 |
|
25-Mar-2010 |
David S. Miller <davem@davemloft.net> |
ipv6: Preserve pervious behavior in ipv6_link_dev_addr(). Use list_add_tail() to get the behavior we had before the list_head conversion for ipv6 address lists. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e81c6da |
|
20-Mar-2010 |
David S. Miller <davem@davemloft.net> |
ipv6: Fix bug in ipv6_chk_same_addr(). hlist_for_each_entry(p...) will not necessarily initialize 'p' to anything if the hlist is empty. GCC notices this and emits a warning. Just return true explicitly when we hit a match, and return false is we fall out of the loop without one. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2db7564 |
|
20-Mar-2010 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6: Reduce timer events for addrconf_verify(). This patch reduces timer events while keeping accuracy by rounding our timer and/or batching several address validations in addrconf_verify(). addrconf_verify() is called at earliest timeout among interface addresses' timeouts, but at maximum ADDR_CHECK_FREQUENCY (120 secs). In most cases, all of timeouts of interface addresses are long enough (e.g. several hours or days vs 2 minutes), this timer is usually called every ADDR_CHECK_FREQUENCY, and it is okay to be lazy. (Note this timer could be eliminated if all code paths which modifies variables related to timeouts call us manually, but it is another story.) However, in other least but important cases, we try keeping accuracy. When the real interface address timeout is coming, and the timeout is just before the rounded timeout, we accept some error. When a timeout has been reached, we also try batching other several events in very near future. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88949cf4 |
|
17-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: addrconf cleanup addrconf_verify The variable regen_advance is only used in the privacy case. Move it to simplify code and eliminate ifdef's Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e21e8467 |
|
20-Mar-2010 |
Stephen Hemminger <shemminger@vyatta.com> |
addrconf: checkpatch fixes Fix some of the checkpatch complaints. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bcdd553f |
|
20-Mar-2010 |
Stephen Hemminger <shemminger@vyatta.com> |
IPv6: addrconf cleanups Some minor stuff, reformat comments and add whitespace for clarity Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
502a2ffd |
|
17-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: convert idev_list to list macros Convert to list macro's for the list of addresses per interface in IPv6. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3a88a81d |
|
17-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: user better hash for addrconf The existing hash function has a couple of issues: * it is hardwired to 16 for IN6_ADDR_HSIZE * limited to 256 and callers using int * use jhash2 rather than some old BSD algorithm No need for random seed since this is local only (based on assigned addresses) table. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c578aed |
|
17-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: convert addrconf hash list to RCU Convert from reader/writer lock to RCU and spinlock for addrconf hash list. Adds an additional helper macro for hlist_for_each_entry_continue_rcu to handle the continue case. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2e21293 |
|
17-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: convert addrconf list to hlist Using hash list macros, simplifies code and helps later RCU. This patch includes some initialization that is not strictly necessary, since an empty hlist node/list is all zero; and list is in BSS and node is allocated with kzalloc. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
372e6c8f |
|
17-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
ipv6: convert temporary address list to list macros Use list macros instead of open coded linked list. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93d9b7d7 |
|
10-Mar-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: rename notifier defines for netdev type change Since generally there could be more netdevices changing type other than bonding, making this event type name "bonding-unrelated" Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e2577a06 |
|
13-Mar-2010 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv6: Send netlink notification when DAD fails If we are managing IPv6 addresses using DHCP, it would be nice for user-space to be notified if an address configured through DHCP fails DAD. Otherwise user-space would have to poll to see whether DAD succeeds. This patch uses the existing notification mechanism and simply hooks it into the DAD failure code path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8f37ada5 |
|
03-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: fix race between cleanup and add/delete address This solves a potential race problem during the cleanup process. The issue is that addrconf_ifdown() needs to traverse address list, but then drop lock to call the notifier. The version in -next could get confused if add/delete happened during this window. Original code (2.6.32 and earlier) was okay because all addresses were always deleted. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
84e8b803 |
|
02-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: addrconf notify when address is unavailable My recent change in net-next to retain permanent addresses caused regression. Device refcount would not go to zero when device was unregistered because left over anycast reference would hold ipv6 dev reference which would hold device references... The correct procedure is to call notify chain when address is no longer available for use. When interface comes back DAD timer will notify back that address is available. Also, link local addresses should be purged when interface is brought down. The address might be changed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b2a1953 |
|
02-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: addrconf timer race The Router Solicitation timer races with device state changes because it doesn't lock the device. Use local variable to avoid one repeated dereference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
122e4519 |
|
02-Mar-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: addrconf dad timer unnecessary bh_disable Timer code runs in bottom half, so there is no need for using _bh form of locking. Also check if device is not ready to avoid race with address that is no longer active. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45bb0060 |
|
25-Feb-2010 |
Ulrich Weber <uweber@astaro.com> |
ipv6: Remove IPV6_ADDR_RESERVED RFC 4291 section 2.4 states that all uncategorized addresses should be considered as Global Unicast. This will remove IPV6_ADDR_RESERVED completely and return IPV6_ADDR_UNICAST in ipv6_addr_type() instead. Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88af182e |
|
19-Feb-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Fix sysctl restarts... Yuck. It turns out that when we restart sysctls we were restarting with the values already changed. Which unfortunately meant that the second time through we thought there was no change and skipped all kinds of work, despite the fact that there was indeed a change. I have fixed this the simplest way possible by restoring the changed values when we restart the sysctl write. One of my coworkers spotted this bug when after disabling forwarding on an interface pings were still forwarded. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d720c3e |
|
16-Feb-2010 |
Tejun Heo <tj@kernel.org> |
percpu: add __percpu sparse annotations to net Add __percpu sparse annotations to net. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. The macro and type tricks around snmp stats make things a bit interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All snmp_mib_*() users which used to cast the argument to (void **) are updated to cast it to (void __percpu **). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
54716e3b |
|
13-Feb-2010 |
Eric W. Biederman <ebiederm@xmission.com> |
net neigh: Decouple per interface neighbour table controls from binary sysctls Stop computing the number of neighbour table settings we have by counting the number of binary sysctls. This behaviour was silly and meant that we could not add another neighbour table setting without also adding another binary sysctl. Don't pass the binary sysctl path for neighour table entries into neigh_sysctl_register. These parameters are no longer used and so are just dead code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21809faf |
|
08-Feb-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: remove trivial nested _bh suffix Don't need to disable bottom half it is already down in the previous lock. Move some blank lines to group locking in same context. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc2b99f7 |
|
08-Feb-2010 |
stephen hemminger <shemminger@vyatta.com> |
IPv6: keep permanent addresses on admin down Permanent IPV6 addresses should not be removed when the link is set to admin down, only when device is removed. When link is lost permanent addresses should be marked as tentative so that when link comes back they are subject to duplicate address detection (if DAD was enabled for that address). Other routing systems keep manually configured IPv6 addresses when link is set down. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c8c1e72 |
|
16-Jan-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09ad9bc7 |
|
25-Nov-2009 |
Octavian Purdila <opurdila@ixiacom.com> |
net: use net_eq to compare nets Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
234b27c3 |
|
11-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: speedup inet6_dump_addr() When handling large number of netdevices, inet6_dump_addr() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the NETDEV_HASHENTRIES sub lists of the dev_index hash table, and RCU lookups. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8572d8f |
|
05-Nov-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
sysctl net: Remove unused binary sysctl code Now that sys_sysctl is a compatiblity wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be revmoed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and it's callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
#
434a8a58 |
|
11-Nov-2009 |
David S. Miller <davem@davemloft.net> |
ipv6: Remove unused var in inet6_dump_ifinfo() Reported by Stephen Rothwell: -------------------- Today's linux-next build (x86_64 allmodconfig) produced this warning: net/ipv6/addrconf.c: In function 'inet6_dump_ifinfo': net/ipv6/addrconf.c:3833: warning: unused variable 'err' Introduced by commit 84d2697d9649339215675551eae28ba04068dea1 ("ipv6: speedup inet6_dump_ifinfo()"). -------------------- Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bcd32326 |
|
09-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: Allow inet6_dump_addr() to handle more than 64 addresses Apparently, inet6_dump_addr() is not able to handle more than 64 ipv6 addresses per device. We must break from inner loops in case skb is full, or else cursor is put at the end of list. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
84d2697d |
|
08-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: speedup inet6_dump_ifinfo() When handling large number of netdevice, inet6_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table, and RCU lookups. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6d14c84 |
|
04-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: Introduce for_each_netdev_rcu() iterator Adds RCU management to the list of netdevices. Convert some for_each_netdev() users to RCU version, if it can avoid read_lock-ing dev_base_lock Ie: read_lock(&dev_base_loack); for_each_netdev(net, dev) some_action(); read_unlock(&dev_base_lock); becomes : rcu_read_lock(); for_each_netdev_rcu(net, dev) some_action(); rcu_read_unlock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3faca05 |
|
08-Oct-2009 |
Cosmin Ratiu <cratiu@ixiacom.com> |
ipv6: fix devconf after adding force_tllao option Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7734fdf |
|
02-Oct-2009 |
Octavian Purdila <opurdila@ixiacom.com> |
make TLLAO option for NA packets configurable On Friday 02 October 2009 20:53:51 you wrote: > This is good although I would have shortened the name. Ah, I knew I forgot something :) Here is v4. tavi >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001 From: Octavian Purdila <opurdila@ixiacom.com> Date: Fri, 2 Oct 2009 00:51:15 +0300 Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs Neighbor advertisements responding to unicast neighbor solicitations did not include the target link-layer address option. This patch adds a new sysctl option (disabled by default) which controls whether this option should be sent even with unicast NAs. The need for this arose because certain routers expect the TLLAO in some situations even as a response to unicast NS packets. Moreover, RFC 2461 recommends sending this to avoid a race condition (section 4.4, Target link-layer address) Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d65af78 |
|
23-Sep-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
sysctl: remove "struct file *" argument of ->proc_handler It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0522fea6 |
|
17-Sep-2009 |
Jens Rosenboom <me@jayr.de> |
ipv6: Log the affected address when DAD failure occurs If an interface has multiple addresses, the current message for DAD failure isn't really helpful, so this patch adds the address itself to the printk. Signed-off-by: Jens Rosenboom <me@jayr.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75c78500 |
|
15-Sep-2009 |
Moni Shoua <monis@voltaire.com> |
bonding: remap muticast addresses without using dev_close() and dev_open() This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach there is to call dev_close()/dev_open() whenever the device type is changed in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks: *. It assumes tha the device is UP when calling dev_close(), or otherwise dev_close() has no affect. It is worth to mention that initscripts (Redhat) and sysconfig (Suse) doesn't act the same in this matter. *. dev_close() has other side affects, like deleting entries from the routing table, which might be unnecessary. The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc411d0b |
|
09-Sep-2009 |
Brian Haley <brian.haley@hp.com> |
ipv6: Add IFA_F_DADFAILED flag Add IFA_F_DADFAILED flag to denote an IPv6 address that has failed Duplicate Address Detection, that way tools like /sbin/ip can be more informative. 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:db8::1/64 scope global tentative dadfailed valid_lft forever preferred_lft forever Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a1ed0526 |
|
02-Jul-2009 |
Brian Haley <brian.haley@hp.com> |
IPv6: preferred lifetime of address not getting updated There's a bug in addrconf_prefix_rcv() where it won't update the preferred lifetime of an IPv6 address if the current valid lifetime of the address is less than 2 hours (the minimum value in the RA). For example, If I send a router advertisement with a prefix that has valid lifetime = preferred lifetime = 2 hours we'll build this address: 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic valid_lft 7175sec preferred_lft 7175sec If I then send the same prefix with valid lifetime = preferred lifetime = 0 it will be ignored since the minimum valid lifetime is 2 hours: 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic valid_lft 7161sec preferred_lft 7161sec But according to RFC 4862 we should always reset the preferred lifetime even if the valid lifetime is invalid, which would cause the address to immediately get deprecated. So with this patch we'd see this: 5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000 inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic valid_lft 7163sec preferred_lft 0sec The comment winds-up being 5x the size of the code to fix the problem. Update the preferred lifetime of IPv6 addresses derived from a prefix info option in a router advertisement even if the valid lifetime in the option is invalid, as specified in RFC 4862 Section 5.5.3e. Fixes an issue where an address will not immediately become deprecated. Reported by Jens Rosenboom. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a1faa698 |
|
24-Jun-2009 |
Jens Rosenboom <me@jayr.de> |
ipv6: avoid wraparound for expired preferred lifetime Avoid showing wrong high values when the preferred lifetime of an address is expired. Signed-off-by: Jens Rosenboom <me@jayr.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
590a9887 |
|
08-Jun-2009 |
Masatake YAMATO <yamato@redhat.com> |
trivial: Fix a typo in comment of addrconf_dad_start() Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
56d417b1 |
|
01-Jun-2009 |
Brian Haley <brian.haley@hp.com> |
IPv6: Add 'autoconf' and 'disable_ipv6' module parameters Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module. The first controls if IPv6 addresses are autoconfigured from prefixes received in Router Advertisements. The IPv6 loopback (::1) and link-local addresses are still configured. The second controls if IPv6 addresses are desired at all. No IPv6 addresses will be added to any interfaces. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9af28511 |
|
18-May-2009 |
Sascha Hlusiak <contact@saschahlusiak.de> |
addrconf: refuse isatap eui64 for INADDR_ANY A tunnel with no local ipv4 endpoint would otherwise use the ISATAP linklocal address fe80::5efe:0:0, which is invalid. Rather not add a linklocal address at all. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5007392d |
|
13-May-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
net: FIX ipv6_forward sysctl restart Just returning -ERESTARTSYS without a signal pending is not good that will just leak it to userspace. We need return -ERESTARTNOINTR so we always restart and set signal pending so that we fall of the fast path of syscall return and setup the system call restart. So use restart_syscall() which does all of this for us. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2f5e7cd |
|
24-Mar-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
ipv6: Fix conflict resolutions during ipv6 binding The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal() which at times wrongly identified intersections between addresses. It particularly broke down under a few instances and caused erroneous bind conflicts. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0bffffc |
|
21-Mar-2009 |
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> |
net/*: use linux/kernel.h swap() tcp_sack_swap seems unnecessary so I pushed swap to the caller. Also removed comment that seemed then pointless, and added include when not already there. Compile tested. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9bdd8d40 |
|
18-Mar-2009 |
Brian Haley <brian.haley@hp.com> |
ipv6: Fix incorrect disable_ipv6 behavior Fix the behavior of allowing both sysctl and addrconf_dad_failure() to set the disable_ipv6 parameter without any bad side-effects. If DAD fails and accept_dad > 1, we will still set disable_ipv6=1, but then instead of allowing an RA to add an address then immediately fail DAD, we simply don't allow the address to be added in the first place. This also lets the user set this flag and disable all IPv6 addresses on the interface, or on the entire system. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
176c39af |
|
03-Mar-2009 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
netns: fix addrconf_ifdown kernel panic When a network namespace is destroyed the network interfaces are all unregistered, making addrconf_ifdown called by the netdevice notifier. In the other hand, the addrconf exit method does a loop on the network devices and does addrconf_ifdown on each of them. But the ordering of the netns subsystem is not right because it uses the register_pernet_device instead of register_pernet_subsys. If we handle the loopback as any network device, we can safely use register_pernet_subsys. But if we use register_pernet_subsys, the addrconf exit method will do exactly what was already done with the unregistering of the network devices. So in definitive, this code is pointless. I removed the netns addrconf exit method and moved the code to the addrconf cleanup function. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b325fddb |
|
25-Feb-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
ipv6: Fix sysctl unregistration deadlock Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ce85fe4 |
|
25-Feb-2009 |
Pablo Neira Ayuso <pablo@netfilter.org> |
netlink: change nlmsg_notify() return value logic This patch changes the return value of nlmsg_notify() as follows: If NETLINK_BROADCAST_ERROR is set by any of the listeners and an error in the delivery happened, return the broadcast error; else if there are no listeners apart from the socket that requested a change with the echo flag, return the result of the unicast notification. Thus, with this patch, the unicast notification is handled in the same way of a broadcast listener that has set the NETLINK_BROADCAST_ERROR socket flag. This patch is useful in case that the caller of nlmsg_notify() wants to know the result of the delivery of a netlink notification (including the broadcast delivery) and take any action in case that the delivery failed. For example, ctnetlink can drop packets if the event delivery failed to provide reliable logging and state-synchronization at the cost of dropping packets. This patch also modifies the rtnetlink code to ignore the return value of rtnl_notify() in all callers. The function rtnl_notify() (before this patch) returned the error of the unicast notification which makes rtnl_set_sk_err() reports errors to all listeners. This is not of any help since the origin of the change (the socket that requested the echoing) notices the ENOBUFS error if the notification fails and should resync itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5f348e5 |
|
07-Feb-2009 |
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> |
ipv6/addrconf: common code located $ codiff net/ipv6/addrconf.o net/ipv6/addrconf.o.new net/ipv6/addrconf.c: addrconf_notify | -267 1 function changed, 267 bytes removed net/ipv6/addrconf.c: add_addr | +86 1 function changed, 86 bytes added net/ipv6/addrconf.o.new: 2 functions changed, 86 bytes added, 267 bytes removed, diff: -181 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a4e6db07 |
|
27-Jan-2009 |
David S. Miller <davem@davemloft.net> |
ipv6: Make mc_forwarding sysctl read-only. The kernel manages this value internally, as necessary, as VIFs are added/removed and as multicast routers are registered and deregistered. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5bc3eb7e |
|
19-Nov-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
ip: convert to net_device_ops for ioctl Convert to net_device_ops function table pointer for ioctl. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3ec6cfc |
|
05-Nov-2008 |
Benjamin Thery <benjamin.thery@bull.net> |
ipv6: fix run pending DAD when interface becomes ready With some net devices types, an IPv6 address configured while the interface was down can stay 'tentative' forever, even after the interface is set up. In some case, pending IPv6 DADs are not executed when the device becomes ready. I observed this while doing some tests with kvm. If I assign an IPv6 address to my interface eth0 (kvm driver rtl8139) when it is still down then the address is flagged tentative (IFA_F_TENTATIVE). Then, I set eth0 up, and to my surprise, the address stays 'tentative', no DAD is executed and the address can't be pinged. I also observed the same behaviour, without kvm, with virtual interfaces types macvlan and veth. Some easy steps to reproduce the issue with macvlan: 1. ip link add link eth0 type macvlan 2. ip -6 addr add 2003::ab32/64 dev macvlan0 3. ip addr show dev macvlan0 ... inet6 2003::ab32/64 scope global tentative ... 4. ip link set macvlan0 up 5. ip addr show dev macvlan0 ... inet6 2003::ab32/64 scope global tentative ... Address is still tentative I think there's a bug in net/ipv6/addrconf.c, addrconf_notify(): addrconf_dad_run() is not always run when the interface is flagged IF_READY. Currently it is only run when receiving NETDEV_CHANGE event. Looks like some (virtual) devices doesn't send this event when becoming up. For both NETDEV_UP and NETDEV_CHANGE events, when the interface becomes ready, run_pending should be set to 1. Patch below. 'run_pending = 1' could be moved below the if/else block but it makes the code less readable. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d9f239a |
|
03-Nov-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: '&' redux I want to compile out proc_* and sysctl_* handlers totally and stub them to NULL depending on config options, however usage of & will prevent this, since taking adress of NULL pointer will break compilation. So, drop & in front of every ->proc_handler and every ->strategy handler, it was never needed in fact. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b7a4274 |
|
29-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace %#p6 format specifier with %pi6 gcc warns when using the # modifier with the %p format specifier, so we can't use this to omit the colons when needed, introduces %pi6 instead. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b071195d |
|
28-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace all current users of NIP6_SEQFMT with %#p6 The define in kernel.h can be done away with at a later time. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f221e726 |
|
15-Oct-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
sysctl: simplify ->strategy name and nlen parameters passed to ->strategy hook are unused, remove them. In general ->strategy hook should know what it's doing, and don't do something tricky for which, say, pointer to original userspace array may be needed (name). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> [ networking bits ] Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f410a1fb |
|
23-Aug-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
ipv6: protocol for address routes This fixes a problem spotted with zebra, but not sure if it is necessary a kernel problem. With IPV6 when an address is added to an interface, Zebra creates a duplicate RIB entry, one as a connected route, and other as a kernel route. When an address is added to an interface the RTN_NEWADDR message causes Zebra to create a connected route. In IPV4 when an address is added to an interface a RTN_NEWROUTE message is set to user space with the protocol RTPROT_KERNEL. Zebra ignores these messages, because it already has the connected route. The problem is that route created in IPV6 has route protocol == RTPROT_BOOT. Was this a design decision or a bug? This fixes it. Same patch applies to both net-2.6 and stable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
191cd582 |
|
14-Aug-2008 |
Brian Haley <brian.haley@hp.com> |
netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() ipv6_dev_get_saddr() blindly de-references dst_dev to get the network namespace, but some callers might pass NULL. Change callers to pass a namespace pointer instead. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
547b792c |
|
25-Jul-2008 |
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> |
net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
888c848e |
|
22-Jul-2008 |
Adrian Bunk <bunk@kernel.org> |
ipv6: make struct ipv6_devconf static struct ipv6_devconf can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
702beb87 |
|
20-Jul-2008 |
David Miller <davem@davemloft.net> |
ipv6: Fix warning in addrconf code. Reported by Linus. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
53b7997f |
|
19-Jul-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6 netns: Make several "global" sysctl variables namespace aware. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05297949 |
|
09-Jul-2008 |
David S. Miller <davem@davemloft.net> |
pkt_sched: Add qdisc_tx_is_noop() helper and use in IPV6. This indicates if the NOOP scheduler is what is active for TX on a given device. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0e1e646 |
|
08-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Move rest of qdisc state into struct netdev_queue Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2238566 |
|
08-Jul-2008 |
Andrey Vagin <avagin@parallels.com> |
ipv6: fix race between ipv6_del_addr and DAD timer Consider the following scenario: ipv6_del_addr(ifp) ipv6_ifa_notify(RTM_DELADDR, ifp) ip6_del_rt(ifp->rt) after returning from the ipv6_ifa_notify and enabling BH-s back, but *before* calling the addrconf_del_timer the ifp->timer fires and: addrconf_dad_timer(ifp) addrconf_dad_completed(ifp) ipv6_ifa_notify(RTM_NEWADDR, ifp) ip6_ins_rt(ifp->rt) then return back to the ipv6_del_addr and: in6_ifa_put(ifp) inet6_ifa_finish_destroy(ifp) dst_release(&ifp->rt->u.dst) After this we have an ifp->rt inserted into fib6 lists, but queued for gc, which in turn can result in oopses in the fib6_run_gc. Maybe some other nasty things, but we caught only the oops in gc so far. The solution is to disarm the ifp->timer before flushing the rt from it. Signed-off-by: Andrey Vagin <avagin@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b34be74 |
|
27-Jun-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6 addrconf: add accept_dad sysctl to control DAD operation. - If 0, disable DAD. - If 1, perform DAD (default). - If >1, perform DAD and disable IPv6 operation if DAD for MAC-based link-local address has been failed (RFC4862 5.4.5). We do not follow RFC4862 by default. Refer to the netdev thread entitled "Linux IPv6 DAD not full conform to RFC 4862 ?" http://www.spinics.net/lists/netdev/msg52027.html Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
778d80be |
|
27-Jun-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
d68b8270 |
|
25-Jun-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6: Do not assign non-valid address on interface. Check the type of the address when adding a new one on interface. - the unspecified address (::) is always disallowed (RFC4291 2.5.2) - the loopback address is disallowed unless the interface is (one of) loopback (RFC4291 2.5.3). - multicast addresses are disallowed. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
0187bdfb |
|
19-Jun-2008 |
Ben Hutchings <bhutchings@solarflare.com> |
net: Disable LRO on devices that are forwarding Large Receive Offload (LRO) is only appropriate for packets that are destined for the host, and should be disabled if received packets may be forwarded. It can also confuse the GSO on output. Add dev_disable_lro() function which uses the appropriate ethtool ops to disable LRO if enabled. Add calls to dev_disable_lro() in br_add_if() and functions that enable IPv4 and IPv6 forwarding. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b040829 |
|
10-Jun-2008 |
Adrian Bunk <bunk@kernel.org> |
net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3de23255 |
|
28-May-2008 |
Benjamin Thery <benjamin.thery@bull.net> |
ipv6 netns: Address labels per namespace This pacth makes IPv6 address labels per network namespace. It keeps the global label tables, ip6addrlbl_table, but adds a 'net' member to each ip6addrlbl_entry. This new member is taken into account when matching labels. Changelog ========= * v1: Initial version * v2: * Minize the penalty when network namespaces are not configured: * the 'net' member is added only if CONFIG_NET_NS is defined. This saves space when network namespaces are not configured. * 'net' value is retrieved with the inlined function ip6addrlbl_net() that always return &init_net when CONFIG_NET_NS is not defined. * 'net' member in ip6addrlbl_entry renamed to the less generic 'lbl_net' name (helps code search). Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
2b5ead46 |
|
12-May-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6 addrconf: Introduce addrconf_is_prefix_route() helper. This inline function, for readability, returns if the route is a "prefix" route regardless if it was installed by RA or by hand. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
4bed72e4 |
|
27-May-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Allow longer lifetime on 64bit archs. - Allow longer lifetimes (>= 0x7fffffff/HZ) on 64bit archs by using unsigned long. - Shadow this arithmetic overflow workaround by introducing helper functions: addrconf_timeout_fixup() and addrconf_finite_timeout(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
24ef0da7 |
|
28-May-2008 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] ADDRCONF: Check range of prefix length As of now, the prefix length is not vaildated when adding or deleting addresses. The value is passed directly into the inet6_ifaddr structure and later passed on to memcmp() as length indicator which relies on the value never to exceed 128 (bits). Due to the missing check, the currently code allows for any 8 bit value to be passed on as prefix length while using the netlink interface, and any 32 bit value while using the ioctl interface. [Use unsigned int instead to generate better code - yoshfuji] Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
6f704992 |
|
19-May-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ipv6 addrconf: Allow infinite prefix lifetime. We need to handle infinite prefix lifetime specially. With help from original reporter "Bonitch, Joseph" <Joseph.Bonitch@xerox.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0686caa3 |
|
19-May-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
ndisc: Add missing strategies for per-device retrans timer/reachable time settings. Noticed from Al Viro <viro@ftp.linux.org.uk> via David Miller <davem@davemloft.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92998dd4 |
|
21-Apr-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Remove empty ->init callback. The netns start-stop engine can happily live with any of init or exit callbacks set to NULL. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9625ed72 |
|
14-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface. As far as I can remember, I was going to disable privacy extensions on all "tunnel" interfaces. Disable it on ip6-ip6 interface as well. Also, just remove ifdefs for SIT for simplicity. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b077d7ab |
|
14-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cee89473 |
|
14-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] MROUTE: Do not call ipv6_find_idev() directly. Since NETDEV_REGISTER notifier chain is responsible for creating inet6_dev{}, we do not need to call ipv6_find_idev() directly here. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7aabf22 |
|
10-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Use in6addr_any where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
f3ee4010 |
|
10-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Define constants for link-local multicast addresses. - Define link-local all-node / all-router multicast addresses. - Remove ipv6_addr_all_nodes() and ipv6_addr_all_routers(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
9acd9f3a |
|
10-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Make address arguments const. - net/ipv6/addrconf.c: ipv6_get_ifaddr(), ipv6_dev_get_saddr() - net/ipv6/mcast.c: ipv6_sock_mc_join(), ipv6_sock_mc_drop(), inet6_mc_check(), ipv6_dev_mc_inc(), __ipv6_dev_mc_dec(), ipv6_dev_mc_dec(), ipv6_chk_mcast_addr() - net/ipv6/route.c: rt6_lookup(), icmp6_dst_alloc() - net/ipv6/ip6_output.c: ip6_nd_hdr() - net/ipv6/ndisc.c: ndisc_send_ns(), ndisc_send_rs(), ndisc_send_redirect(), ndisc_get_neigh(), __ndisc_send() Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
dfd982ba |
|
10-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Uninline ipv6_isatap_eui64(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
3eb84f49 |
|
10-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Uninline ipv6_addr_hash(). The function is only used in net/ipv6/addrconf.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
caad295f |
|
10-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Use ipv6_addr_equal() instead of !ipv6_addr_cmp(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
7bc570c8 |
|
02-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] MROUTE: Support multicast forwarding. Based on ancient patch by Mickael Hoerdt <hoerdt@clarinet.u-strasbg.fr>, which is available at <http://www-r2.u-strasbg.fr/~hoerdt/dev/linux_ipv6_mforwarding/patch-linux-ipv6-mforwarding-0.1a>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
f6a07b29 |
|
24-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Fix array size for sysctls. We have been using __NET_IPV6_MAX for adjusting the size of array for sysctl table, but it does not work any longer because of the deprecation of NET_IPV6_xxx constants. Let's use DEVCONF_MAX instead. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
eb867579 |
|
03-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
[IPV6]: inet6_dev on loopback should be kept until namespace stop. In the other case it will be destroyed when last address will be removed from lo inside a namespace. This will break IPv6 in several places. The most obvious one is ip6_dst_ifdown. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
439e2385 |
|
03-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
[IPV6]: Event type in addrconf_ifdown is mis-used. addrconf_ifdown is broken in respect to the usage of how parameter. This function is called with (event != NETDEV_DOWN) and (2) on the IPv6 stop. It the latter case inet6_dev from loopback device should be destroyed. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52eeeb84 |
|
15-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Unify ip6_onlink() and ipip6_onlink(). Both are identical, let's create ipv6_chk_prefix() and use it in both places.
|
#
eac55bf9 |
|
02-Apr-2008 |
Benoit Boissinot <benoit.boissinot@ens-lyon.org> |
IPv6: do not create temporary adresses with too short preferred lifetime From RFC341: A temporary address is created only if this calculated Preferred Lifetime is greater than REGEN_ADVANCE time units. In particular, an implementation must not create a temporary address with a zero Preferred Lifetime. Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6fbfac2 |
|
02-Apr-2008 |
Benoit Boissinot <benoit.boissinot@ens-lyon.org> |
IPv6: only update the lifetime of the relevant temporary address When receiving a prefix information from a routeur, only update the lifetimes of the temporary address associated with that prefix. Otherwise if one deprecated prefix is advertized, all your temporary addresses will become deprecated. Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
878628fb |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit namespace comparision without CONFIG_NET_NS. Introduce an inline net_eq() to compare two namespaces. Without CONFIG_NET_NS, since no namespace other than &init_net exists, it is always 1. We do not need to convert 1) inline vs inline and 2) inline vs &init_net comparisons. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
1218854a |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS. Without CONFIG_NET_NS, no namespace other than &init_net exists, no need to store net in seq_net_private. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
3b1e0a65 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
c346dca1 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
7cbca67c |
|
24-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Support Source Address Selection API (RFC5014). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
a9b05723 |
|
01-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Clean-up ipv6_dev_get_saddr(). old: | text data bss dec hex filename | 28599 1416 96 30111 759f net/ipv6/addrconf.o new: | text data bss dec hex filename | 28007 1416 96 29519 734f net/ipv6/addrconf.o Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
0dc47877 |
|
05-Mar-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a05c44f6 |
|
05-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[IPV6]: Remove commented lines. Remove commented lines from netns patchset. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fda7350 |
|
05-Mar-2008 |
Benjamin Thery <benjamin.thery@bull.net> |
[NETNS][IPV6] addrconf - make addrconf per namespace All the infrastructure to propagate the network namespace information is ready. Make use of it. There is a special case here between the initial network namespace and the other namespaces: * When ipv6 is initialized at boot time (aka in the init_net), it registers to the notifier callback. So addrconf_notify will be called as many time as there are network devices setup on the system and the function will add ipv6 addresses to the network devices. But the first device which needs to have its ipv6 address setup is the loopback, unfortunatly this is not the case. So the loopback address is setup manually in the ipv6 init function. * With the network namespace, this ordering problem does not appears because notifier is already setup and active, so as soon as we register the loopback the ipv6 address is setup and it will be the first device. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
af284937 |
|
05-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] addrconf - Pass the proper network namespace parameters to addrconf This patch propagates the network namespace pointer to the address configuration routines which need it, which means adding a new parameter to these functions, and make them use it instead of using the initial network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8ed67789 |
|
04-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] rt6_info - move rt6_info structure inside the namespace The rt6_info structures are moved inside the network namespace structure. All references to these structures are now relative to the initial network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdb3289f |
|
04-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] rt6_info - make rt6_info accessed as a pointer This patch make mindless changes and prepares the code to use dynamic allocation for rt6_info structure. The code accesses the rt6_info structure as a pointer instead of a global static variable. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b4da532 |
|
04-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] route6 - Pass the network namespace parameter to rt6_purge_dflt_routers Add a network namespace parameter to rt6_purge_dflt_routers. This is needed to call fib6_get_table with the appropriate network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
606a2b48 |
|
04-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] route6 - Pass the network namespace parameter to rt6_lookup Add a network namespace parameter to rt6_lookup(). Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3db4851 |
|
04-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] ip6_fib - fib6_clean_all handle several network namespaces The function fib6_clean_all takes the network namespace as parameter. That allows to flush the routes related to a specific network namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e5f3f0f |
|
03-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr(). Since most users of ipv6_get_saddr() pass non-NULL as dst argument, use ipv6_dev_get_saddr() directly. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
99cd07a5 |
|
28-Feb-2008 |
Juha-Matti Tapio <jmtapio@verkkotelakka.net> |
[IPV6]: Fix source address selection for ORCHID addresses Skip the prefix length matching in source address selection for orchid -> non-orchid addresses. Overlay Routable Cryptographic Hash IDentifiers (RFC 4843, 2001:10::/28) are currenty not globally reachable. Without this check a host with an ORCHID address can end up preferring those over regular addresses when talking to other regular hosts in the 2001::/16 range thus breaking non-orchid connections. Signed-off-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1243c2d |
|
26-Feb-2008 |
Benjamin Thery <benjamin.thery@bull.net> |
[IPV6]: Add missing initializations of the new nl_info.nl_net field Add some more missing initializations of the new nl_info.nl_net field in IPv6 stack. This field will be used when network namespaces are fully supported. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d5619b4 |
|
22-Jan-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Sparse: Make inet6_dump_addr() code paths more straight-forward. Fix the following sparse warning: | net/ipv6/addrconf.c:3384:2: warning: context imbalance in 'inet6_dump_addr' - different lock contexts for basic block Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
d20b3109 |
|
21-Jan-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
[IPV6]: addrconf sparse warnings Get rid of a couple of sparse warnings in IPV6 addrconf code. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
389f6612 |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: inet6_addr - make ipv6_chk_home_addr namespace aware Looks if the address is belonging to the network namespace, otherwise discard the address for the check. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1cab3da6 |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: inet6_addr - ipv6_get_ifaddr namespace aware The inet6_addr_lst is browsed taking into account the network namespace specified as parameter. If an address does not belong to the specified namespace, it is ignored. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06bfe655 |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: inet6_addr - ipv6_chk_same_addr namespace aware This patch makes ipv6_chk_same_addr function to be aware of the network namespace. The addresses not belonging to the network namespace are discarded. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bfeade08 |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: inet6_addr - check ipv6 address per namespace When a new address is added, we must check if the new address does not already exists. This patch makes this check to be aware of a network namespace, so the check will look if the address already exists for the specified network namespace. While the addresses are browsed, the addresses which do not belong to the namespace are discarded. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c40090a |
|
10-Jan-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6]: inet6_addr - isolate inet6 addresses from proc file Make /proc/net/if_inet6 show only inet6 addresses belonging to the namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e186932b |
|
10-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Use the per-net ipv6_devconf(_all) in sysctl handlers Actually the net->ipv6.devconf_all can be used in a few places, but to keep the /proc/sys/net/ipv6/conf/ sysctls work consistently in the namespace we should use the per-net devconf_all in the sysctl "forwarding" handler. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
441fc2a2 |
|
10-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Use the per-net ipv6_devconf_dflt All its users are in net/ipv6/addrconf.c's sysctl handlers. Since they already have the struct net to get from, the per-net ipv6_devconf_dflt can already be used. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0da5a48 |
|
10-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Create ipv6 devconf-s for namespaces This is the core. Declare and register the pernet subsys for addrconf. The init callback the will create the devconf-s. The init_net will reuse the existing statically declared confs, so that accessing them from inside the ipv6 code will still work. The register_pernet_subsys() is moved above the ipv6_add_dev() call for loopback, because this function will need the net->devconf_dflt pointer to be already set. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bff16c2f |
|
10-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Make the ctl-tables per-namespace This includes passing the net to __addrconf_sysctl_register and saving this on the ctl_table->extra2 to be used in handlers (those, needing it). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95897312 |
|
10-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Make the __addrconf_sysctl_register return an error This error code will be needed to abort the namespace creation if needed. Probably, this is to be checked when a new device is created (currently it is ignored). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
408c4768 |
|
10-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Clean out the ipv6-related sysctls creation/destruction The addrconf sysctls and neigh sysctls are registered and unregistered always in pairs, so they can be joined into one (well, two) functions, that accept the struct inet6_dev and do all the job. This also get rids of unneeded ifdefs inside the code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09f7709f |
|
13-Dec-2007 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[IPV6]: fix section mismatch warnings Removed useless and buggy __exit section in the different ipv6 subsystems. Otherwise they will be called inside an init section during rollbacking in case of an error in the protocol initialization. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c69bce20 |
|
23-Jan-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET]: Remove unused "mibalign" argument for snmp_mib_init(). With fixes from Arnaldo Carvalho de Melo. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8fecf22 |
|
05-Dec-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV6]: Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding The only difference in this case is that updating all.forwarding causes the update in default.forwarding when done via proc, but not via the system call. Besides, this consolidates a good portion of code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1dab6222 |
|
01-Dec-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV6]: Use ctl paths to register addrconf sysctls This looks very much like the patch for ipv4's devinet. This is also intended to help us with the net namespaces and saves the ipv6.ko size by ~320 bytes. The difference from the first version is just the patch offsets, that changed due to changes in the patch #2. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f52295a9 |
|
01-Dec-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV6]: Unify and cleanup calls to addrconf_sysctl_register Currently this call is (ab)used similar to devinet one - it registers sysctls for devices and for the "default" confs, while the "all" sysctls are registered separately. But unlike its devinet brother, the passed inet6_device is needed. The fix is to make a __addrconf_sysctl_register(), which registers sysctls for all "devices" we need, including "default" and "all" :) The original addrconf_sysctl_register() calls the introduced function, passing the inet6_device, device name and ifindex (to be used as procname and ctl_name) into it. Thanks to Herbert again for pointing out, that we can shrink the argument list to 1 :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f68635e6 |
|
01-Dec-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV6]: Cleanup the addconf_sysctl_register This only includes fixing the space-indented lines and removing one unneeded else after the goto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7dc89c0 |
|
29-Nov-2007 |
Fred L. Templin <fred.l.templin@boeing.com> |
[IPV6]: Add RFC4214 support This patch includes support for the Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) per RFC4214. It uses the SIT module, and is configured using extensions to the "iproute2" utility. The diffs are specific to the Linux 2.6.24-rc2 kernel distribution. This version includes the diff for ./include/linux/if.h which was missing in the v2.4 submission and is needed to make the patch compile. The patch has been installed, compiled and tested in a clean 2.6.24-rc2 kernel build area. Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
97c53cac |
|
19-Nov-2007 |
Denis V. Lunev <den@openvz.org> |
[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b854272b |
|
30-Nov-2007 |
Denis V. Lunev <den@openvz.org> |
[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
2a8cc6c8 |
|
13-Nov-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table. Policy table is implemented as an RCU linear list since we do not expect large list nor frequent updates. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
303065a8 |
|
13-Nov-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Allow address selection policy with ifindex. This patch allows ifindex to be a key for address selection policy table. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1ee656c |
|
13-Nov-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Rename ipv6_saddr_label() to ipv6_addr_label(). This patch renames ipv6_saddr_label() to ipv6_addr_label() because address label is used for both of source address and destination address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b24b8a24 |
|
23-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d31c7b8f |
|
30-Nov-2007 |
Evgeniy Polyakov <johnpol@2ka.mipt.ru> |
[IPV6]: Restore IPv6 when MTU is big enough Avaid provided test application, so bug got fixed. IPv6 addrconf removes ipv6 inner device from netdev each time cmu changes and new value is less than IPV6_MIN_MTU (1280 bytes). When mtu is changed and new value is greater than IPV6_MIN_MTU, it does not add ipv6 addresses and inner device bac. This patch fixes that. Tested with Avaid's application, which works ok now. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
3b6d821c |
|
20-Nov-2007 |
Joe Perches <joe@perches.com> |
[IPV6]: Add missing "space" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1675c7b2 |
|
30-Oct-2007 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[IPV6]: remove duplicate call to proc_net_remove The file /proc/net/if_inet6 is removed twice. First time in: inet6_exit ->addrconf_cleanup And followed a few lines after by: inet6_exit -> if6_proc_exit Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aaf70ec7 |
|
17-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV6]: Cleanup snmp6_alloc_dev() This functions is never called with NULL or not setup argument, so the checks inside are redundant. Also, the return value is always -ENOMEM, so no need in additional variable for this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16910b98 |
|
17-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV6]: Fix return type for snmp6_free_dev() This call is essentially void. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f24e3d65 |
|
10-Oct-2007 |
Mitsuru Chinen <mitch@linux.vnet.ibm.com> |
[IPV6]: Defer IPv6 device initialization until a valid qdisc is specified To judge the timing for DAD, netif_carrier_ok() is used. However, there is a possibility that dev->qdisc stays noop_qdisc even if netif_carrier_ok() returns true. In that case, DAD NS is not sent out. We need to defer the IPv6 device initialization until a valid qdisc is specified. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf7732e4 |
|
10-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[NET]: Make core networking code use seq_open_private This concerns the ipv4 and ipv6 code mostly, but also the netlink and unix sockets. The netlink code is an example of how to use the __seq_open_private() call - it saves the net namespace on this private. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2774c7ab |
|
26-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make the loopback device per network namespace. This patch makes loopback_dev per network namespace. Adding code to create a different loopback device for each network namespace and adding the code to free a loopback device when a network namespace exits. This patch modifies all users the loopback_dev so they access it as init_net.loopback_dev, keeping all of the code compiling and working. A later pass will be needed to update the users to use something other than the initial network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de3cb747 |
|
25-Sep-2007 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NET]: Dynamically allocate the loopback device, part 1. This patch replaces all occurences to the static variable loopback_dev to a pointer loopback_dev. That provides the mindless, trivial, uninteressting change part for the dynamic allocation for the loopback. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-By: Kirill Korotaev <dev@sw.ru> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14878f75 |
|
16-Sep-2007 |
David L Stevens <dlstevens@us.ibm.com> |
[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2] Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters. [new to 2nd revision] 6) support per-interface ICMP stats 7) use common macro for per-device stat macros Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b69d4bd |
|
15-Sep-2007 |
Milan Kocian <milon@wq.cz> |
[IPV6]: Remove redundant RTM_DELLINK message. Remove useless message. We get the right message from another subsystem. Signed-off-by: Milan Kocian <milon@wq.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
881d966b |
|
17-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e9dc8653 |
|
12-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make device event notification network namespace safe Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
457c4cbc |
|
11-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ae5f983 |
|
16-Sep-2007 |
Jiri Kosina <jkosina@suse.cz> |
[IPV6]: Fix source address selection. The commit 95c385 broke proper source address selection for cases in which there is a address which is makred 'deprecated'. The commit mistakenly changed ifa->flags to ifa_result->flags (probably copy/paste error from a few lines above) in the 'Rule 3' address selection code. The patch restores the previous RFC-compliant behavior. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b217d616 |
|
30-Jul-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4/IPV6]: Fail registration if inet device construction fails Now that netdev notifications can fail, we can use this to signal errors during registration for IPv4/IPv6. In particular, if we fail to allocate memory for the inet device, we can fail the netdev registration. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
063ed369 |
|
15-Jul-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[IPV6]: Call inet6addr_chain notifiers on link down Currently if the link is brought down via ip link or ifconfig down, the inet6addr_chain notifiers are not called even though all the addresses are removed from the interface. This caused SCTP to add duplicate addresses to it's list. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56b3d975 |
|
11-Jul-2007 |
Philippe De Muyter <phdm@macqel.be> |
[NET]: Make all initialized struct seq_operations const. Make all initialized struct seq_operations in net/ const Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dffe4f04 |
|
11-Jul-2007 |
Micah Gruber <micah.gruber@gmail.com> |
[IPV6]: Remove unneeded pointer idev from addrconf_cleanup(). This trivial patch removes the unneeded pointer idev returned from __in6_dev_get(), which is never used. The check for NULL can be simply done by if (__in6_dev_get(dev) == NULL). Signed-off-by: Micah Gruber <micah.gruber@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59fbb3a6 |
|
27-Jun-2007 |
Masahide NAKAMURA <nakam@linux-ipv6.org> |
[IPV6] MIP6: Loadable module support for MIPv6. This patch makes MIPv6 loadable module named "mip6". Here is a modprobe.conf(5) example to load it automatically when user application uses XFRM state for MIPv6: alias xfrm-type-10-43 mip6 alias xfrm-type-10-60 mip6 Some MIPv6 feature is not included by this modular, however, it should not be affected to other features like either IPsec or IPv6 with and without the patch. We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt separately for future work. Loadable features: * MH receiving check (to send ICMP error back) * RO header parsing and building (i.e. RH2 and HAO in DSTOPTS) * XFRM policy/state database handling for RO These are NOT covered as loadable: * Home Address flags and its rule on source address selection * XFRM sub policy (depends on its own kernel option) * XFRM functions to receive RO as IPv6 extension header * MH sending/receiving through raw socket if user application opens it (since raw socket allows to do so) * RH2 sending as ancillary data * RH2 operation with setsockopt(2) Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2edacf8 |
|
09-Jul-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding / ipv6: no addrconf for slaves separately from master At present, when a device is enslaved to bonding, if ipv6 is active then addrconf will be initated on the slave (because it is closed then opened during the enslavement processing). This causes DAD and RS packets to be sent from the slave. These packets in turn can confuse switches that perform ipv6 snooping, causing them to incorrectly update their forwarding tables (if, e.g., the slave being added is an inactve backup that won't be used right away) and direct traffic away from the active slave to a backup slave (where the incoming packets will be dropped). This patch alters the behavior so that addrconf will only run on the master device itself. I believe this is logically correct, as it prevents slaves from having an IPv6 identity independent from the master. This is consistent with the IPv4 behavior for bonding. This is accomplished by (a) having bonding set IFF_SLAVE sooner in the enslavement processing than currently occurs (before open, not after), and (b) having ipv6 addrconf ignore UP and CHANGE events on slave devices. The eql driver also uses the IFF_SLAVE flag. I inspected eql, and I believe this change is reasonable for its usage of IFF_SLAVE, but I did not test it. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
74235a25 |
|
14-Jun-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6] addrconf: Fix IPv6 on tuntap tunnels The recent patch that added ipv6_hwtype is broken on tuntap tunnels. Indeed, it's broken on any device that does not pass the ipv6_hwtype test. The reason is that the original test only applies to autoconfiguration, not IPv6 support. IPv6 support is allowed on any device. In fact, even with the ipv6_hwtype patch applied you can still add IPv6 addresses to any interface that doesn't pass thw ipv6_hwtype test provided that they have a sufficiently large MTU. This is a serious problem because come deregistration time these devices won't be cleaned up properly. I've gone back and looked at the rationale for the patch. It appears that the real problem is that we were creating IPv6 devices even if the MTU was too small. So here's a patch which fixes that and reverts the ipv6_hwtype stuff. Thanks to Kanru Chen for reporting this issue. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ef7c79ed |
|
05-Jun-2007 |
Patrick McHardy <kaber@trash.net> |
[NETLINK]: Mark netlink policies const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bbb711e6 |
|
23-May-2007 |
Oliver Hartkopp <socketcan@hartkopp.net> |
[IPV6]: Ignore ipv6 events on non-IPV6 capable devices. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Urs Thuermann <urs@isnogud.escape.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a6bf6fe |
|
09-May-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ROUTE: Assign rt6i_idev for ip6_{prohibit,blk_hole}_entry. I think this is less critical, but is also suitable for -stable release. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7562f876 |
|
03-May-2007 |
Pavel Emelianov <xemul@openvz.org> |
[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5632c515 |
|
28-Apr-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[IPV6]: Track device renames in snmp6. When network device's are renamed, the IPV6 snmp6 code gets confused. It doesn't track name changes so it will OOPS when network device's are removed. The fix is trivial, just unregister/re-register in notify handler. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df8981dc |
|
24-Apr-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Export in6addr_any for future use. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
7f7d9a6b |
|
24-Apr-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Consolidate common SNMP code This patch moves the non-proc SNMP code into addrconf.c and reuses IPv4 SNMP code where applicable. As a result we can skip proc.o if /proc is disabled. Note that I've made a number of functions static since they're only used by addrconf.c for now. If they ever get used elsewhere we can always remove the static. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ff50b79 |
|
20-Apr-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bf99f1bd |
|
20-Apr-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] SNMP: Netlink interface. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6313c1e0 |
|
16-Apr-2007 |
Patrick McHardy <kaber@trash.net> |
[RTNETLINK]: Remove unnecessary locking in dump callbacks Since we're now holding the rtnl during the entire dump operation, we can remove additional locking for rtnl protected data. This patch does that for all simple cases (dev_base_lock for dev_base walking, RCU protection for FIB rule dumping). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c127ea2c |
|
22-Mar-2007 |
Thomas Graf <tgraf@suug.ch> |
[IPv6]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
95c385b4 |
|
25-Apr-2007 |
Neil Horman <nhorman@tuxdriver.com> |
[IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support. Nominally an autoconfigured IPv6 address is added to an interface in the Tentative state (as per RFC 2462). Addresses in this state remain in this state while the Duplicate Address Detection process operates on them to determine their uniqueness on the network. During this period, these tentative addresses may not be used for communication, increasing the time before a node may be able to communicate on a network. Using Optimistic Duplicate Address Detection, autoconfigured addresses may be used immediately for communication on the network, as long as certain rules are followed to avoid conflicts with other nodes during the Duplicate Address Detection process. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7159039a |
|
22-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Decentralize EXPORT_SYMBOLs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
0bcbc926 |
|
24-Apr-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Disallow RH0 by default. A security issue is emerging. Disallow Routing Header Type 0 by default as we have been doing for IPv4. Note: We allow RH2 by default because it is harmless. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
53aadcc9 |
|
27-Mar-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Set IF_READY if the device is up and has carrier We still need to set the IF_READY flag in ipv6_add_dev for the case where all addresses (including the link-local) are deleted and then recreated. In that case the IPv6 device too will be destroyed and then recreated. In order to prevent the original problem, we simply ensure that the device is up before setting IF_READY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6f99a21 |
|
22-Mar-2007 |
Dave Jones <davej@redhat.com> |
[NET]: fix up misplaced inlines. Turning up the warnings on gcc makes it emit warnings about the placement of 'inline' in function declarations. Here's everything that was under net/ Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7ababbd |
|
07-Mar-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Do not set IF_READY if device is down Now that we add the IPv6 device at registration time we don't need to set IF_READY in ipv6_add_dev anymore because we will always get a NETDEV_UP event later on should the device ever become ready. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c12a74c |
|
26-Feb-2007 |
Michal Wrobel <xmxwx@asn.pl> |
[IPV6]: anycast refcnt fix This patch fixes a bug in Linux IPv6 stack which caused anycast address to be added to a device prior DAD has been completed. This led to incorrect reference count which resulted in infinite wait for unregister_netdevice completion on interface removal. Signed-off-by: Michal Wrobel <xmxwx@asn.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45ba9dd2 |
|
14-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Register inet6_dev earlier. Allocate inet6_dev earlier to allow users to set up per-interface variables. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
46d48046 |
|
07-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Manage prefix route corresponding to address manually added. It is more natural to manage prefix routes corresponding to address which is being added manually. With help from Masafumi Aramoto <aramoto@linux-ipv6.org>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
8c14b7ce |
|
21-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Statically link __ipv6_addr_type() for sunrpc subsystem. Link __ipv6_addr_type() statically for sunrpc code even if IPv6 is built as module. Signed-off-by: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org>
|
#
3fbfa981 |
|
14-Feb-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables It isn't needed anymore, all of the users are gone, and all of the ctl_table initializers have been converted to use explicit names of the fields they are initializing. [akpm@osdl.org: NTFS fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0b4d4147 |
|
14-Feb-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cd354f1a |
|
14-Feb-2007 |
Tim Schmielau <tim@physik3.uni-rostock.de> |
[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9a32144e |
|
12-Feb-2007 |
Arjan van de Ven <arjan@linux.intel.com> |
[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1ab1457c |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] IPV6: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26932566 |
|
01-Feb-2007 |
Patrick McHardy <kaber@trash.net> |
[NETLINK]: Don't BUG on undersized allocations Currently netlink users BUG when the allocated skb for an event notification is undersized. While this is certainly a kernel bug, its not critical and crashing the kernel is too drastic, especially when considering that these errors have appeared multiple times in the past and it BUGs even if no listeners are present. This patch replaces BUG by WARN_ON and changes the notification functions to inform potential listeners of undersized allocations using a unique error code (EMSGSIZE). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fa03ef38 |
|
30-Jan-2007 |
Neil Horman <nhorman@tuxdriver.com> |
[IPV6]: Fix up some CONFIG typos Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d88ae4cc |
|
14-Jan-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] MCAST: Fix joining all-node multicast group on device initialization. Join all-node multicast group after assignment of dev->ip6_ptr because it must be assigned when ipv6_dev_mc_inc() is called. This fixes Bug#7817, reported by <gernoth@informatik.uni-erlangen.de>. Closes: 7817 Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30c4cf57 |
|
04-Jan-2007 |
David L Stevens <dlstevens@us.ibm.com> |
[IPV4/IPV6]: Fix inet{,6} device initialization order. It is important that we only assign dev->ip{,6}_ptr only after all portions of the inet{,6} are setup. Otherwise we can receive packets before the multicast spinlocks et al. are initialized. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f29bcd7 |
|
10-Dec-2006 |
Alexey Dobriyan <adobriyan@gmail.com> |
[PATCH] sysctl: remove unused "context" param Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
af879cc7 |
|
16-Nov-2006 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[IPV6]: Use kmemdup Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/ipv6.ko.before /tmp/ipv6.ko.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/ip6_output.c: ip6_output | -52 ip6_append_data | +2 2 functions changed, 2 bytes added, 52 bytes removed /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/addrconf.c: addrconf_sysctl_register | -27 1 function changed, 27 bytes removed /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/tcp_ipv6.c: tcp_v6_syn_recv_sock | -32 tcp_v6_parse_md5_keys | -24 2 functions changed, 56 bytes removed /tmp/ipv6.ko.after: 5 functions changed, 2 bytes added, 135 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
|
#
e69a4adc |
|
14-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV6]: Misc endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6051e2f4 |
|
14-Nov-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] prefix: Convert RTM_NEWPREFIX notifications to use the new netlink api RTM_GETPREFIX is completely unused and is thus removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04561c1f |
|
14-Nov-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] iflink: Convert IPv6's RTM_GETLINK to use the new netlink api By replacing the current method of exporting the device configuration which included allocating a temporary buffer, copying ipv6_devconf into it and copying that buffer into the message with a method that uses nla_reserve() allowing to copy the device configuration directly into the skb data buffer, a GFP_ATOMIC allocation could be removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
339bf98f |
|
10-Nov-2006 |
Thomas Graf <tgraf@suug.ch> |
[NETLINK]: Do precise netlink message allocations where possible Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a3025b1 |
|
13-Oct-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Introduce ip6_dst_idev() to get inet6_dev{} stored in dst_entry{}. Otherwise, we will see a lot of casts... Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
0be669bb |
|
10-Oct-2006 |
Joerg Roedel <joro-lkml@zlug.org> |
[IPV6]: Seperate sit driver to extra module (addrconf.c changes) This patch contains the changes to net/ipv6/addrconf.c to remove sit specific code if the sit driver is not selected. Signed-off-by: Joerg Roedel <joro-lkml@zlug.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82103232 |
|
27-Sep-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV4]: inet_rcv_saddr() annotations inet_rcv_saddr() returns net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b9f9a1c |
|
22-Sep-2006 |
Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> |
[IPV6] ADDRCONF: Mobile IPv6 Home Address support. IFA_F_HOMEADDRESS is introduced for Mobile IPv6 Home Addresses on Mobile Node. The IFA_F_HOMEADDRESS flag should be set for Mobile IPv6 Home Addresses for 2 purposes. 1) We need to check this on receipt of Type 2 Routing Header (RFC3775 Secion 6.4), 2) We prefer Home Address(es) in source address selection (RFC3484 Section 5 Rule 4). Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55ebaef1 |
|
22-Sep-2006 |
Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> |
[IPV6] ADDRCONF: Allow non-DAD'able addresses. IFA_F_NODAD flag, similar to IN6_IFF_NODAD in BSDs, is introduced to skip DAD. This flag should be set to Mobile IPv6 Home Address(es) on Mobile Node because DAD would fail if we should perform DAD; our Home Agent protects our Home Address(es). Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8814c4b5 |
|
22-Sep-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Convert addrconf_lock to RCU. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbea49e1 |
|
22-Sep-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] NDISC: Add proxy_ndp sysctl. We do not always need proxy NDP functionality even we enable forwarding. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7198f8ce |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Support NLM_F_EXCL when adding addresses iproute2 doesn't provide the NLM_F_CREATE flag when adding addresses, it is assumed to be implied. The existing code issues a check on said flag when the modify operation fails (likely due to ENOENT) before continueing to create it, this leads to a hard to predict result, therefore the NLM_F_CREATE check is removed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
680a27a2 |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Allow address changes while device is administrative down Same behaviour as IPv4, using IFF_UP is a no-no anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0ab6803b |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Convert address dumping to new netlink api Replaces INET6_IFADDR_RTA_SPACE with a new function calculating the total required message size for all address messages. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
101bb229 |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Add put_ifaddrmsg() and rt_scope() Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85486af0 |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Add put_cacheinfo() to dump struct cacheinfo Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b29fc2c |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Convert address lookup to new netlink api Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b933f716 |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Convert address deletion to new netlink api Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
461d8837 |
|
18-Sep-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6] address: Convert address addition to new netlink api Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86872cb5 |
|
22-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] route: FIB6 configuration using struct fib6_config Replaces the struct in6_rtmsg based interface orignating from the ioctl interface with a struct fib6_config based on. Allows changing the interface without breaking the ioctl interface and avoids passing on tons of parameters. The recently introduced struct nl_info is used to pass on netlink authorship information for notifications. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40e22e8f |
|
22-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] route: Simplify ip6_ins_rt() Provide a simple ip6_ins_rt() for the majority of users and an alternative for the exception via netlink. Avoids code obfuscation. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0a1ad73 |
|
22-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] route: Simplify ip6_del_rt() Provide a simple ip6_del_rt() for the majority of users and an alternative for the exception via netlink. Avoids code obfuscation. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab32ea5d |
|
22-Sep-2006 |
Brian Haley <brian.haley@hp.com> |
[NET/IPV4/IPV6]: Change some sysctl variables to __read_mostly Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly. Couldn't actually measure any performance increase while testing (.3% I consider noise), but seems like the right thing to do. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c384bfa |
|
15-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] prefix: Convert prefix notifications to use rtnl_notify() Fixes a wrong use of current->pid as netlink pid. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d7a76c9 |
|
15-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] link: Convert link notifications to use rtnl_notify() Fixes a wrong use of current->pid as netlink pid. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d620266 |
|
15-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv6] address: Convert address notification to use rtnl_notify() Fixes a wrong use of current->pid as netlink pid. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2942e900 |
|
15-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8423a9aa |
|
07-Aug-2006 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV6]: Protect RTM_GETRULE table entry with IPV6_MULTIPLE_TABLES ifdef This is how IPv4 handles this case too. Based upon a patch from Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1823730f |
|
05-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPv4]: Move interface address bits to linux/if_addr.h Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
101367c2 |
|
04-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6]: Policy Routing Rules Adds support for policy routing rules including a new local table for routes with a local destination. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c71099ac |
|
05-Aug-2006 |
Thomas Graf <tgraf@suug.ch> |
[IPV6]: Multiple Routing Tables Adds the framework to support multiple IPv6 routing tables. Currently all automatically generated routes are put into the same table. This could be changed at a later point after considering the produced locking overhead. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
57f5f544 |
|
29-Aug-2006 |
Keir Fraser <keir.fraser@cl.cam.ac.uk> |
[IPV6]: ipv6_add_addr should install dstentry earlier ipv6_add_addr allocates a struct inet6_ifaddr and a dstentry, but it doesn't install the dstentry in ifa->rt until after it releases the addrconf_hash_lock. This means other CPUs will be able to see the new address while it hasn't been initialized completely yet. One possible fix would be to grab the ifp->lock spinlock when creating the address struct; a simpler fix is to just move the assignment. Acked-by: jbeulich@novell.com Acked-by: okir@suse.de Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06aebfb7 |
|
09-Aug-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: The ifa lock is a BH lock The ifa lock is expected to be taken in BH context (by addrconf timers) so we must disable BH when accessing it from user context. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
081bba5b |
|
28-Jul-2006 |
Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> |
[IPV6] ADDRCONF: NLM_F_REPLACE support for RTM_NEWADDR Based on MIPL2 kernel patch. Signed-off-by: Noriaki YAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
6c223828 |
|
28-Jul-2006 |
Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> |
[IPV6] ADDRCONF: Support get operation of single address Based on MIPL2 kernel patch. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
8f27ebb9 |
|
28-Jul-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Do not verify an address with infinity lifetime We also do not try regenarating new temporary address corresponding to an address with infinite preferred lifetime. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
0778769d |
|
28-Jul-2006 |
Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> |
[IPV6] ADDRCONF: Allow user-space to specify address lifetime Based on MIPL2 kernel patch. Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
64316225 |
|
28-Jul-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Check payload length for IFA_LOCAL attribute in RTM_{ADD,DEL}MSG message Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
8a6ce0c0 |
|
11-Jul-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Use ipv6_addr_src_scope for link address sorting. In the source address selection, the address must be sorted from global to node-local. But, ifp->scope is different from the scope for source address selection. 2001::1 fe80::1 ::1 ifp->scope 0 0x02 0x01 ipv6_addr_src_scope(&ifp->addr) 0x0e 0x02 0x01 So, we need to use ipv6_addr_src_scope(&ifp->addr) for sorting. And, for backward compatibility, addresses should be sorted from new one to old one. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e55ffac6 |
|
10-Jul-2006 |
Brian Haley <brian.haley@hp.com> |
[IPV6]: order addresses by scope If IPv6 addresses are ordered by scope, then ipv6_dev_get_saddr() can break-out of the device addr_list for() loop when the candidate source address scope is less than the destination address scope. Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
5e2707fa |
|
22-Jun-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY We need to update hiscore.rule even if we don't enable CONFIG_IPV6_PRIVACY, because we have more less significant rule; longest match. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
102128e3 |
|
22-Jun-2006 |
Łukasz Stelmach <stlman@poczta.fm> |
[IPV6]: Fix source address selection. Two additional labels (RFC 3484, sec. 10.3) for IPv6 addreses are defined to make a distinction between global unicast addresses and Unique Local Addresses (fc00::/7, RFC 4193) and Teredo (2001::/32, RFC 4380). It is necessary to avoid attempts of connection that would either fail (eg. fec0:: to 2001:feed::) or be sub-optimal (2001:0:: to 2001:feed::). Signed-off-by: Łukasz Stelmach <stlman@poczta.fm> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c5396a31 |
|
17-Jun-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Sum real space for RTAs. This patch fixes RTNLGRP_IPV6_IFINFO netlink notifications. Issue pointed out by Patrick McHardy <kaber@trash.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e041c683 |
|
27-Mar-2006 |
Alan Stern <stern@rowland.harvard.edu> |
[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
322f74a4 |
|
21-Mar-2006 |
Ingo Oeser <ioe-lkml@rameria.de> |
[IPV6]: Cleanups for net/ipv6/addrconf.c (kzalloc, early exit) v2 Here are some possible (and trivial) cleanups. - use kzalloc() where possible - invert allocation failure test like if (object) { /* Rest of function here */ } to if (object == NULL) return NULL; /* Rest of function here */ Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09c884d4 |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: ROUTE: Add accept_ra_rt_info_max_plen sysctl. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52e16356 |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: ROUTE: Add router_probe_interval sysctl. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
930d6ff2 |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: ROUTE: Add accept_ra_rtr_pref sysctl. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4fd30eb |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: ADDRCONF: Add accept_ra_pinfo sysctl. This controls whether we accept Prefix Information in RAs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65f5c7c1 |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: ROUTE: Add accept_ra_defrtr sysctl. This controls whether we accept default router information in RAs. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
073a8e0e |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: ADDRCONF: Split up ipv6_generate_eui64() by device type. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
955189ef |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: ADDRCONF: Use our standard algorithm for randomized ifid. RFC 3041 describes an algorithm to generate random interface identifier. In RFC 3041bis, it is allowed to use different algorithm than one described in RFC 3041. So, let's use our standard pseudo random algorithm to simplify our implementation. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
74a3a0ed |
|
20-Mar-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: TUNNEL6: Don't try to add multicast route twice. Since addrconf_add_dev() has already called addrconf_add_mroute() to added route for multicast prefix, there's no point to call it again in addrconf_ip6_tnl_config(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0d27b427 |
|
11-Mar-2006 |
Brian Haley <brian.haley@hp.com> |
[IPV6]: fix ipv6_saddr_score struct element The scope element in the ipv6_saddr_score struct used in ipv6_dev_get_saddr() is an unsigned integer, but __ipv6_addr_src_scope() returns a signed integer (and can return -1). Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99081049 |
|
08-Feb-2006 |
Kristian Slavov <kristian.slavov@nomadiclab.com> |
[IPV6]: Address autoconfiguration does not work after device down/up cycle If you set network interface down and up again, the IPv6 address autoconfiguration does not work. 'ip addr' shows that the link-local address is in tentative state. We don't even react to periodical router advertisements. During NETDEV_DOWN we clear IF_READY, and we don't set it back in NETDEV_UP. While starting to perform DAD on the link-local address, we notice that the device is not in IF_READY, and we abort autoconfiguration process (which would eventually send router solicitations). Acked-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4641e7a3 |
|
02-Feb-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Don't hold extra ref count in ipv6_ifa_notify Currently the logic in ipv6_ifa_notify is to hold an extra reference count for addrconf dst's that get added to the routing table. Thus, when addrconf dst entries are taken out of the routing table, we need to drop that dst. However, addrconf dst entries may be removed from the routing table by means other than __ipv6_ifa_notify. So we're faced with the choice of either fixing up all places where addrconf dst entries are removed, or dropping the extra reference count altogether. I chose the latter because the ifp itself always holds a dst reference count of 1 while it's alive. This is dropped just before we kfree the ifp object. Therefore we know that in __ipv6_ifa_notify we will always hold that count. This bug was found by Eric W. Biederman. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9343e79a |
|
17-Jan-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Preserve procfs IPV6 address output format Procfs always output IPV6 addresses without the colon characters, and we cannot change that. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46b86a2d |
|
13-Jan-2006 |
Joe Perches <joe@perches.com> |
[NET]: Use NIP6_FMT in kernel.h There are errors and inconsistency in the display of NIP6 strings. ie: net/ipv6/ip6_flowlabel.c There are errors and inconsistency in the display of NIPQUAD strings too. ie: net/netfilter/nf_conntrack_ftp.c This patch: adds NIP6_FMT to kernel.h changes all code to use NIP6_FMT fixes net/ipv6/ip6_flowlabel.c adds NIPQUAD_FMT to kernel.h fixes net/netfilter/nf_conntrack_ftp.c changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4fc268d2 |
|
11-Jan-2006 |
Randy Dunlap <rdunlap@infradead.org> |
[PATCH] capable/capability.h (net/) net: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
9f5336e2 |
|
07-Jan-2006 |
Adrian Bunk <bunk@stusta.de> |
[IPV6]: small cleanups This patch contains the following cleanups: - addrconf.c: make addrconf_dad_stop() static - inet6_connection_sock.c should #include <net/inet6_connection_sock.h> for getting the prototypes of it's global functions Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0fa1a53e |
|
14-Dec-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[IPV6]: Introduce inet6_timewait_sock Out of tcp6_timewait_sock, that now is just an aggregation of inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct inet_timewait_sock, that is common to the IPv6 transport protocols that use timewait sockets, like DCCP and TCP. tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic code to find the IPv6 area in a timewait sock. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6732bade |
|
27-Dec-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix addrconf dead lock. We need to release idev->lcok before we call addrconf_dad_stop(). It calls ipv6_addr_del(), which will hold idev->lock. Bug spotted by Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
291d809b |
|
23-Dec-2005 |
Hiroyuki YAMAMORI <h-yamamo@db3.so-net.ne.jp> |
[IPV6]: Fix Temporary Address Generation From: Hiroyuki YAMAMORI <h-yamamo@db3.so-net.ne.jp> Since regen_count is stored in the public address, we need to reset it when we start renewing temporary address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3dd3bf83 |
|
23-Dec-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix dead lock. We need to relesae ifp->lock before we call addrconf_dad_stop(), which will hold ifp->lock. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d142804 |
|
21-Dec-2005 |
Kristian Slavov <kristian.slavov@nomadiclab.com> |
[IPV6]: Fix address deletion If you add more than one IPv6 address belonging to the same prefix and delete the address that was last added, routing table entry for that prefix is also deleted. Tested on 2.6.14.4 To reproduce: ip addr add 3ffe::1/64 dev eth0 ip addr add 3ffe::2/64 dev eth0 /* wait DAD */ sleep 1 ip addr del 3ffe::2/64 dev eth0 ip -6 route (route to 3ffe::/64 should be gone) In ipv6_del_addr(), if ifa == ifp, we set ifa->if_next to NULL, and later assign ifap = &ifa->if_next, effectively terminating the for-loop. This prevents us from checking if there are other addresses using the same prefix that are valid, and thus resulting in deletion of the prefix. This applies only if the first entry in idev->addr_list is the address to be deleted. Signed-off-by: Kristian Slavov <kristian.slavov@nomadiclab.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b3ae80a |
|
21-Dec-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Don't select a tentative address as a source address. A tentative address is not considered "assigned to an interface" in the traditional sense (RFC2462 Section 4). Don't try to select such an address for the source address. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
c5e33bdd |
|
21-Dec-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Run DAD when the link becomes ready. If the link was not available when the interface was created, run DAD for pending tentative addresses when the link becomes ready. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
3c21edbd |
|
21-Dec-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Defer IPv6 device initialization until the link becomes ready. NETDEV_UP might be sent even if the link attached to the interface was not ready. DAD does not make sense in such case, so we won't do so. After interface Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
3dd4bc68 |
|
19-Dec-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix route lifetime. The route expiration time is stored in rt6i_expires in jiffies. The argument of rt6_route_add() for adding a route is not the expiration time in jiffies nor in clock_t, but the lifetime (or time left before expiration) in clock_t. Because of the confusion, we sometimes saw several strange errors (FAILs) in TAHI IPv6 Ready Logo Phase-2 Self Test. The symptoms were analyzed by Mitsuru Chinen <CHINEN@jp.ibm.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a1493d9c |
|
13-Dec-2005 |
David S. Miller <davem@davemloft.net> |
[IPV6] addrconf: Do not print device pointer in privacy log message. Noticed by Andi Kleen, it is pointless to emit the device structure pointer in the kernel logs like this. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
220bbd74 |
|
28-Nov-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Implement appropriate dummy rule 4 in ipv6_dev_get_saddr(). Ensure to update hiscore.rule in dummy rule 4 in ipv6_dev_get_saddr(). Pointed out by Yan Zheng <yanzheng@21cn.com>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d5780df |
|
20-Nov-2005 |
Yan Zheng <yanzheng@21cn.com> |
[IPV6]: Acquire addrconf_hash_lock for read in addrconf_verify(...) addrconf_verify(...) only traverse address hash table when addrconf_hash_lock is held for writing, and it may hold addrconf_hash_lock for a long time. So I think it's better to acquire addrconf_hash_lock for reading instead of writing Signed-off-by: Yan Zheng <yanzheng@21cn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
12da2a43 |
|
14-Nov-2005 |
Yan Zheng <yanzheng@21cn.com> |
[IPV6]: small fix for ipv6_dev_get_saddr(...) The "score.rule++" doesn't make any sense for me. According to codes above, I think it should be "hiscore.rule++;" . Signed-off-by: Yan Zheng<yanzheng@21cn.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44fd0261 |
|
09-Nov-2005 |
Peter Chubb <peterc@gelato.unsw.edu.au> |
[IPV6]: Fix fallout from CONFIG_IPV6_PRIVACY Trying to build today's 2.6.14+git snapshot gives undefined references to use_tempaddr Looks like an ifdef got left out. Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a51482bd |
|
08-Nov-2005 |
Jesper Juhl <jesper.juhl@gmail.com> |
[NET]: kfree cleanup From: Jesper Juhl <jesper.juhl@gmail.com> This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
|
#
072047e4 |
|
08-Nov-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: RFC3484 compliant source address selection Choose more appropriate source address; e.g. - outgoing interface - non-deprecated - scope - matching label Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1cacb68 |
|
08-Nov-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Make ipv6_addr_type() more generic so that we can use it for source address selection. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
979ad663 |
|
14-Oct-2005 |
Yan Zheng <yanzheng@21cn.com> |
[IPV6]: inet6_ifinfo_notify should use RTM_DELLINK in addrconf_ifdown Signed-off-by: Yan Zheng <yanzheng@21cn.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
|
#
378f058c |
|
17-Sep-2005 |
David Hardeman <david@2gen.com> |
[PATCH] Use sg_set_buf/sg_init_one where applicable This patch uses sg_set_buf/sg_init_one in some places where it was duplicated. Signed-off-by: David Hardeman <david@2gen.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
#
e5ed6399 |
|
03-Oct-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c62dba90 |
|
26-Sep-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Fix [Bug 5306] Oops on IPv6 route lookup > Steps to reproduce: > 1. Boot Linux, do NOT setup any IPv6 routes > 2. ip route get 2001::1 (or any unroutable address) Well caught. We never set rt6i_idev on ip6_null_entry. This patch should make the problem go away. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d06afab |
|
09-Sep-2005 |
Ingo Molnar <mingo@elte.hu> |
[PATCH] timer initialization cleanup: DEFINE_TIMER Clean up timer initialization by introducing DEFINE_TIMER a'la DEFINE_SPINLOCK. Build and boot-tested on x86. A similar patch has been been in the -RT tree for some time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
573dbd95 |
|
01-Sep-2005 |
Jesper Juhl <jesper.juhl@gmail.com> |
[CRYPTO]: crypto_free_tfm() callers no longer need to check for NULL Since the patch to add a NULL short-circuit to crypto_free_tfm() went in, there's no longer any need for callers of that function to check for NULL. This patch removes the redundant NULL checks and also a few similar checks for NULL before calls to kfree() that I ran into while doing the crypto_free_tfm bits. I've succesfuly compile tested this patch, and a kernel with the patch applied boots and runs just fine. When I posted the patch to LKML (and other lists/people on Cc) it drew the following comments : J. Bruce Fields commented "I've no problem with the auth_gss or nfsv4 bits.--b." Sridhar Samudrala said "sctp change looks fine." Herbert Xu signed off on the patch. So, I guess this is ready to be dropped into -mm and eventually mainline. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
20380731 |
|
15-Aug-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[NET]: Fix sparse warnings Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac6d439d |
|
14-Aug-2005 |
Patrick McHardy <kaber@trash.net> |
[NETLINK]: Convert netlink users to use group numbers instead of bitmasks Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
463c84b9 |
|
09-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[NET]: Introduce inet_connection_sock This creates struct inet_connection_sock, moving members out of struct tcp_sock that are shareable with other INET connection oriented protocols, such as DCCP, that in my private tree already uses most of these members. The functions that operate on these members were renamed, using a inet_csk_ prefix while not being moved yet to a new file, so as to ease the review of these changes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8feaf0c0 |
|
09-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[INET]: Generalise tcp_tw_bucket, aka TIME_WAIT sockets This paves the way to generalise the rest of the sock ID lookup routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro kernels (where IPv6 is always built as a module): [root@qemu ~]# grep tw_sock /proc/slabinfo tw_sock_TCPv6 0 0 128 31 1 tw_sock_TCP 0 0 96 41 1 [root@qemu ~]# Now if a protocol wants to use the TIME_WAIT generic infrastructure it only has to set the sk_prot->twsk_obj_size field with the size of its inet_timewait_sock derived sock and proto_register will create sk_prot->twsk_slab, for now its only for INET sockets, but we can introduce timewait_sock later if some non INET transport protocolo wants to use this stuff. Next changesets will take advantage of this new infrastructure to generalise even more TCP code. [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o /tmp/after.size: 188144 11764 5068 204976 320b0 net/ipv4/built-in.o [acme@toy net-2.6.14]$ Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1 (qemu host)). Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae9cda5d |
|
28-Jun-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Don't dump temporary addresses twice Each IPv6 Temporary Address (w/ CONFIG_IPV6_PRIVACY) is dumped twice to netlink. Because temporary addresses are listed in idev->addr_list, there's no need to dump idev->tempaddr separately. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a47077a |
|
28-Jun-2005 |
Patrick McHardy <kaber@trash.net> |
[NETLINK]: Missing padding fields in dumped structures Plug holes with padding fields and initialized them to zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ef1d4c7 |
|
28-Jun-2005 |
Patrick McHardy <kaber@trash.net> |
[NETLINK]: Missing initializations in dumped data Mostly missing initialization of padding fields of 1 or 2 bytes length, two instances of uninitialized nlmsgerr->msg of 16 bytes length. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
543537bd |
|
23-Jun-2005 |
Paulo Marques <pmarques@grupopie.com> |
[PATCH] create a kstrdup library function This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
0d51aa80 |
|
21-Jun-2005 |
Jamal Hadi Salim <hadi@cyberus.ca> |
[IPV6]: V6 route events reported with wrong netlink PID and seq number Essentially netlink at the moment always reports a pid and sequence of 0 always for v6 route activities. To understand the repurcassions of this look at: http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html While fixing this, i took the liberty to resolve the outstanding issue of IPV6 routes inserted via ioctls to have the correct pids as well. This patch tries to behave as close as possible to the v4 routes i.e maintains whatever PID the socket issuing the command owns as opposed to the process. That made the patch a little bulky. I have tested against both netlink derived utility to add/del routes as well as ioctl derived one. The Quagga folks have tested against quagga. This fixes the problem and so far hasnt been detected to introduce any new issues. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ed19f33 |
|
18-Jun-2005 |
Jamal Hadi Salim <hadi@cyberus.ca> |
[NETLINK]: Set correct pid for ioctl originating netlink events This patch ensures that netlink events created as a result of programns using ioctls (such as ifconfig, route etc) contains the correct PID of those events. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e431b8c0 |
|
18-Jun-2005 |
Jamal Hadi Salim <hadi@cyberus.ca> |
[NETLINK]: Explicit typing This patch converts "unsigned flags" to use more explict types like u16 instead and incrementally introduces NLMSG_NEW(). Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6544c0b |
|
18-Jun-2005 |
Jamal Hadi Salim <hadi@cyberus.ca> |
[NETLINK]: Correctly set NLM_F_MULTI without checking the pid This patch rectifies some rtnetlink message builders that derive the flags from the pid. It is now explicit like the other cases which get it right. Also fixes half a dozen dumpers which did not set NLM_F_MULTI at all. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77bd9196 |
|
13-Jun-2005 |
Rémi Denis-Courmont <rdenis@simphalempin.com> |
[IPv6] Don't generate temporary for TUN devices Userland layer-2 tunneling devices allocated through the TUNTAP driver (drivers/net/tun.c) have a type of ARPHRD_NONE, and have no link-layer address. The kernel complains at regular interval when IPv6 Privacy extension are enabled because it can't find an hardware address : Dec 29 11:02:04 auguste kernel: __ipv6_regen_rndid(idev=cb3e0c00): cannot get EUI64 identifier; use random bytes. IPv6 Privacy extensions should probably be disabled on that sort of device. They won't work anyway. If userland wants a more usual Ethernet-ish interface with usual IPv6 autoconfiguration, it will use a TAP device with an emulated link-layer and a random hardware address rather than a TUN device. As far as I could fine, TUN virtual device from TUNTAP is the very only sort of device using ARPHRD_NONE as kernel device type. Signed-off-by: R�mi Denis-Courmont <rdenis@simphalempin.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db46edc6 |
|
03-May-2005 |
Thomas Graf <tgraf@suug.ch> |
[RTNETLINK] Cleanup rtnetlink_link tables Converts remaining rtnetlink_link tables to use c99 designated initializers to make greping a little bit easier. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd92833a |
|
19-Apr-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix a branch prediction From: Tushar Gohad <tgohad@mvista.com> Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|