#
a4634aa7 |
|
21-Feb-2024 |
Praveen Kumar Kannoju <praveen.kannoju@oracle.com> |
bonding: rate-limit bonding driver inspect messages Through the routine bond_mii_monitor(), bonding driver inspects and commits the slave state changes. During the times when slave state change and failure in aqcuiring rtnl lock happen at the same time, the routine bond_mii_monitor() reschedules itself to come around after 1 msec to commit the new state. During this, it executes the routine bond_miimon_inspect() to re-inspect the state chane and prints the corresponding slave state on to the console. Hence we do see a message at every 1 msec till the rtnl lock is acquired and state chage is committed. This patch doesn't change how bond functions. It only simply limits this kind of log flood. Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240221082752.4660-1-praveen.kannoju@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
669966bc |
|
06-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
bonding: use exit_batch_rtnl() method exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list. This saves one rtnl_lock()/rtnl_unlock() pair, and one unregister_netdevice_many() call. v2: Added bond_net_pre_exit() method to make sure bond_destroy_sysfs() is called before we unregister the devices in bond_net_exit_batch_rtnl (Antoine Tenart : https://lore.kernel.org/netdev/170688415193.5216.10499830272732622816@kwain/) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20240206144313.2050392-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
240fd405 |
|
02-Feb-2024 |
Aahil Awatramani <aahila@google.com> |
bonding: Add independent control state machine Add support for the independent control state machine per IEEE 802.1AX-2008 5.4.15 in addition to the existing implementation of the coupled control state machine. Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in the LACP MUX state machine for separated handling of an initial Collecting state before the Collecting and Distributing state. This enables a port to be in a state where it can receive incoming packets while not still distributing. This is useful for reducing packet loss when a port begins distributing before its partner is able to collect. Added new functions such as bond_set_slave_tx_disabled_flags and bond_set_slave_rx_enabled_flags to precisely manage the port's collecting and distributing states. Previously, there was no dedicated method to disable TX while keeping RX enabled, which this patch addresses. Note that the regular flow process in the kernel's bonding driver remains unaffected by this patch. The extension requires explicit opt-in by the user (in order to ensure no disruptions for existing setups) via netlink support using the new bonding parameter coupled_control. The default value for coupled_control is set to 1 so as to preserve existing behaviour. Signed-off-by: Aahil Awatramani <aahila@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
f267f262 |
|
05-Mar-2024 |
Daniel Borkmann <daniel@iogearbox.net> |
xdp, bonding: Fix feature flags when there are no slave devs anymore Commit 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") changed the driver from reporting everything as supported before a device was bonded into having the driver report that no XDP feature is supported until a real device is bonded as it seems to be more truthful given eventually real underlying devices decide what XDP features are supported. The change however did not take into account when all slave devices get removed from the bond device. In this case after 9b0ed890ac2a, the driver keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK & ~NETDEV_XDP_ACT_XSK_ZEROCOPY whereas it should have reported a feature mask of 0. Fix it by resetting XDP feature flags in the same way as if no XDP program is attached to the bond device. This was uncovered by the XDP bond selftest which let BPF CI fail. After adjusting the starting masks on the latter to 0 instead of NETDEV_XDP_ACT_MASK the test passes again together with this fix. Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Prashant Batra <prbatra.mail@gmail.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Message-ID: <20240305090829.17131-1-daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
#
9b0ed890 |
|
07-Feb-2024 |
Magnus Karlsson <magnus.karlsson@intel.com> |
bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY Do not report the XDP capability NETDEV_XDP_ACT_XSK_ZEROCOPY as the bonding driver does not support XDP and AF_XDP in zero-copy mode even if the real NIC drivers do. Note that the driver used to report everything as supported before a device was bonded. Instead of just masking out the zero-copy support from this, have the driver report that no XDP feature is supported until a real device is bonded. This seems to be more truthful as it is the real drivers that decide what XDP features are supported. Fixes: cb9e6e584d58 ("bonding: add xdp_features support") Reported-by: Prashant Batra <prbatra.mail@gmail.com> Link: https://lore.kernel.org/all/CAJ8uoz2ieZCopgqTvQ9ZY6xQgTbujmC6XkMTamhp68O-h_-rLg@mail.gmail.com/T/ Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20240207084737.20890-1-magnus.karlsson@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
486058f4 |
|
22-Nov-2023 |
Zhengchao Shao <shaozhengchao@huawei.com> |
bonding: remove print in bond_verify_device_path As suggested by Paolo in link[1], if the memory allocation fails, the mm layer will emit a lot warning comprising the backtrace, so remove the print. [1] https://lore.kernel.org/all/20231118081653.1481260-1-shaozhengchao@huawei.com/ Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d6b83f1e |
|
18-Nov-2023 |
Zhengchao Shao <shaozhengchao@huawei.com> |
bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk If failed to allocate "tags" or could not find the final upper device from start_dev's upper list in bond_verify_device_path(), only the loopback detection of the current upper device should be affected, and the system is no need to be panic. So return -ENOMEM in alb_upper_dev_walk to stop walking, print some warn information when failed to allocate memory for vlan tags in bond_verify_device_path. I also think that the following function calls netdev_walk_all_upper_dev_rcu ---->>>alb_upper_dev_walk ---------->>>bond_verify_device_path From this way, "end device" can eventually be obtained from "start device" in bond_verify_device_path, IS_ERR(tags) could be instead of IS_ERR_OR_NULL(tags) in alb_upper_dev_walk. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20231118081653.1481260-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
b8768dc4 |
|
13-Nov-2023 |
Richard Cochran <richardcochran@gmail.com> |
net: ethtool: Refactor identical get_ts_info implementations. The vlan, macvlan and the bonding drivers call their "real" device driver in order to report the time stamping capabilities. Provide a core ethtool helper function to avoid copy/paste in the stack. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3cffa2dd |
|
09-Nov-2023 |
Eric Dumazet <edumazet@google.com> |
bonding: stop the device in bond_setup_by_slave() Commit 9eed321cde22 ("net: lapbether: only support ethernet devices") has been able to keep syzbot away from net/lapb, until today. In the following splat [1], the issue is that a lapbether device has been created on a bonding device without members. Then adding a non ARPHRD_ETHER member forced the bonding master to change its type. The fix is to make sure we call dev_close() in bond_setup_by_slave() so that the potential linked lapbether devices (or any other devices having assumptions on the physical device) are removed. A similar bug has been addressed in commit 40baec225765 ("bonding: fix panic on non-ARPHRD_ETHER enslave failure") [1] skbuff: skb_under_panic: text:ffff800089508810 len:44 put:40 head:ffff0000c78e7c00 data:ffff0000c78e7bea tail:0x16 end:0x140 dev:bond0 kernel BUG at net/core/skbuff.c:192 ! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6007 Comm: syz-executor383 Not tainted 6.6.0-rc3-syzkaller-gbf6547d8715b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_panic net/core/skbuff.c:188 [inline] pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:202 lr : skb_panic net/core/skbuff.c:188 [inline] lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:202 sp : ffff800096a06aa0 x29: ffff800096a06ab0 x28: ffff800096a06ba0 x27: dfff800000000000 x26: ffff0000ce9b9b50 x25: 0000000000000016 x24: ffff0000c78e7bea x23: ffff0000c78e7c00 x22: 000000000000002c x21: 0000000000000140 x20: 0000000000000028 x19: ffff800089508810 x18: ffff800096a06100 x17: 0000000000000000 x16: ffff80008a629a3c x15: 0000000000000001 x14: 1fffe00036837a32 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000201 x10: 0000000000000000 x9 : cb50b496c519aa00 x8 : cb50b496c519aa00 x7 : 0000000000000001 x6 : 0000000000000001 x5 : ffff800096a063b8 x4 : ffff80008e280f80 x3 : ffff8000805ad11c x2 : 0000000000000001 x1 : 0000000100000201 x0 : 0000000000000086 Call trace: skb_panic net/core/skbuff.c:188 [inline] skb_under_panic+0x13c/0x140 net/core/skbuff.c:202 skb_push+0xf0/0x108 net/core/skbuff.c:2446 ip6gre_header+0xbc/0x738 net/ipv6/ip6_gre.c:1384 dev_hard_header include/linux/netdevice.h:3136 [inline] lapbeth_data_transmit+0x1c4/0x298 drivers/net/wan/lapbether.c:257 lapb_data_transmit+0x8c/0xb0 net/lapb/lapb_iface.c:447 lapb_transmit_buffer+0x178/0x204 net/lapb/lapb_out.c:149 lapb_send_control+0x220/0x320 net/lapb/lapb_subr.c:251 __lapb_disconnect_request+0x9c/0x17c net/lapb/lapb_iface.c:326 lapb_device_event+0x288/0x4e0 net/lapb/lapb_iface.c:492 notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93 raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461 call_netdevice_notifiers_info net/core/dev.c:1970 [inline] call_netdevice_notifiers_extack net/core/dev.c:2008 [inline] call_netdevice_notifiers net/core/dev.c:2022 [inline] __dev_close_many+0x1b8/0x3c4 net/core/dev.c:1508 dev_close_many+0x1e0/0x470 net/core/dev.c:1559 dev_close+0x174/0x250 net/core/dev.c:1585 lapbeth_device_event+0x2e4/0x958 drivers/net/wan/lapbether.c:466 notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93 raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461 call_netdevice_notifiers_info net/core/dev.c:1970 [inline] call_netdevice_notifiers_extack net/core/dev.c:2008 [inline] call_netdevice_notifiers net/core/dev.c:2022 [inline] __dev_close_many+0x1b8/0x3c4 net/core/dev.c:1508 dev_close_many+0x1e0/0x470 net/core/dev.c:1559 dev_close+0x174/0x250 net/core/dev.c:1585 bond_enslave+0x2298/0x30cc drivers/net/bonding/bond_main.c:2332 bond_do_ioctl+0x268/0xc64 drivers/net/bonding/bond_main.c:4539 dev_ifsioc+0x754/0x9ac dev_ioctl+0x4d8/0xd34 net/core/dev_ioctl.c:786 sock_do_ioctl+0x1d4/0x2d0 net/socket.c:1217 sock_ioctl+0x4e8/0x834 net/socket.c:1322 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __arm64_sys_ioctl+0x14c/0x1c8 fs/ioctl.c:857 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Code: aa1803e6 aa1903e7 a90023f5 94785b8b (d4210000) Fixes: 872254dd6b1f ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20231109180102.4085183-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d93f3f99 |
|
10-Oct-2023 |
Jiri Wiesner <jwiesner@suse.de> |
bonding: Return pointer to data after pull on skb Since 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers"), header offsets used to compute a hash in bond_xmit_hash() are relative to skb->data and not skb->head. If the tail of the header buffer of an skb really needs to be advanced and the operation is successful, the pointer to the data must be returned (and not a pointer to the head of the buffer). Fixes: 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers") Signed-off-by: Jiri Wiesner <jwiesner@suse.de> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
691b2bf1 |
|
21-Aug-2023 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: update port speed when getting bond speed Andrew reported a bonding issue that if we put an active-back bond on top of a 802.3ad bond interface. When the 802.3ad bond's speed/duplex changed dynamically. The upper bonding interface's speed/duplex can't be changed at the same time, which will show incorrect speed. Fix it by updating the port speed when calling ethtool. Reported-by: Andrew Schorr <ajschorr@alumni.princeton.edu> Closes: https://lore.kernel.org/netdev/ZEt3hvyREPVdbesO@Laptop-X1/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20230821101008.797482-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
f5370ba3 |
|
10-Aug-2023 |
Zhengchao Shao <shaozhengchao@huawei.com> |
bonding: remove unnecessary NULL check in bond_destructor The free_percpu function also could check whether "rr_tx_counter" parameter is NULL. Therefore, remove NULL check in bond_destructor. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8f3f4b4 |
|
10-Aug-2023 |
Zhengchao Shao <shaozhengchao@huawei.com> |
bonding: use bond_set_slave_arr to simplify code In bond_reset_slave_arr(), values are assigned and memory is released only when the variables "usable" and "all" are not NULL. But even if the "usable" and "all" variables are NULL, they can still work, because value will be checked in kfree_rcu. Therefore, use bond_set_slave_arr() and set the input parameters "usable_slaves" and "all_slaves" to NULL to simplify the code in bond_reset_slave_arr(). And the same to bond_uninit(). Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e08190ef |
|
10-Aug-2023 |
Zhengchao Shao <shaozhengchao@huawei.com> |
bonding: add modifier to initialization function and exit function Some functions are only used in initialization and exit functions, so add the __init/__net_init and __net_exit modifiers to these functions. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92272ec4 |
|
02-Aug-2023 |
Jakub Kicinski <kuba@kernel.org> |
eth: add missing xdp.h includes in drivers Handful of drivers currently expect to get xdp.h by virtue of including netdevice.h. This will soon no longer be the case so add explicit includes. Reviewed-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
c0dabeb4 |
|
01-Aug-2023 |
Maxim Georgiev <glipus@gmail.com> |
net: bonding: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set() bonding is one of the stackable net devices which pass the hardware timestamping ops to the real device through ndo_eth_ioctl(). This prevents converting any device driver to the new hwtimestamping API without regressions. Remove that limitation in bonding by using the newly introduced helpers for timestamping through lower devices, that handle both the new and the old driver API. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
01f4fd27 |
|
02-Aug-2023 |
Ziyang Xuan <william.xuanziyang@huawei.com> |
bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with following testcase: # ip netns add ns1 # ip netns exec ns1 ip link add bond0 type bond mode 0 # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 # ip netns exec ns1 ip link set bond_slave_1 master bond0 # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link set bond_slave_1 nomaster # ip netns del ns1 The logical analysis of the problem is as follows: 1. create ETH_P_8021AD protocol vlan10 for bond_slave_1: register_vlan_dev() vlan_vid_add() vlan_info_alloc() __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1 2. create ETH_P_8021AD protocol bond0_vlan10 for bond0: register_vlan_dev() vlan_vid_add() __vlan_vid_add() vlan_add_rx_filter_info() if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER return 0; if (netif_device_present(dev)) return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called // The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid. 3. detach bond_slave_1 from bond0: __bond_release_one() vlan_vids_del_by_dev() list_for_each_entry(vid_info, &vlan_info->vid_list, list) vlan_vid_del(dev, vid_info->proto, vid_info->vid); // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted. // bond_slave_1->vlan_info will be assigned NULL. 4. delete vlan10 during delete ns1: default_device_exit_batch() dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10 vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1 BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!! Add S-VLAN tag related features support to bond driver. So the bond driver will always propagate the VLAN info to its slaves. Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support") Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
da19a2b9 |
|
20-Jul-2023 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: reset bond's flags when down link is P2P device When adding a point to point downlink to the bond, we neglected to reset the bond's flags, which were still using flags like BROADCAST and MULTICAST. Consequently, this would initiate ARP/DAD for P2P downlink interfaces, such as when adding a GRE device to the bonding. To address this issue, let's reset the bond's flags for P2P interfaces. Before fix: 7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UNKNOWN group default qlen 1000 link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr 167f:18:f188:: 8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/gre6 2006:70:10::1 brd 2006:70:10::2 inet6 fe80::200:ff:fe00:0/64 scope link valid_lft forever preferred_lft forever After fix: 7: gre0@NONE: <POINTOPOINT,NOARP,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond2 state UNKNOWN group default qlen 1000 link/gre6 2006:70:10::1 peer 2006:70:10::2 permaddr c29e:557a:e9d9:: 8: bond0: <POINTOPOINT,NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/gre6 2006:70:10::1 peer 2006:70:10::2 inet6 fe80::1/64 scope link valid_lft forever preferred_lft forever Reported-by: Liang Li <liali@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2221438 Fixes: 872254dd6b1f ("net/bonding: Enable bonding to enslave non ARPHRD_ETHER") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
6a940abd |
|
22-Jun-2023 |
Eric Dumazet <edumazet@google.com> |
bonding: do not assume skb mac_header is set Drivers must not assume in their ndo_start_xmit() that skbs have their mac_header set. skb->data is all what is needed. bonding seems to be one of the last offender as caught by syzbot: WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 skb_mac_offset include/linux/skbuff.h:2913 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_xmit_hash drivers/net/bonding/bond_main.c:4170 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5149 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_3ad_xor_xmit drivers/net/bonding/bond_main.c:5186 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 __bond_start_xmit drivers/net/bonding/bond_main.c:5442 [inline] WARNING: CPU: 1 PID: 12155 at include/linux/skbuff.h:2907 bond_start_xmit+0x14ab/0x19d0 drivers/net/bonding/bond_main.c:5470 Modules linked in: CPU: 1 PID: 12155 Comm: syz-executor.3 Not tainted 6.1.30-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 RIP: 0010:skb_mac_header include/linux/skbuff.h:2907 [inline] RIP: 0010:skb_mac_offset include/linux/skbuff.h:2913 [inline] RIP: 0010:bond_xmit_hash drivers/net/bonding/bond_main.c:4170 [inline] RIP: 0010:bond_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5149 [inline] RIP: 0010:bond_3ad_xor_xmit drivers/net/bonding/bond_main.c:5186 [inline] RIP: 0010:__bond_start_xmit drivers/net/bonding/bond_main.c:5442 [inline] RIP: 0010:bond_start_xmit+0x14ab/0x19d0 drivers/net/bonding/bond_main.c:5470 Code: 8b 7c 24 30 e8 76 dd 1a 01 48 85 c0 74 0d 48 89 c3 e8 29 67 2e fe e9 15 ef ff ff e8 1f 67 2e fe e9 10 ef ff ff e8 15 67 2e fe <0f> 0b e9 45 f8 ff ff e8 09 67 2e fe e9 dc fa ff ff e8 ff 66 2e fe RSP: 0018:ffffc90002fff6e0 EFLAGS: 00010283 RAX: ffffffff835874db RBX: 000000000000ffff RCX: 0000000000040000 RDX: ffffc90004dcf000 RSI: 00000000000000b5 RDI: 00000000000000b6 RBP: ffffc90002fff8b8 R08: ffffffff83586d16 R09: ffffffff83586584 R10: 0000000000000007 R11: ffff8881599fc780 R12: ffff88811b6a7b7e R13: 1ffff110236d4f6f R14: ffff88811b6a7ac0 R15: 1ffff110236d4f76 FS: 00007f2e9eb47700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2e421000 CR3: 000000010e6d4000 CR4: 00000000003526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> [<ffffffff8471a49f>] netdev_start_xmit include/linux/netdevice.h:4925 [inline] [<ffffffff8471a49f>] __dev_direct_xmit+0x4ef/0x850 net/core/dev.c:4380 [<ffffffff851d845b>] dev_direct_xmit include/linux/netdevice.h:3043 [inline] [<ffffffff851d845b>] packet_direct_xmit+0x18b/0x300 net/packet/af_packet.c:284 [<ffffffff851c7472>] packet_snd net/packet/af_packet.c:3112 [inline] [<ffffffff851c7472>] packet_sendmsg+0x4a22/0x64d0 net/packet/af_packet.c:3143 [<ffffffff8467a4b2>] sock_sendmsg_nosec net/socket.c:716 [inline] [<ffffffff8467a4b2>] sock_sendmsg net/socket.c:736 [inline] [<ffffffff8467a4b2>] __sys_sendto+0x472/0x5f0 net/socket.c:2139 [<ffffffff8467a715>] __do_sys_sendto net/socket.c:2151 [inline] [<ffffffff8467a715>] __se_sys_sendto net/socket.c:2147 [inline] [<ffffffff8467a715>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147 [<ffffffff8553071f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff8553071f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80 [<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 7b8fc0103bb5 ("bonding: add a vlan+srcmac tx hashing option") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jarod Wilson <jarod@redhat.com> Cc: Moshe Tal <moshet@nvidia.com> Cc: Jussi Maki <joamaki@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230622152304.2137482-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ed3c9a2f |
|
13-Jun-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: tls: make the offload check helper take skb not socket All callers of tls_is_sk_tx_device_offloaded() currently do an equivalent of: if (skb->sk && tls_is_skb_tx_device_offloaded(skb->sk)) Have the helper accept skb and do the skb->sk check locally. Two drivers have local static inlines with similar wrappers already. While at it change the ifdef condition to TLS_DEVICE. Only TLS_DEVICE selects SOCK_VALIDATE_XMIT, so the two are equivalent. This makes removing the duplicated IS_ENABLED() check in funeth more obviously correct. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Dimitris Michailidis <dmichail@fungible.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae9b15fb |
|
17-May-2023 |
Taehee Yoo <ap420073@gmail.com> |
net: fix stack overflow when LRO is disabled for virtual interfaces When the virtual interface's feature is updated, it synchronizes the updated feature for its own lower interface. This propagation logic should be worked as the iteration, not recursively. But it works recursively due to the netdev notification unexpectedly. This problem occurs when it disables LRO only for the team and bonding interface type. team0 | +------+------+-----+-----+ | | | | | team1 team2 team3 ... team200 If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE event to its own lower interfaces(team1 ~ team200). It is worked by netdev_sync_lower_features(). So, the NETDEV_FEAT_CHANGE notification logic of each lower interface work iteratively. But generated NETDEV_FEAT_CHANGE event is also sent to the upper interface too. upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own lower interfaces again. lower and upper interfaces receive this event and generate this event again and again. So, the stack overflow occurs. But it is not the infinite loop issue. Because the netdev_sync_lower_features() updates features before generating the NETDEV_FEAT_CHANGE event. Already synchronized lower interfaces skip notification logic. So, it is just the problem that iteration logic is changed to the recursive unexpectedly due to the notification mechanism. Reproducer: ip link add team0 type team ethtool -K team0 lro on for i in {1..200} do ip link add team$i master team0 type team ethtool -K team$i lro on done ethtool -K team0 lro off In order to fix it, the notifier_ctx member of bonding/team is introduced. Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
613a0141 |
|
15-May-2023 |
Bagas Sanjaya <bagasdotme@gmail.com> |
net: bonding: Add SPDX identifier to remaining files Previous batches of SPDX conversion missed bond_main.c and bonding_priv.h because these files doesn't mention intended GPL version. Add SPDX identifier to these files, assuming GPL 1.0+. Cc: Thomas Davis <tadavis@lbl.gov> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
c1bc7d73 |
|
11-May-2023 |
Simon Horman <horms@kernel.org> |
bonding: Always assign be16 value to vlan_proto The type of the vlan_proto field is __be16. And most users of the field use it as such. In the case of setting or testing the field for the special VLAN_N_VID value, host byte order is used. Which seems incorrect. It also seems somewhat odd to store a VLAN ID value in a field that is otherwise used to store Ether types. Address this issue by defining BOND_VLAN_PROTO_NONE, a big endian value. 0xffff was chosen somewhat arbitrarily. What is important is that it doesn't overlap with any valid VLAN Ether types. I don't believe the problems described above are a bug because VLAN_N_VID in both little-endian and big-endian byte order does not conflict with any supported VLAN Ether types in big-endian byte order. Reported by sparse as: .../bond_main.c:2857:26: warning: restricted __be16 degrades to integer .../bond_main.c:2863:20: warning: restricted __be16 degrades to integer .../bond_main.c:2939:40: warning: incorrect type in assignment (different base types) .../bond_main.c:2939:40: expected restricted __be16 [usertype] vlan_proto .../bond_main.c:2939:40: got int No functional changes intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb9e6e58 |
|
04-May-2023 |
Lorenzo Bianconi <lorenzo@kernel.org> |
bonding: add xdp_features support Introduce xdp_features support for bonding driver according to the slave devices attached to the master one. xdp_features is required whenever we want to xdp_redirect traffic into a bond device and then into selected slaves attached to it. Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jussi Maki <joamaki@gmail.com> Tested-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
980f0799 |
|
17-Apr-2023 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: add software tx timestamping support Currently, bonding only obtain the timestamp (ts) information of the active slave, which is available only for modes 1, 5, and 6. For other modes, bonding only has software rx timestamping support. However, some users who use modes such as LACP also want tx timestamp support. To address this issue, let's check the ts information of each slave. If all slaves support tx timestamping, we can enable tx timestamping support for the bond. Add a note that the get_ts_info may be called with RCU, or rtnl or reference on the device in ethtool.h> Suggested-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20230418034841.2566262-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c484fcc0 |
|
17-Apr-2023 |
Ido Schimmel <idosch@nvidia.com> |
bonding: Fix memory leak when changing bond type to Ethernet When a net device is put administratively up, its 'IFF_UP' flag is set (if not set already) and a 'NETDEV_UP' notification is emitted, which causes the 8021q driver to add VLAN ID 0 on the device. The reverse happens when a net device is put administratively down. When changing the type of a bond to Ethernet, its 'IFF_UP' flag is incorrectly cleared, resulting in the kernel skipping the above process and VLAN ID 0 being leaked [1]. Fix by restoring the flag when changing the type to Ethernet, in a similar fashion to the restoration of the 'IFF_SLAVE' flag. The issue can be reproduced using the script in [2], with example out before and after the fix in [3]. [1] unreferenced object 0xffff888103479900 (size 256): comm "ip", pid 329, jiffies 4294775225 (age 28.561s) hex dump (first 32 bytes): 00 a0 0c 15 81 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a6051a>] kmalloc_trace+0x2a/0xe0 [<ffffffff8406426c>] vlan_vid_add+0x30c/0x790 [<ffffffff84068e21>] vlan_device_event+0x1491/0x21a0 [<ffffffff81440c8e>] notifier_call_chain+0xbe/0x1f0 [<ffffffff8372383a>] call_netdevice_notifiers_info+0xba/0x150 [<ffffffff837590f2>] __dev_notify_flags+0x132/0x2e0 [<ffffffff8375ad9f>] dev_change_flags+0x11f/0x180 [<ffffffff8379af36>] do_setlink+0xb96/0x4060 [<ffffffff837adf6a>] __rtnl_newlink+0xc0a/0x18a0 [<ffffffff837aec6c>] rtnl_newlink+0x6c/0xa0 [<ffffffff837ac64e>] rtnetlink_rcv_msg+0x43e/0xe00 [<ffffffff839a99e0>] netlink_rcv_skb+0x170/0x440 [<ffffffff839a738f>] netlink_unicast+0x53f/0x810 [<ffffffff839a7fcb>] netlink_sendmsg+0x96b/0xe90 [<ffffffff8369d12f>] ____sys_sendmsg+0x30f/0xa70 [<ffffffff836a6d7a>] ___sys_sendmsg+0x13a/0x1e0 unreferenced object 0xffff88810f6a83e0 (size 32): comm "ip", pid 329, jiffies 4294775225 (age 28.561s) hex dump (first 32 bytes): a0 99 47 03 81 88 ff ff a0 99 47 03 81 88 ff ff ..G.......G..... 81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc ................ backtrace: [<ffffffff81a6051a>] kmalloc_trace+0x2a/0xe0 [<ffffffff84064369>] vlan_vid_add+0x409/0x790 [<ffffffff84068e21>] vlan_device_event+0x1491/0x21a0 [<ffffffff81440c8e>] notifier_call_chain+0xbe/0x1f0 [<ffffffff8372383a>] call_netdevice_notifiers_info+0xba/0x150 [<ffffffff837590f2>] __dev_notify_flags+0x132/0x2e0 [<ffffffff8375ad9f>] dev_change_flags+0x11f/0x180 [<ffffffff8379af36>] do_setlink+0xb96/0x4060 [<ffffffff837adf6a>] __rtnl_newlink+0xc0a/0x18a0 [<ffffffff837aec6c>] rtnl_newlink+0x6c/0xa0 [<ffffffff837ac64e>] rtnetlink_rcv_msg+0x43e/0xe00 [<ffffffff839a99e0>] netlink_rcv_skb+0x170/0x440 [<ffffffff839a738f>] netlink_unicast+0x53f/0x810 [<ffffffff839a7fcb>] netlink_sendmsg+0x96b/0xe90 [<ffffffff8369d12f>] ____sys_sendmsg+0x30f/0xa70 [<ffffffff836a6d7a>] ___sys_sendmsg+0x13a/0x1e0 [2] ip link add name t-nlmon type nlmon ip link add name t-dummy type dummy ip link add name t-bond type bond mode active-backup ip link set dev t-bond up ip link set dev t-nlmon master t-bond ip link set dev t-nlmon nomaster ip link show dev t-bond ip link set dev t-dummy master t-bond ip link show dev t-bond ip link del dev t-bond ip link del dev t-dummy ip link del dev t-nlmon [3] Before: 12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 link/netlink 12: t-bond: <BROADCAST,MULTICAST,MASTER,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 46:57:39:a4:46:a2 brd ff:ff:ff:ff:ff:ff After: 12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 link/netlink 12: t-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 66:48:7b:74:b6:8a brd ff:ff:ff:ff:ff:ff Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Fixes: 75c78500ddad ("bonding: remap muticast addresses without using dev_close() and dev_open()") Fixes: 9ec7eb60dcbc ("bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change") Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Link: https://lore.kernel.org/netdev/78a8a03b-6070-3e6b-5042-f848dab16fb8@alu.unizg.hr/ Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4598380f |
|
06-Apr-2023 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: fix ns validation on backup slaves When arp_validate is set to 2, 3, or 6, validation is performed for backup slaves as well. As stated in the bond documentation, validation involves checking the broadcast ARP request sent out via the active slave. This helps determine which slaves are more likely to function in the event of an active slave failure. However, when the target is an IPv6 address, the NS message sent from the active interface is not checked on backup slaves. Additionally, based on the bond_arp_rcv() rule b, we must reverse the saddr and daddr when checking the NS message. Note that when checking the NS message, the destination address is a multicast address. Therefore, we must convert the target address to solicited multicast in the bond_get_targets_ip6() function. Prior to the fix, the backup slaves had a mii status of "down", but after the fix, all of the slaves' mii status was updated to "UP". Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e667d469 |
|
15-Mar-2023 |
Nikolay Aleksandrov <razor@blackwall.org> |
bonding: restore bond's IFF_SLAVE flag if a non-eth dev enslave fails syzbot reported a warning[1] where the bond device itself is a slave and we try to enslave a non-ethernet device as the first slave which fails but then in the error path when ether_setup() restores the bond device it also clears all flags. In my previous fix[2] I restored the IFF_MASTER flag, but I didn't consider the case that the bond device itself might also be a slave with IFF_SLAVE set, so we need to restore that flag as well. Use the bond_ether_setup helper which does the right thing and restores the bond's flags properly. Steps to reproduce using a nlmon dev: $ ip l add nlmon0 type nlmon $ ip l add bond1 type bond $ ip l add bond2 type bond $ ip l set bond1 master bond2 $ ip l set dev nlmon0 master bond1 $ ip -d l sh dev bond1 22: bond1: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noqueue master bond2 state DOWN mode DEFAULT group default qlen 1000 (now bond1's IFF_SLAVE flag is gone and we'll hit a warning[3] if we try to delete it) [1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef [2] commit 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") [3] example warning: [ 27.008664] bond1: (slave nlmon0): The slave device specified does not support setting the MAC address [ 27.008692] bond1: (slave nlmon0): Error -95 calling set_mac_address [ 32.464639] bond1 (unregistering): Released all slaves [ 32.464685] ------------[ cut here ]------------ [ 32.464686] WARNING: CPU: 1 PID: 2004 at net/core/dev.c:10829 unregister_netdevice_many+0x72a/0x780 [ 32.464694] Modules linked in: br_netfilter bridge bonding virtio_net [ 32.464699] CPU: 1 PID: 2004 Comm: ip Kdump: loaded Not tainted 5.18.0-rc3+ #47 [ 32.464703] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014 [ 32.464704] RIP: 0010:unregister_netdevice_many+0x72a/0x780 [ 32.464707] Code: 99 fd ff ff ba 90 1a 00 00 48 c7 c6 f4 02 66 96 48 c7 c7 20 4d 35 96 c6 05 fa c7 2b 02 01 e8 be 6f 4a 00 0f 0b e9 73 fd ff ff <0f> 0b e9 5f fd ff ff 80 3d e3 c7 2b 02 00 0f 85 3b fd ff ff ba 59 [ 32.464710] RSP: 0018:ffffa006422d7820 EFLAGS: 00010206 [ 32.464712] RAX: ffff8f6e077140a0 RBX: ffffa006422d7888 RCX: 0000000000000000 [ 32.464714] RDX: ffff8f6e12edbe58 RSI: 0000000000000296 RDI: ffffffff96d4a520 [ 32.464716] RBP: ffff8f6e07714000 R08: ffffffff96d63600 R09: ffffa006422d7728 [ 32.464717] R10: 0000000000000ec0 R11: ffffffff9698c988 R12: ffff8f6e12edb140 [ 32.464719] R13: dead000000000122 R14: dead000000000100 R15: ffff8f6e12edb140 [ 32.464723] FS: 00007f297c2f1740(0000) GS:ffff8f6e5d900000(0000) knlGS:0000000000000000 [ 32.464725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.464726] CR2: 00007f297bf1c800 CR3: 00000000115e8000 CR4: 0000000000350ee0 [ 32.464730] Call Trace: [ 32.464763] <TASK> [ 32.464767] rtnl_dellink+0x13e/0x380 [ 32.464776] ? cred_has_capability.isra.0+0x68/0x100 [ 32.464780] ? __rtnl_unlock+0x33/0x60 [ 32.464783] ? bpf_lsm_capset+0x10/0x10 [ 32.464786] ? security_capable+0x36/0x50 [ 32.464790] rtnetlink_rcv_msg+0x14e/0x3b0 [ 32.464792] ? _copy_to_iter+0xb1/0x790 [ 32.464796] ? post_alloc_hook+0xa0/0x160 [ 32.464799] ? rtnl_calcit.isra.0+0x110/0x110 [ 32.464802] netlink_rcv_skb+0x50/0xf0 [ 32.464806] netlink_unicast+0x216/0x340 [ 32.464809] netlink_sendmsg+0x23f/0x480 [ 32.464812] sock_sendmsg+0x5e/0x60 [ 32.464815] ____sys_sendmsg+0x22c/0x270 [ 32.464818] ? import_iovec+0x17/0x20 [ 32.464821] ? sendmsg_copy_msghdr+0x59/0x90 [ 32.464823] ? do_set_pte+0xa0/0xe0 [ 32.464828] ___sys_sendmsg+0x81/0xc0 [ 32.464832] ? mod_objcg_state+0xc6/0x300 [ 32.464835] ? refill_obj_stock+0xa9/0x160 [ 32.464838] ? memcg_slab_free_hook+0x1a5/0x1f0 [ 32.464842] __sys_sendmsg+0x49/0x80 [ 32.464847] do_syscall_64+0x3b/0x90 [ 32.464851] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 32.464865] RIP: 0033:0x7f297bf2e5e7 [ 32.464868] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 32.464869] RSP: 002b:00007ffd96c824c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 32.464872] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f297bf2e5e7 [ 32.464874] RDX: 0000000000000000 RSI: 00007ffd96c82540 RDI: 0000000000000003 [ 32.464875] RBP: 00000000640f19de R08: 0000000000000001 R09: 000000000000007c [ 32.464876] R10: 00007f297bffabe0 R11: 0000000000000246 R12: 0000000000000001 [ 32.464877] R13: 00007ffd96c82d20 R14: 00007ffd96c82610 R15: 000055bfe38a7020 [ 32.464881] </TASK> [ 32.464882] ---[ end trace 0000000000000000 ]--- Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3ef Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ec7eb60 |
|
15-Mar-2023 |
Nikolay Aleksandrov <razor@blackwall.org> |
bonding: restore IFF_MASTER/SLAVE flags on bond enslave ether type change Add bond_ether_setup helper which is used to fix ether_setup() calls in the bonding driver. It takes care of both IFF_MASTER and IFF_SLAVE flags, the former is always restored and the latter only if it was set. If the bond enslaves non-ARPHRD_ETHER device (changes its type), then releases it and enslaves ARPHRD_ETHER device (changes back) then we use ether_setup() to restore the bond device type but it also resets its flags and removes IFF_MASTER and IFF_SLAVE[1]. Use the bond_ether_setup helper to restore both after such transition. [1] reproduce (nlmon is non-ARPHRD_ETHER): $ ip l add nlmon0 type nlmon $ ip l add bond2 type bond mode active-backup $ ip l set nlmon0 master bond2 $ ip l set nlmon0 nomaster $ ip l add bond1 type bond (we use bond1 as ARPHRD_ETHER device to restore bond2's mode) $ ip l set bond1 master bond2 $ ip l sh dev bond2 37: bond2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether be:d7:c5:40:5b:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 (notice bond2's IFF_MASTER is missing) Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3fe57986 |
|
24-Jan-2023 |
Leon Romanovsky <leon@kernel.org> |
bonding: fill IPsec state validation failure reason Rely on extack to return failure reason. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7681a4f5 |
|
24-Jan-2023 |
Leon Romanovsky <leon@kernel.org> |
xfrm: extend add state callback to set failure reason Almost all validation logic is in the drivers, but they are missing reliable way to convey failure reason to userspace applications. Let's use extack to return this information to users. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
42c7ded0 |
|
20-Dec-2022 |
Eric Dumazet <edumazet@google.com> |
bonding: fix lockdep splat in bond_miimon_commit() bond_miimon_commit() is run while RTNL is held, not RCU. WARNING: suspicious RCU usage 6.1.0-syzkaller-09671-g89529367293c #0 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:2704 suspicious rcu_dereference_check() usage! Fixes: e95cc44763a4 ("bonding: do failover when high prio link up") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Link: https://lore.kernel.org/r/20221220130831.1480888-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
e95cc447 |
|
11-Dec-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: do failover when high prio link up Currently, when a high prio link enslaved, or when current link down, the high prio port could be selected. But when high prio link up, the new active slave reselection is not triggered. Fix it by checking link's prio when getting up. Making the do_failover after looping all slaves as there may be multi high prio slaves up. Reported-by: Liang Li <liali@redhat.com> Fixes: 0a2ff7cc8ad4 ("Bonding: add per-port priority for failover re-selection") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3d0b738f |
|
11-Dec-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: add missed __rcu annotation for curr_active_slave There is one direct accesses to bond->curr_active_slave in bond_miimon_commit(). Protected it by rcu_access_pointer() since the later of this function also use this one. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8a321cf7 |
|
09-Dec-2022 |
Xin Long <lucien.xin@gmail.com> |
net: add IFF_NO_ADDRCONF and use it in bonding to prevent ipv6 addrconf Currently, in bonding it reused the IFF_SLAVE flag and checked it in ipv6 addrconf to prevent ipv6 addrconf. However, it is not a proper flag to use for no ipv6 addrconf, for bonding it has to move IFF_SLAVE flag setting ahead of dev_open() in bond_enslave(). Also, IFF_MASTER/SLAVE are historical flags used in bonding and eql, as Jiri mentioned, the new devices like Team, Failover do not use this flag. So as Jiri suggested, this patch adds IFF_NO_ADDRCONF in priv_flags of the device to indicate no ipv6 addconf, and uses it in bonding and moves IFF_SLAVE flag setting back to its original place. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e5214f36 |
|
28-Nov-2022 |
Dan Carpenter <error27@gmail.com> |
bonding: uninitialized variable in bond_miimon_inspect() The "ignore_updelay" variable needs to be initialized to false. Fixes: f8a65ab2f3ff ("bonding: fix link recovery in mode 2 when updelay is nonzero") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/Y4SWJlh3ohJ6EPTL@kili Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
f8a65ab2 |
|
22-Nov-2022 |
Jonathan Toppins <jtoppins@redhat.com> |
bonding: fix link recovery in mode 2 when updelay is nonzero Before this change when a bond in mode 2 lost link, all of its slaves lost link, the bonding device would never recover even after the expiration of updelay. This change removes the updelay when the bond currently has no usable links. Conforming to bonding.txt section 13.1 paragraph 4. Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave") Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
28581b9c |
|
25-Oct-2022 |
Tariq Toukan <tariqt@nvidia.com> |
bond: Disable TLS features indication Bond agnostically interacts with TLS device-offload requests via the .ndo_sk_get_lower_dev operation. Return value is true iff bond guarantees fixed mapping between the TLS connection and a lower netdev. Due to this nature, the bond TLS device offload features are not explicitly controllable in the bond layer. As of today, these are read-only values based on the evaluation of bond_sk_check(). However, this indication might be incorrect and misleading, when the feature bits are "fixed" by some dependency features. For example, NETIF_F_HW_TLS_TX/RX are forcefully cleared in case the corresponding checksum offload is disabled. But in fact the bond ability to still offload TLS connections to the lower device is not hurt. This means that these bits can not be trusted, and hence better become unused. This patch revives some old discussion [1] and proposes a much simpler solution: Clear the bond's TLS features bits. Everyone should stop reading them. [1] https://lore.kernel.org/netdev/20210526095747.22446-1-tariqt@nvidia.com/ Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20221025105300.4718-1-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
1f154f3b |
|
05-Dec-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: get correct NA dest address In commit 4d633d1b468b ("bonding: fix ICMPv6 header handling when receiving IPv6 messages"), there is a copy/paste issue for NA daddr. I found that in my testing and fixed it in my local branch. But I forgot to re-format the patch and sent the wrong mail. Fix it by reading the correct dest address. Fixes: 4d633d1b468b ("bonding: fix ICMPv6 header handling when receiving IPv6 messages") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20221206032055.7517-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4d633d1b |
|
17-Nov-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: fix ICMPv6 header handling when receiving IPv6 messages Currently, we get icmp6hdr via function icmp6_hdr(), which needs the skb transport header to be set first. But there is no rule to ask driver set transport header before netif_receive_skb() and bond_handle_frame(). So we will not able to get correct icmp6hdr on some drivers. Fix this by using skb_header_pointer to get the IPv6 and ICMPV6 headers. Reported-by: Liang Li <liali@redhat.com> Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20221118034353.1736727-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a251c17a |
|
05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32() when possible The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
fb3ceec1 |
|
30-Aug-2022 |
Wolfram Sang <wsa+renesas@sang-engineering.com> |
net: move from strlcpy with unused retval to strscpy Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN Link: https://lore.kernel.org/r/20220830201457.7984-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0e400d60 |
|
20-Sep-2022 |
Jonathan Toppins <jtoppins@redhat.com> |
bonding: fix NULL deref in bond_rr_gen_slave_id Fix a NULL dereference of the struct bonding.rr_tx_counter member because if a bond is initially created with an initial mode != zero (Round Robin) the memory required for the counter is never created and when the mode is changed there is never any attempt to verify the memory is allocated upon switching modes. This causes the following Oops on an aarch64 machine: [ 334.686773] Unable to handle kernel paging request at virtual address ffff2c91ac905000 [ 334.694703] Mem abort info: [ 334.697486] ESR = 0x0000000096000004 [ 334.701234] EC = 0x25: DABT (current EL), IL = 32 bits [ 334.706536] SET = 0, FnV = 0 [ 334.709579] EA = 0, S1PTW = 0 [ 334.712719] FSC = 0x04: level 0 translation fault [ 334.717586] Data abort info: [ 334.720454] ISV = 0, ISS = 0x00000004 [ 334.724288] CM = 0, WnR = 0 [ 334.727244] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000008044d662000 [ 334.733944] [ffff2c91ac905000] pgd=0000000000000000, p4d=0000000000000000 [ 334.740734] Internal error: Oops: 96000004 [#1] SMP [ 334.745602] Modules linked in: bonding tls veth rfkill sunrpc arm_spe_pmu vfat fat acpi_ipmi ipmi_ssif ixgbe igb i40e mdio ipmi_devintf ipmi_msghandler arm_cmn arm_dsu_pmu cppc_cpufreq acpi_tad fuse zram crct10dif_ce ast ghash_ce sbsa_gwdt nvme drm_vram_helper drm_ttm_helper nvme_core ttm xgene_hwmon [ 334.772217] CPU: 7 PID: 2214 Comm: ping Not tainted 6.0.0-rc4-00133-g64ae13ed4784 #4 [ 334.779950] Hardware name: GIGABYTE R272-P31-00/MP32-AR1-00, BIOS F18v (SCP: 1.08.20211002) 12/01/2021 [ 334.789244] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 334.796196] pc : bond_rr_gen_slave_id+0x40/0x124 [bonding] [ 334.801691] lr : bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding] [ 334.807962] sp : ffff8000221733e0 [ 334.811265] x29: ffff8000221733e0 x28: ffffdbac8572d198 x27: ffff80002217357c [ 334.818392] x26: 000000000000002a x25: ffffdbacb33ee000 x24: ffff07ff980fa000 [ 334.825519] x23: ffffdbacb2e398ba x22: ffff07ff98102000 x21: ffff07ff981029c0 [ 334.832646] x20: 0000000000000001 x19: ffff07ff981029c0 x18: 0000000000000014 [ 334.839773] x17: 0000000000000000 x16: ffffdbacb1004364 x15: 0000aaaabe2f5a62 [ 334.846899] x14: ffff07ff8e55d968 x13: ffff07ff8e55db30 x12: 0000000000000000 [ 334.854026] x11: ffffdbacb21532e8 x10: 0000000000000001 x9 : ffffdbac857178ec [ 334.861153] x8 : ffff07ff9f6e5a28 x7 : 0000000000000000 x6 : 000000007c2b3742 [ 334.868279] x5 : ffff2c91ac905000 x4 : ffff2c91ac905000 x3 : ffff07ff9f554400 [ 334.875406] x2 : ffff2c91ac905000 x1 : 0000000000000001 x0 : ffff07ff981029c0 [ 334.882532] Call trace: [ 334.884967] bond_rr_gen_slave_id+0x40/0x124 [bonding] [ 334.890109] bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding] [ 334.896033] __bond_start_xmit+0x128/0x3a0 [bonding] [ 334.901001] bond_start_xmit+0x54/0xb0 [bonding] [ 334.905622] dev_hard_start_xmit+0xb4/0x220 [ 334.909798] __dev_queue_xmit+0x1a0/0x720 [ 334.913799] arp_xmit+0x3c/0xbc [ 334.916932] arp_send_dst+0x98/0xd0 [ 334.920410] arp_solicit+0xe8/0x230 [ 334.923888] neigh_probe+0x60/0xb0 [ 334.927279] __neigh_event_send+0x3b0/0x470 [ 334.931453] neigh_resolve_output+0x70/0x90 [ 334.935626] ip_finish_output2+0x158/0x514 [ 334.939714] __ip_finish_output+0xac/0x1a4 [ 334.943800] ip_finish_output+0x40/0xfc [ 334.947626] ip_output+0xf8/0x1a4 [ 334.950931] ip_send_skb+0x5c/0x100 [ 334.954410] ip_push_pending_frames+0x3c/0x60 [ 334.958758] raw_sendmsg+0x458/0x6d0 [ 334.962325] inet_sendmsg+0x50/0x80 [ 334.965805] sock_sendmsg+0x60/0x6c [ 334.969286] __sys_sendto+0xc8/0x134 [ 334.972853] __arm64_sys_sendto+0x34/0x4c [ 334.976854] invoke_syscall+0x78/0x100 [ 334.980594] el0_svc_common.constprop.0+0x4c/0xf4 [ 334.985287] do_el0_svc+0x38/0x4c [ 334.988591] el0_svc+0x34/0x10c [ 334.991724] el0t_64_sync_handler+0x11c/0x150 [ 334.996072] el0t_64_sync+0x190/0x194 [ 334.999726] Code: b9001062 f9403c02 d53cd044 8b040042 (b8210040) [ 335.005810] ---[ end trace 0000000000000000 ]--- [ 335.010416] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 335.017279] SMP: stopping secondary CPUs [ 335.021374] Kernel Offset: 0x5baca8eb0000 from 0xffff800008000000 [ 335.027456] PHYS_OFFSET: 0x80000000 [ 335.030932] CPU features: 0x0000,0085c029,19805c82 [ 335.035713] Memory Limit: none [ 335.038756] Rebooting in 180 seconds.. The fix is to allocate the memory in bond_open() which is guaranteed to be called before any packets are processed. Fixes: 848ca9182a7d ("net: bonding: Use per-cpu rr_tx_counter") CC: Jussi Maki <joamaki@gmail.com> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
86247aba |
|
07-Sep-2022 |
Benjamin Poirier <bpoirier@nvidia.com> |
net: bonding: Unsync device addresses on ndo_stop Netdev drivers are expected to call dev_{uc,mc}_sync() in their ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method. This is mentioned in the kerneldoc for those dev_* functions. The bonding driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have already been emptied in unregister_netdevice_many() before ndo_uninit is called. This mistake can result in addresses being leftover on former bond slaves after a bond has been deleted; see test_LAG_cleanup() in the last patch in this series. Add unsync calls, via bond_hw_addr_flush(), at their expected location, bond_close(). Add dev_mc_add() call to bond_open() to match the above change. v3: * When adding or deleting a slave, only sync/unsync, add/del addresses if the bond is up. In other cases, it is taken care of at the right time by ndo_open/ndo_set_rx_mode/ndo_stop. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d9a143e |
|
07-Sep-2022 |
Benjamin Poirier <bpoirier@nvidia.com> |
net: bonding: Share lacpdu_mcast_addr definition There are already a few definitions of arrays containing MULTICAST_LACPDU_ADDR and the next patch will add one more use. These all contain the same constant data so define one common instance for all bonding code. Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
592335a4 |
|
30-Aug-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: accept unsolicited NA message The unsolicited NA message with all-nodes multicast dest address should be valid, as this also means the link could reach the target. Also rename bond_validate_ns() to bond_validate_na(). Reported-by: LiLiang <liali@redhat.com> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7f14132 |
|
30-Aug-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: use unspecified address if no available link local address When ns_ip6_target was set, the ipv6_dev_get_saddr() will be called to get available source address and send IPv6 neighbor solicit message. If the target is global address, ipv6_dev_get_saddr() will get any available src address. But if the target is link local address, ipv6_dev_get_saddr() will only get available address from our interface, i.e. the corresponding bond interface. But before bond interface up, all the address is tentative, while ipv6_dev_get_saddr() will ignore tentative address. This makes we can't find available link local src address, then bond_ns_send() will not be called and no NS message was sent. Finally bond interface will keep in down state. Fix this by sending NS with unspecified address if there is no available source address. Reported-by: LiLiang <liali@redhat.com> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2e44dff |
|
19-Aug-2022 |
Jonathan Toppins <jtoppins@redhat.com> |
bonding: 3ad: make ad_ticks_per_sec a const The value is only ever set once in bond_3ad_initialize and only ever read otherwise. There seems to be no reason to set the variable via bond_3ad_initialize when setting the global variable will do. Change ad_ticks_per_sec to a const to enforce its read-only usage. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
94ce3b64 |
|
10-Aug-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/tls: Use RCU API to access tls_ctx->netdev Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload") Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
06799a90 |
|
31-Jul-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS The bonding driver piggybacks on time stamps kept by the network stack for the purpose of the netdev TX watchdog, and this is problematic because it does not work with NETIF_F_LLTX devices. It is hard to say why the driver looks at dev_trans_start() of the slave->dev, considering that this is updated even by non-ARP/NS probes sent by us, and even by traffic not sent by us at all (for example PTP on physical slave devices). ARP monitoring in active-backup mode appears to still work even if we track only the last TX time of actual ARP probes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0a2ff7cc |
|
21-Jun-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
Bonding: add per-port priority for failover re-selection Add per port priority support for bonding active slave re-selection during failover. A higher number means higher priority in selection. The primary slave still has the highest priority. This option also follows the primary_reselect rules. This option could only be configured via netlink. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2fa3ee93 |
|
08-Jun-2022 |
Jonathan Toppins <jtoppins@redhat.com> |
bonding: cleanup bond_create Setting RLB_NULL_INDEX is not needed as this is done in bond_alb_initialize which is called by bond_open. Also reduce the number of rtnl_unlock calls by just using the standard goto cleanup path. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7a9214f3 |
|
16-Jun-2022 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
bonding: ARP monitor spams NETDEV_NOTIFY_PEERS notifiers The bonding ARP monitor fails to decrement send_peer_notif, the number of peer notifications (gratuitous ARP or ND) to be sent. This results in a continuous series of notifications. Correct this by decrementing the counter for each notification. Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: b0929915e035 ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor") Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/ Tested-by: Jonathan Toppins <jtoppins@redhat.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/9400.1655407960@famine Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c4caa500 |
|
31-May-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: guard ns_targets by CONFIG_IPV6 Guard ns_targets in struct bond_params by CONFIG_IPV6, which could save 256 bytes if IPv6 not configed. Also add this protection for function bond_is_ip6_target_ok() and bond_get_targets_ip6(). Remove the IS_ENABLED() check for bond_opts[] as this will make BOND_OPT_NS_TARGETS uninitialized if CONFIG_IPV6 not enabled. Add a dummy bond_option_ns_ip6_targets_set() for this situation. Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20220531063727.224043-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
9b80ccda |
|
18-May-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: fix missed rcu protection When removing the rcu_read_lock in bond_ethtool_get_ts_info() as discussed [1], I didn't notice it could be called via setsockopt, which doesn't hold rcu lock, as syzbot pointed: stack backtrace: CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f4685797a5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline] bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595 __ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554 ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568 sock_timestamping_bind_phc net/core/sock.c:869 [inline] sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916 sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221 __sys_setsockopt+0x55e/0x6a0 net/socket.c:2223 __do_sys_setsockopt net/socket.c:2238 [inline] __se_sys_setsockopt net/socket.c:2235 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2235 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f8902c8eb39 Fix it by adding rcu_read_lock and take a ref on the real_dev. Since dev_hold() and dev_put() can take NULL these days, we can skip checking if real_dev exist. [1] https://lore.kernel.org/netdev/27565.1642742439@famine/ Reported-by: syzbot+92beb3d46aab498710fa@syzkaller.appspotmail.com Fixes: aa6034678e87 ("bonding: use rcu_dereference_rtnl when get bonding active slave") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220519020148.1058344-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ee8b7a11 |
|
05-May-2022 |
Jakub Kicinski <kuba@kernel.org> |
net: make drivers set the TSO limit not the GSO limit Drivers should call the TSO setting helper, GSO is controllable by user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81ee0eb6 |
|
19-Apr-2022 |
Kuniyuki Iwashima <kuniyu@amazon.co.jp> |
ipv6: Use ipv6_only_sock() helper in condition. This patch replaces some sk_ipv6only tests with ipv6_only_sock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49aefd13 |
|
16-Apr-2022 |
suresh kumar <suresh2514@gmail.com> |
bonding: do not discard lowest hash bit for non layer3+4 hashing Commit b5f862180d70 was introduced to discard lowest hash bit for layer3+4 hashing but it also removes last bit from non layer3+4 hashing Below script shows layer2+3 hashing will result in same slave to be used with above commit. $ cat hash.py #/usr/bin/python3.6 h_dests=[0xa0, 0xa1] h_source=0xe3 hproto=0x8 saddr=0x1e7aa8c0 daddr=0x17aa8c0 for h_dest in h_dests: hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr) hash ^= hash >> 16 hash ^= hash >> 8 print(hash) print("with last bit removed") for h_dest in h_dests: hash = (h_dest ^ h_source ^ hproto ^ saddr ^ daddr) hash ^= hash >> 16 hash ^= hash >> 8 hash = hash >> 1 print(hash) Output: $ python3.6 hash.py 522133332 522133333 <-------------- will result in both slaves being used with last bit removed 261066666 261066666 <-------------- only single slave used Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
625788b5 |
|
10-Mar-2022 |
Eric Dumazet <edumazet@google.com> |
net: add per-cpu storage and net->core_stats Before adding yet another possibly contended atomic_long_t, it is time to add per-cpu storage for existing ones: dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler Because many devices do not have to increment such counters, allocate the per-cpu storage on demand, so that dev_get_stats() does not have to spend considerable time folding zero counters. Note that some drivers have abused these counters which were supposed to be only used by core networking stack. v4: should use per_cpu_ptr() in dev_get_stats() (Jakub) v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo) v2: add a missing include (reported by kernel test robot <lkp@intel.com>) Change in netdev_core_stats_alloc() (Jakub) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: jeffreyji <jeffreyji@google.com> Reviewed-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4e24be01 |
|
20-Feb-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: add new parameter ns_targets Add a new bonding parameter ns_targets to store IPv6 address. Add required bond_ns_send/rcv functions first before adding IPv6 address option setting. Add two functions bond_send/rcv_validate so we can send/recv ARP and NS at the same time. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1fcd5d44 |
|
20-Feb-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
Bonding: split bond_handle_vlan from bond_arp_send Function bond_handle_vlan() is split from bond_arp_send() for later IPv6 usage. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16a41634 |
|
07-Feb-2022 |
Eric Dumazet <edumazet@google.com> |
bonding: switch bond_net_exit() to batch mode cleanup_net() is competing with other rtnl users. Batching bond_net_exit() factorizes all rtnl acquistions to a single one, giving chance for cleanup_net() to progress much faster, holding rtnl a bit longer. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a6ab75ce |
|
16-Feb-2022 |
Zhang Changzhong <zhangchangzhong@huawei.com> |
bonding: force carrier update when releasing slave In __bond_release_one(), bond_set_carrier() is only called when bond device has no slave. Therefore, if we remove the up slave from a master with two slaves and keep the down slave, the master will remain up. Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond)) statement. Reproducer: $ insmod bonding.ko mode=0 miimon=100 max_bonds=2 $ ifconfig bond0 up $ ifenslave bond0 eth0 eth1 $ ifconfig eth0 down $ ifenslave -d bond0 eth1 $ cat /proc/net/bonding/bond0 Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
aa603467 |
|
21-Jan-2022 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: use rcu_dereference_rtnl when get bonding active slave bond_option_active_slave_get_rcu() should not be used in rtnl_mutex as it use rcu_dereference(). Replace to rcu_dereference_rtnl() so we also can use this function in rtnl protected context. With this update, we can rmeove the rcu_read_lock/unlock in bonding .ndo_eth_ioctl and .get_ts_info. Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Fixes: 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
429e3d12 |
|
16-Jan-2022 |
Moshe Tal <moshet@nvidia.com> |
bonding: Fix extraction of ports from the packet headers Wrong hash sends single stream to multiple output interfaces. The offset calculation was relative to skb->head, fix it to be relative to skb->data. Fixes: a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff") Reviewed-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Moshe Tal <moshet@nvidia.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e5bd03a |
|
12-Jan-2022 |
Jie Wang <wangjie125@huawei.com> |
net: bonding: fix bond_xmit_broadcast return value error bug In Linux bonding scenario, one packet is copied to several copies and sent by all slave device of bond0 in mode 3(broadcast mode). The mode 3 xmit function bond_xmit_broadcast() only ueses the last slave device's tx result as the final result. In this case, if the last slave device is down, then it always return NET_XMIT_DROP, even though the other slave devices xmit success. It may cause the tx statistics error, and cause the application (e.g. scp) consider the network is unreachable. For example, use the following command to configure server A. echo 3 > /sys/class/net/bond0/bonding/mode ifconfig bond0 up ifenslave bond0 eth0 eth1 ifconfig bond0 192.168.1.125 ifconfig eth0 up ifconfig eth1 down The slave device eth0 and eth1 are connected to server B(192.168.1.107). Run the ping 192.168.1.107 -c 3 -i 0.2 command, the following information is displayed. PING 192.168.1.107 (192.168.1.107) 56(84) bytes of data. 64 bytes from 192.168.1.107: icmp_seq=1 ttl=64 time=0.077 ms 64 bytes from 192.168.1.107: icmp_seq=2 ttl=64 time=0.056 ms 64 bytes from 192.168.1.107: icmp_seq=3 ttl=64 time=0.051 ms 192.168.1.107 ping statistics 0 packets transmitted, 3 received Actually, the slave device eth0 of the bond successfully sends three ICMP packets, but the result shows that 0 packets are transmitted. Also if we use scp command to get remote files, the command end with the following printings. ssh_exchange_identification: read: Connection timed out So this patch modifies the bond_xmit_broadcast to return NET_XMIT_SUCCESS if one slave device in the bond sends packets successfully. If all slave devices send packets fail, the discarded packets stats is increased. The skb is released when there is no slave device in the bond or the last slave device is down. Fixes: ae46f184bc1f ("bonding: propagate transmit status") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfe355c5 |
|
29-Dec-2021 |
Hangbin Liu <liuhangbin@gmail.com> |
Bonding: return HWTSTAMP_FLAG_BONDED_PHC_INDEX to notify user space If the userspace program is distributed in binary form (distro package), there is no way to know on which kernel versions it will run. Let's only check if the flag was set when do SIOCSHWTSTAMP. And return hwtstamp_config with flag HWTSTAMP_FLAG_BONDED_PHC_INDEX to notify userspace whether the new feature is supported or not. Suggested-by: Jakub Kicinski <kuba@kernel.org> Fixes: 085d61000845 ("Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b6459415 |
|
28-Dec-2021 |
Jakub Kicinski <kuba@kernel.org> |
net: Don't include filter.h from net/sock.h sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
|
#
085d6100 |
|
10-Dec-2021 |
Hangbin Liu <liuhangbin@gmail.com> |
Bonding: force user to add HWTSTAMP_FLAG_BONDED_PHC_INDEX when get/set HWTSTAMP When there is a failover, the PHC index of bond active interface will be changed. This may break the user space program if the author didn't aware. By setting this flag, the user should aware that the PHC index get/set by syscall is not stable. And the user space is able to deal with it. Without this flag, the kernel will reject the request forwarding to bonding. Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fee32de2 |
|
12-Dec-2021 |
Suresh Kumar <surkumar@redhat.com> |
net: bonding: debug: avoid printing debug logs when bond is not notifying peers Currently "bond_should_notify_peers: slave ..." messages are printed whenever "bond_should_notify_peers" function is called. +++ Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Received LACPDU on port 1 Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): Rx Machine: Port=1, Last State=6, Curr State=6 Dec 12 12:33:26 node1 kernel: bond0: (slave enp0s25): partner sync=1 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:26 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 ... Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Received LACPDU on port 2 Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): Rx Machine: Port=2, Last State=6, Curr State=6 Dec 12 12:33:30 node1 kernel: bond0: (slave enp4s3): partner sync=1 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 Dec 12 12:33:30 node1 kernel: bond0: bond_should_notify_peers: slave enp0s25 +++ This is confusing and can also clutter up debug logs. Print logs only when the peer notification happens. Signed-off-by: Suresh Kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94dd016a |
|
30-Nov-2021 |
Hangbin Liu <liuhangbin@gmail.com> |
bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device We have VLAN PTP support(via get_ts_info) on kernel, and bond support(by getting active interface via netlink message) on userspace tool linuxptp. But there are always some users who want to use PTP with VLAN over bond, which is not able to do with the current implementation. This patch passed get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device with bond mode active-backup/tlb/alb. With this users could get kernel native bond or VLAN over bond PTP support. Test with ptp4l and it works with VLAN over bond after this patch: ]# ptp4l -m -i bond0.23 ptp4l[53377.141]: selected /dev/ptp4 as PTP clock ptp4l[53377.142]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[53377.143]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[53384.127]: port 1: LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES ptp4l[53384.127]: selected local clock e41d2d.fffe.123db0 as best master ptp4l[53384.127]: port 1: assuming the grand master role Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5944b5ab |
|
29-Nov-2021 |
Hangbin Liu <liuhangbin@gmail.com> |
Bonding: add arp_missed_max option Currently, we use hard code number to verify if we are in the arp_interval timeslice. But some user may want to reduce/extend the verify timeslice. With the similar team option 'missed_max' the uers could change that number based on their own environment. Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d872df3 |
|
19-Nov-2021 |
Eric Dumazet <edumazet@google.com> |
net: annotate accesses to dev->gso_max_segs dev->gso_max_segs is written under RTNL protection, or when the device is not yet visible, but is read locklessly. Add netif_set_gso_max_segs() helper. Add the READ_ONCE()/WRITE_ONCE() pairs, and use netif_set_gso_max_segs() where we can to better document what is going on. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f238100 |
|
22-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
net: bonding: constify and use dev_addr_set() Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Make sure local references to netdev->dev_addr are constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea52a0b5 |
|
08-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
net: use dev_addr_set() Use dev_addr_set() instead of writing directly to netdev->dev_addr in various misc and old drivers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6d5f1ef8 |
|
06-Sep-2021 |
Jussi Maki <joamaki@gmail.com> |
bonding: Fix negative jump label count on nested bonding With nested bonding devices the nested bond device's ndo_bpf was called without a program causing it to decrement the static key without a prior increment leading to negative count. Fix the issue by 1) only calling slave's ndo_bpf when there's a program to be loaded and 2) only decrement the count when a program is unloaded. Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Reported-by: syzbot+30622fb04ddd72a4d167@syzkaller.appspotmail.com Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a4fd8df |
|
04-Sep-2021 |
David Decotigny <ddecotig@google.com> |
bonding: complain about missing route only once for A/B ARP probes On configs where there is no confirgured direct route to the target of the ARP probes, these probes are still sent and may be replied to properly, so no need to repeatedly complain about the missing route. Signed-off-by: David Decotigny <ddecotig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b3f78df |
|
15-Aug-2021 |
Antoine Tenart <atenart@kernel.org> |
bonding: improve nl error msg when device can't be enslaved because of IFF_MASTER Use a more user friendly netlink error message when a device can't be enslaved because it has IFF_MASTER, by not referring directly to a kernel internal flag. Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
39a0876d |
|
12-Aug-2021 |
Jussi Maki <joamaki@gmail.com> |
net, bonding: Disallow vlan+srcmac with XDP The new vlan+srcmac xmit policy is not implementable with XDP since in many cases the 802.1Q payload is not present in the packet. This can be for example due to hardware offload or in the case of veth due to use of skbuffs internally. This also fixes the NULL deref with the vlan+srcmac xmit policy reported by Jonathan Toppins by additionally checking the skb pointer. Fixes: a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff") Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20210812145241.12449-1-joamaki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6569fa2d |
|
10-Aug-2021 |
Jonathan Toppins <jtoppins@redhat.com> |
bonding: combine netlink and console error messages There seems to be no reason to have different error messages between netlink and printk. It also cleans up the function slightly. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9e2ee5c7 |
|
30-Jul-2021 |
Jussi Maki <joamaki@gmail.com> |
net, bonding: Add XDP support to the bonding driver XDP is implemented in the bonding driver by transparently delegating the XDP program loading, removal and xmit operations to the bonding slave devices. The overall goal of this work is that XDP programs can be attached to a bond device *without* any further changes (or awareness) necessary to the program itself, meaning the same XDP program can be attached to a native device but also a bonding device. Semantics of XDP_TX when attached to a bond are equivalent in such setting to the case when a tc/BPF program would be attached to the bond, meaning transmitting the packet out of the bond itself using one of the bond's configured xmit methods to select a slave device (rather than XDP_TX on the slave itself). Handling of XDP_TX to transmit using the configured bonding mechanism is therefore implemented by rewriting the BPF program return value in bpf_prog_run_xdp. To avoid performance impact this check is guarded by a static key, which is incremented when a XDP program is loaded onto a bond device. This approach was chosen to avoid changes to drivers implementing XDP. If the slave device does not match the receive device, then XDP_REDIRECT is transparently used to perform the redirection in order to have the network driver release the packet from its RX ring. The bonding driver hashing functions have been refactored to allow reuse with xdp_buff's to avoid code duplication. The motivation for this change is to enable use of bonding (and 802.3ad) in hairpinning L4 load-balancers such as [1] implemented with XDP and also to transparently support bond devices for projects that use XDP given most modern NICs have dual port adapters. An alternative to this approach would be to implement 802.3ad in user-space and implement the bonding load-balancing in the XDP program itself, but is rather a cumbersome endeavor in terms of slave device management (e.g. by watching netlink) and requires separate programs for native vs bond cases for the orchestrator. A native in-kernel implementation overcomes these issues and provides more flexibility. Below are benchmark results done on two machines with 100Gbit Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and 16-core 3950X on receiving machine. 64 byte packets were sent with pktgen-dpdk at full rate. Two issues [2, 3] were identified with the ice driver, so the tests were performed with iommu=off and patch [2] applied. Additionally the bonding round robin algorithm was modified to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate of cache misses were caused by the shared rr_tx_counter (see patch 2/3). The statistics were collected using "sar -n dev -u 1 10". On top of that, for ice, further work is in progress on improving the XDP_TX numbers [4]. -----------------------| CPU |--| rxpck/s |--| txpck/s |---- without patch (1 dev): XDP_DROP: 3.15% 48.6Mpps XDP_TX: 3.12% 18.3Mpps 18.3Mpps XDP_DROP (RSS): 9.47% 116.5Mpps XDP_TX (RSS): 9.67% 25.3Mpps 24.2Mpps ----------------------- with patch, bond (1 dev): XDP_DROP: 3.14% 46.7Mpps XDP_TX: 3.15% 13.9Mpps 13.9Mpps XDP_DROP (RSS): 10.33% 117.2Mpps XDP_TX (RSS): 10.64% 25.1Mpps 24.0Mpps ----------------------- with patch, bond (2 devs): XDP_DROP: 6.27% 92.7Mpps XDP_TX: 6.26% 17.6Mpps 17.5Mpps XDP_DROP (RSS): 11.38% 117.2Mpps XDP_TX (RSS): 14.30% 28.7Mpps 27.4Mpps -------------------------------------------------------------- RSS: Receive Side Scaling, e.g. the packets were sent to a range of destination IPs. [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/ [4]: https://lore.kernel.org/bpf/20210805230046.28715-1-maciej.fijalkowski@intel.com/T/#t Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210731055738.16820-4-joamaki@gmail.com
|
#
a815bde5 |
|
30-Jul-2021 |
Jussi Maki <joamaki@gmail.com> |
net, bonding: Refactor bond_xmit_hash for use with xdp_buff In preparation for adding XDP support to the bonding driver refactor the packet hashing functions to be able to work with any linear data buffer without an skb. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Link: https://lore.kernel.org/bpf/20210731055738.16820-2-joamaki@gmail.com
|
#
3a755cd8 |
|
01-Aug-2021 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: add new option lacp_active Add an option lacp_active, which is similar with team's runner.active. This option specifies whether to send LACPDU frames periodically. If set on, the LACPDU frames are sent along with the configured lacp_rate setting. If set off, the LACPDU frames acts as "speak when spoken to". Note, the LACPDU state frames still will be sent when init or unbind port. v2: remove module parameter Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
220ade77 |
|
29-Jul-2021 |
Yufeng Mo <moyufeng@huawei.com> |
bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler() Some time ago, I reported a calltrace issue "did not find a suitable aggregator", please see[1]. After a period of analysis and reproduction, I find that this problem is caused by concurrency. Before the problem occurs, the bond structure is like follows: bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1 \ port0 \ slaver1(eth1) - agg1.lag_ports -> NULL \ port1 If we run 'ifenslave bond0 -d eth1', the process is like below: excuting __bond_release_one() | bond_upper_dev_unlink()[step1] | | | | | bond_3ad_lacpdu_recv() | | ->bond_3ad_rx_indication() | | spin_lock_bh() | | ->ad_rx_machine() | | ->__record_pdu()[step2] | | spin_unlock_bh() | | | | bond_3ad_state_machine_handler() | spin_lock_bh() | ->ad_port_selection_logic() | ->try to find free aggregator[step3] | ->try to find suitable aggregator[step4] | ->did not find a suitable aggregator[step5] | spin_unlock_bh() | | | | bond_3ad_unbind_slave() | spin_lock_bh() spin_unlock_bh() step1: already removed slaver1(eth1) from list, but port1 remains step2: receive a lacpdu and update port0 step3: port0 will be removed from agg0.lag_ports. The struct is "agg0.lag_ports -> port1" now, and agg0 is not free. At the same time, slaver1/agg1 has been removed from the list by step1. So we can't find a free aggregator now. step4: can't find suitable aggregator because of step2 step5: cause a calltrace since port->aggregator is NULL To solve this concurrency problem, put bond_upper_dev_unlink() after bond_3ad_unbind_slave(). In this way, we can invalid the port first and skip this port in bond_3ad_state_machine_handler(). This eliminates the situation that the slaver has been removed from the list but the port is still valid. [1]https://lore.kernel.org/netdev/10374.1611947473@famine/ Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d9d00bd |
|
27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
net: bonding: move ioctl handling to private ndo operation All other user triggered operations are gone from ndo_ioctl, so move the SIOCBOND family into a custom operation as well. The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now, but there are still a few definitions in obsolete wireless drivers as well as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR helpers from inside the kernel. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7605370 |
|
27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
dev_ioctl: split out ndo_eth_ioctl Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
232ec98e |
|
27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
bonding: use siocdevprivate The bonding driver supports two command codes for each operation: one in the SIOCDEVPRIVATE range and another one with the same definition but a unique command code. Only the second set currently works in compat mode, as the ifr_data expansion overwrites part of the ifr_slave field. Move the private ones into ndo_siocdevprivate and change the implementation to call the other function. This makes both version work correctly. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b69874f |
|
16-Jul-2021 |
Mahesh Bandewar <maheshb@google.com> |
bonding: fix build issue The commit 9a5605505d9c (" bonding: Add struct bond_ipesc to manage SA") is causing following build error when XFRM is not selected in kernel config. lld: error: undefined symbol: xfrm_dev_state_flush >>> referenced by bond_main.c:3453 (drivers/net/bonding/bond_main.c:3453) >>> net/bonding/bond_main.o:(bond_netdev_event) in archive drivers/built-in.a Fixes: 9a5605505d9c (" bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Taehee Yoo <ap420073@gmail.com> CC: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
168e696a |
|
05-Jul-2021 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix incorrect return value of bond_ipsec_offload_ok() bond_ipsec_offload_ok() is called to check whether the interface supports ipsec offload or not. bonding interface support ipsec offload only in active-backup mode. So, if a bond interface is not in active-backup mode, it should return false but it returns true. Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
955b785e |
|
05-Jul-2021 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix suspicious RCU usage in bond_ipsec_offload_ok() To dereference bond->curr_active_slave, it uses rcu_dereference(). But it and the caller doesn't acquire RCU so a warning occurs. So add rcu_read_lock(). Splat looks like: WARNING: suspicious RCU usage 5.13.0-rc6+ #1179 Not tainted drivers/net/bonding/bond_main.c:571 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ping/974: #0: ffff888109e7db70 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x1303/0x2cb0 stack backtrace: CPU: 2 PID: 974 Comm: ping Not tainted 5.13.0-rc6+ #1179 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_offload_ok+0x1f4/0x260 [bonding] xfrm_output+0x179/0x890 xfrm4_output+0xfa/0x410 ? __xfrm4_output+0x4b0/0x4b0 ? __ip_make_skb+0xecc/0x2030 ? xfrm4_udp_encap_rcv+0x800/0x800 ? ip_local_out+0x21/0x3a0 ip_send_skb+0x37/0xa0 raw_sendmsg+0x1bfd/0x2cb0 Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a560550 |
|
05-Jul-2021 |
Taehee Yoo <ap420073@gmail.com> |
bonding: Add struct bond_ipesc to manage SA bonding has been supporting ipsec offload. When SA is added, bonding just passes SA to its own active real interface. But it doesn't manage SA. So, when events(add/del real interface, active real interface change, etc) occur, bonding can't handle that well because It doesn't manage SA. So some problems(panic, UAF, refcnt leak)occur. In order to make it stable, it should manage SA. That's the reason why struct bond_ipsec is added. When a new SA is added to bonding interface, it is stored in the bond_ipsec list. And the SA is passed to a current active real interface. If events occur, it uses bond_ipsec data to handle these events. bond->ipsec_list is protected by bond->ipsec_lock. If a current active real interface is changed, the following logic works. 1. delete all SAs from old active real interface 2. Add all SAs to the new active real interface. 3. If a new active real interface doesn't support ipsec offload or SA's option, it sets real_dev to NULL. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b1216933 |
|
05-Jul-2021 |
Taehee Yoo <ap420073@gmail.com> |
bonding: disallow setting nested bonding + ipsec offload bonding interface can be nested and it supports ipsec offload. So, it allows setting the nested bonding + ipsec scenario. But code does not support this scenario. So, it should be disallowed. interface graph: bond2 | bond1 | eth0 The nested bonding + ipsec offload may not a real usecase. So, disallowing this scenario is fine. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a22c39b8 |
|
05-Jul-2021 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix suspicious RCU usage in bond_ipsec_del_sa() To dereference bond->curr_active_slave, it uses rcu_dereference(). But it and the caller doesn't acquire RCU so a warning occurs. So add rcu_read_lock(). Test commands: ip netns add A ip netns exec A bash modprobe netdevsim echo "1 1" > /sys/bus/netdevsim/new_device ip link add bond0 type bond ip link set eth0 master bond0 ip link set eth0 up ip link set bond0 up ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \ transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \ 0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \ dst 14.0.0.70/24 proto tcp offload dev bond0 dir in ip x s f Splat looks like: ============================= WARNING: suspicious RCU usage 5.13.0-rc3+ #1168 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ip/705: #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3}, at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user] #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2}, at: xfrm_state_delete+0x16/0x30 stack backtrace: CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_del_sa+0x16a/0x1c0 [bonding] __xfrm_state_delete+0x51f/0x730 xfrm_state_delete+0x1e/0x30 xfrm_state_flush+0x22f/0x390 xfrm_flush_sa+0xd8/0x260 [xfrm_user] ? xfrm_flush_policy+0x290/0x290 [xfrm_user] xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 [ ... ] Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
105cd17a |
|
05-Jul-2021 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix null dereference in bond_ipsec_add_sa() If bond doesn't have real device, bond->curr_active_slave is null. But bond_ipsec_add_sa() dereferences bond->curr_active_slave without null checking. So, null-ptr-deref would occur. Test commands: ip link add bond0 type bond ip link set bond0 up ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \ 0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \ 0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \ dst 14.0.0.70/24 proto tcp offload dev bond0 dir in Splat looks like: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168 RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding] Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48 RSP: 0018:ffff88810946f508 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100 RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11 R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000 R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330 FS: 00007efc5552e680(0000) GS:ffff888119c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: xfrm_dev_state_add+0x2a9/0x770 ? memcpy+0x38/0x60 xfrm_add_sa+0x2278/0x3b10 [xfrm_user] ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user] ? register_lock_class+0x1750/0x1750 xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? netlink_ack+0x9d0/0x9d0 ? netlink_deliver_tap+0x17c/0xa50 xfrm_netlink_rcv+0x68/0x80 [xfrm_user] netlink_unicast+0x41c/0x610 ? netlink_attachskb+0x710/0x710 netlink_sendmsg+0x6b9/0xb70 [ ...] Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b648eba4 |
|
05-Jul-2021 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix suspicious RCU usage in bond_ipsec_add_sa() To dereference bond->curr_active_slave, it uses rcu_dereference(). But it and the caller doesn't acquire RCU so a warning occurs. So add rcu_read_lock(). Test commands: ip link add dummy0 type dummy ip link add bond0 type bond ip link set dummy0 master bond0 ip link set dummy0 up ip link set bond0 up ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \ mode transport \ reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \ 0x44434241343332312423222114131211f4f3f2f1 128 sel \ src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \ dev bond0 dir in Splat looks like: ============================= WARNING: suspicious RCU usage 5.13.0-rc3+ #1168 Not tainted ----------------------------- drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/684: #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3}, at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user] 55.191733][ T684] stack backtrace: CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168 Call Trace: dump_stack+0xa4/0xe5 bond_ipsec_add_sa+0x18c/0x1f0 [bonding] xfrm_dev_state_add+0x2a9/0x770 ? memcpy+0x38/0x60 xfrm_add_sa+0x2278/0x3b10 [xfrm_user] ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user] ? register_lock_class+0x1750/0x1750 xfrm_user_rcv_msg+0x331/0x660 [xfrm_user] ? rcu_read_lock_sched_held+0x91/0xc0 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? find_held_lock+0x3a/0x1c0 ? mutex_lock_io_nested+0x1210/0x1210 ? sched_clock_cpu+0x18/0x170 netlink_rcv_skb+0x121/0x350 ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user] ? netlink_ack+0x9d0/0x9d0 ? netlink_deliver_tap+0x17c/0xa50 xfrm_netlink_rcv+0x68/0x80 [xfrm_user] netlink_unicast+0x41c/0x610 ? netlink_attachskb+0x710/0x710 netlink_sendmsg+0x6b9/0xb70 [ ... ] Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d293fe1 |
|
22-Jun-2021 |
Di Zhu <zhudi21@huawei.com> |
bonding: allow nesting of bonding device The commit 3c9ef511b9fa ("bonding: avoid adding slave device with IFF_MASTER flag") fix a crash when add slave device with IFF_MASTER, but it rejects the scenario of nested bonding device. As Eric Dumazet described: since there indeed is a usage scenario about nesting bonding, we should not break it. So we add a new judgment condition to allow nesting of bonding device. Fixes: 3c9ef511b9fa ("bonding: avoid adding slave device with IFF_MASTER flag") Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c9ef511 |
|
21-Jun-2021 |
Di Zhu <zhudi21@huawei.com> |
bonding: avoid adding slave device with IFF_MASTER flag The following steps will definitely cause the kernel to crash: ip link add vrf1 type vrf table 1 modprobe bonding.ko max_bonds=1 echo "+vrf1" >/sys/class/net/bond0/bonding/slaves rmmod bonding The root cause is that: When the VRF is added to the slave device, it will fail, and some cleaning work will be done. because VRF device has IFF_MASTER flag, cleanup process will not clear the IFF_BONDING flag. Then, when we unload the bonding module, unregister_netdevice_notifier() will treat the VRF device as a bond master device and treat netdev_priv() as struct bonding{} which actually is struct net_vrf{}. By analyzing the processing logic of bond_enslave(), it seems that it is not allowed to add the slave device with the IFF_MASTER flag, so we need to add a code check for this situation. Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
848ca918 |
|
15-Jun-2021 |
Jussi Maki <joamaki@gmail.com> |
net: bonding: Use per-cpu rr_tx_counter The round-robin rr_tx_counter was shared across CPUs leading to significant cache thrashing at high packet rates. This patch switches the round-robin packet counter to use a per-cpu variable to decide the destination slave. On a test with 2x100Gbit ICE nic with pktgen_sample_04_many_flows.sh (-s 64 -t 32) the tx rate was 19.6Mpps before and 22.3Mpps after this patch. "perf top -e cache_misses" before: 12.31% [bonding] [k] bond_xmit_roundrobin_slave_get 10.59% [sch_fq_codel] [k] fq_codel_dequeue 9.34% [kernel] [k] skb_release_data after: 15.42% [sch_fq_codel] [k] fq_codel_dequeue 10.06% [kernel] [k] __memset 9.12% [kernel] [k] skb_release_data Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
43902070 |
|
02-Jun-2021 |
Kees Cook <keescook@chromium.org> |
net: bonding: Use strscpy_pad() instead of manually-truncated strncpy() Silence this warning by using strscpy_pad() directly: drivers/net/bonding/bond_main.c:4877:3: warning: 'strncpy' specified bound 16 equals destination size [-Wstringop-truncation] 4877 | strncpy(params->primary, primary, IFNAMSIZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Additionally replace other strncpy() uses, as it is considered deprecated: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202102150705.fdR6obB0-lkp@intel.com Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52333512 |
|
20-May-2021 |
Yufeng Mo <moyufeng@huawei.com> |
net: bonding: remove unnecessary braces Braces {} are not necessary for single statement blocks, so remove these braces {}. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86a5ad0a |
|
20-May-2021 |
Yufeng Mo <moyufeng@huawei.com> |
net: bonding: add some required blank lines Add some blank lines after declarations as required. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35d96e63 |
|
17-May-2021 |
Johannes Berg <johannes.berg@intel.com> |
bonding: init notify_work earlier to avoid uninitialized use If bond_kobj_init() or later kzalloc() in bond_alloc_slave() fail, then we call kobject_put() on the slave->kobj. This in turn calls the release function slave_kobj_release() which will always try to cancel_delayed_work_sync(&slave->notify_work), which shouldn't be done on an uninitialized work struct. Always initialize the work struct earlier to avoid problems here. Syzbot bisected this down to a completely pointless commit, some fault injection may have been at work here that caused the alloc failure in the first place, which may interact badly with bisect. Reported-by: syzbot+bfda097c12a00c8cae67@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
83d686a6 |
|
21-Apr-2021 |
jinyiting <jinyiting@huawei.com> |
bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine The bond works in mode 4, and performs down/up operations on the bond that is normally negotiated. The probability of bond-> slave_arr is NULL Test commands: ifconfig bond1 down ifconfig bond1 up The conflict occurs in the following process: __dev_open (CPU A) --bond_open --queue_delayed_work(bond->wq,&bond->ad_work,0); --bond_update_slave_arr --bond_3ad_get_active_agg_info ad_work(CPU B) --bond_3ad_state_machine_handler --ad_agg_selection_logic ad_work runs on cpu B. In the function ad_agg_selection_logic, all agg->is_active will be cleared. Before the new active aggregator is selected on CPU B, bond_3ad_get_active_agg_info failed on CPU A, bond->slave_arr will be set to NULL. The best aggregator in ad_agg_selection_logic has not changed, no need to update slave arr. The conflict occurred in that ad_agg_selection_logic clears agg->is_active under mode_lock, but bond_open -> bond_update_slave_arr is inspecting agg->is_active outside the lock. Also, bond_update_slave_arr is normal for potential sleep when allocating memory, so replace the WARN_ON with a call to might_sleep. Signed-off-by: jinyiting <jinyiting@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acf61b3d |
|
29-Mar-2021 |
Yang Yingliang <yangyingliang@huawei.com> |
net: bonding: Correct function name bond_change_active_slave() in comment Fix the following make W=1 kernel build warning: drivers/net/bonding/bond_main.c:982: warning: expecting prototype for change_active_interface(). Prototype was for bond_change_active_slave() instead Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
080bfa1e |
|
12-Mar-2021 |
David S. Miller <davem@davemloft.net> |
Revert "net: bonding: fix error return code of bond_neigh_init()" This reverts commit 2055a99da8a253a357bdfd359b3338ef3375a26c. This change rejects legitimate configurations. A slave doesn't need to exist nor implement ndo_slave_setup. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2055a99d |
|
07-Mar-2021 |
Jia-Ju Bai <baijiaju1990@gmail.com> |
net: bonding: fix error return code of bond_neigh_init() When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error return code of bond_neigh_init() is assigned. To fix this bug, ret is assigned with -EINVAL in these cases. Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b8fc010 |
|
18-Jan-2021 |
Jarod Wilson <jarod@redhat.com> |
bonding: add a vlan+srcmac tx hashing option This comes from an end-user request, where they're running multiple VMs on hosts with bonded interfaces connected to some interest switch topologies, where 802.3ad isn't an option. They're currently running a proprietary solution that effectively achieves load-balancing of VMs and bandwidth utilization improvements with a similar form of transmission algorithm. Basically, each VM has it's own vlan, so it always sends its traffic out the same interface, unless that interface fails. Traffic gets split between the interfaces, maintaining a consistent path, with failover still available if an interface goes down. Unlike bond_eth_hash(), this hash function is using the full source MAC address instead of just the last byte, as there are so few components to the hash, and in the no-vlan case, we would be returning just the last byte of the source MAC as the hash value. It's entirely possible to have two NICs in a bond with the same last byte of their MAC, but not the same MAC, so this adjustment should guarantee distinct hashes in all cases. This has been rudimetarily tested to provide similar results to the proprietary solution it is aiming to replace. A patch for iproute2 is also posted, to properly support the new mode there as well. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Thomas Davis <tadavis@lbl.gov> Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20210119010927.1191922-1-jarod@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
89df6a81 |
|
17-Jan-2021 |
Tariq Toukan <tariqt@nvidia.com> |
net/bonding: Implement TLS TX device offload Implement TLS TX device offload for bonding interfaces. This allows kTLS sockets running on a bond to benefit from the device offload on capable lower devices. To allow a simple and fast maintenance of the TLS context in SW and lower devices, we bind the TLS socket to a specific lower dev. To achieve a behavior similar to SW kTLS, we support only balance-xor and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced in bond_sk_check(), done in a previous patch. For the above configuration, the SW implementation keeps picking the same exact lower dev for all the socket's SKBs. The device offload behaves similarly, making the decision once at the connection creation. Per socket, the TLS module should work directly with the lowest netdev in chain, to call the tls_dev_ops operations. As the bond interface is being bypassed by the TLS module, interacting directly against the lower devs, there is no way for the bond interface to disable its device offload capabilities, as long as the mode/policy config allows it. Hence, the feature flag is not directly controllable, but just reflects the current offload status based on the logic under bond_sk_check(). Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
007feb87 |
|
17-Jan-2021 |
Tariq Toukan <tariqt@nvidia.com> |
net/bonding: Implement ndo_sk_get_lower_dev Add ndo_sk_get_lower_dev() implementation for bond interfaces. Support only for the cases where the socket's and SKBs' hash yields identical value for the whole connection lifetime. Here we restrict it to L3+4 sockets only, with xmit_hash_policy==LAYER34 and bond modes xor/802.3ad. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
5b998545 |
|
17-Jan-2021 |
Tariq Toukan <tariqt@nvidia.com> |
net/bonding: Take IP hash logic into a helper Hash logic on L3 will be used in a downstream patch for one more use case. Take it to a function for a better code reuse. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
32d4c564 |
|
13-Jan-2021 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bonding: Notify ports about their initial state When creating a static bond (e.g. balance-xor), all ports will always be enabled. This is set, and the corresponding notification is sent out, before the port is linked to the bond upper. In the offloaded case, this ordering is hard to deal with. The lower will first see a notification that it can not associate with any bond. Then the bond is joined. After that point no more notifications are sent, so all ports remain disabled. This change simply sends an extra notification once the port has been linked to the upper to synchronize the initial state. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
d241b382 |
|
04-Dec-2020 |
Jarod Wilson <jarod@redhat.com> |
bonding: set xfrm feature flags more sanely We can remove one of the ifdef blocks here, and instead of setting both the xfrm hw_features and features flags, then unsetting the feature flags if not in AB, wait to set the features flags if we're actually in AB mode. Signed-off-by: Jarod Wilson <jarod@redhat.com> Link: https://lore.kernel.org/r/20201205174003.578267-1-jarod@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
b9ad3e9f |
|
20-Nov-2020 |
Jamie Iles <jamie@nuviainc.com> |
bonding: wait for sysfs kobject destruction before freeing struct slave syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a struct slave device could result in the following splat: kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000) bond0 (unregistering): (slave bond_slave_1): Releasing backup interface ------------[ cut here ]------------ ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600 WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S 5.9.0-rc8+ #96 Hardware name: linux,dummy-virt (DT) Workqueue: netns cleanup_net Call trace: dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239 show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x174/0x1f8 lib/dump_stack.c:118 panic+0x360/0x7a0 kernel/panic.c:231 __warn+0x244/0x2ec kernel/panic.c:600 report_bug+0x240/0x398 lib/bug.c:198 bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974 call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322 brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329 do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864 el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65 el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93 el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594 debug_print_object+0x180/0x240 lib/debugobjects.c:485 __debug_check_no_obj_freed lib/debugobjects.c:967 [inline] debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998 slab_free_hook mm/slub.c:1536 [inline] slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577 slab_free mm/slub.c:3138 [inline] kfree+0x13c/0x460 mm/slub.c:4119 bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492 __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190 bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline] bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420 notifier_call_chain+0xf0/0x200 kernel/notifier.c:83 __raw_notifier_call_chain kernel/notifier.c:361 [inline] raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368 call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347 unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509 unregister_netdevice_many net/core/dev.c:10508 [inline] default_device_exit_batch+0x294/0x338 net/core/dev.c:10992 ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189 cleanup_net+0x44c/0x888 net/core/net_namespace.c:603 process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269 worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415 kthread+0x390/0x498 kernel/kthread.c:292 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925 This is a potential use-after-free if the sysfs nodes are being accessed whilst removing the struct slave, so wait for the object destruction to complete before freeing the struct slave itself. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Fixes: a068aab42258 ("bonding: Fix reference count leak in bond_sysfs_slave_add.") Cc: Qiushi Wu <wu000273@umn.edu> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jamie Iles <jamie@nuviainc.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20201120142827.879226-1-jamie@nuviainc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ecb8fed4 |
|
01-Nov-2020 |
Alexander Lobakin <alobakin@pm.me> |
net: bonding, dummy, ifb, team: advertise NETIF_F_GSO_SOFTWARE Virtual netdevs should use NETIF_F_GSO_SOFTWARE to forward GSO skbs as-is and let the final drivers deal with them when supported. Also remove NETIF_F_GSO_UDP_L4 from bonding and team drivers as it's now included in the "software" list. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
eff74233 |
|
25-Sep-2020 |
Taehee Yoo <ap420073@gmail.com> |
net: core: introduce struct netdev_nested_priv for nested interface infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f32f1933 |
|
25-Sep-2020 |
Eric Dumazet <edumazet@google.com> |
bonding: set dev->needed_headroom in bond_setup_by_slave() syzbot managed to crash a host by creating a bond with a GRE device. For non Ethernet device, bonding calls bond_setup_by_slave() instead of ether_setup(), and unfortunately dev->needed_headroom was not copied from the new added member. [ 171.243095] skbuff: skb_under_panic: text:ffffffffa184b9ea len:116 put:20 head:ffff883f84012dc0 data:ffff883f84012dbc tail:0x70 end:0xd00 dev:bond0 [ 171.243111] ------------[ cut here ]------------ [ 171.243112] kernel BUG at net/core/skbuff.c:112! [ 171.243117] invalid opcode: 0000 [#1] SMP KASAN PTI [ 171.243469] gsmi: Log Shutdown Reason 0x03 [ 171.243505] Call Trace: [ 171.243506] <IRQ> [ 171.243512] [<ffffffffa171be59>] skb_push+0x49/0x50 [ 171.243516] [<ffffffffa184b9ea>] ipgre_header+0x2a/0xf0 [ 171.243520] [<ffffffffa17452d7>] neigh_connected_output+0xb7/0x100 [ 171.243524] [<ffffffffa186f1d3>] ip6_finish_output2+0x383/0x490 [ 171.243528] [<ffffffffa186ede2>] __ip6_finish_output+0xa2/0x110 [ 171.243531] [<ffffffffa186acbc>] ip6_finish_output+0x2c/0xa0 [ 171.243534] [<ffffffffa186abe9>] ip6_output+0x69/0x110 [ 171.243537] [<ffffffffa186ac90>] ? ip6_output+0x110/0x110 [ 171.243541] [<ffffffffa189d952>] mld_sendpack+0x1b2/0x2d0 [ 171.243544] [<ffffffffa189d290>] ? mld_send_report+0xf0/0xf0 [ 171.243548] [<ffffffffa189c797>] mld_ifc_timer_expire+0x2d7/0x3b0 [ 171.243551] [<ffffffffa189c4c0>] ? mld_gq_timer_expire+0x50/0x50 [ 171.243556] [<ffffffffa0fea270>] call_timer_fn+0x30/0x130 [ 171.243559] [<ffffffffa0fea17c>] expire_timers+0x4c/0x110 [ 171.243563] [<ffffffffa0fea0e3>] __run_timers+0x213/0x260 [ 171.243566] [<ffffffffa0fecb7d>] ? ktime_get+0x3d/0xa0 [ 171.243570] [<ffffffffa0ff9c4e>] ? clockevents_program_event+0x7e/0xe0 [ 171.243574] [<ffffffffa0f7e5d5>] ? sched_clock_cpu+0x15/0x190 [ 171.243577] [<ffffffffa0fe973d>] run_timer_softirq+0x1d/0x40 [ 171.243581] [<ffffffffa1c00152>] __do_softirq+0x152/0x2f0 [ 171.243585] [<ffffffffa0f44e1f>] irq_exit+0x9f/0xb0 [ 171.243588] [<ffffffffa1a02e1d>] smp_apic_timer_interrupt+0xfd/0x1a0 [ 171.243591] [<ffffffffa1a01ea6>] apic_timer_interrupt+0x86/0x90 Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df561f66 |
|
23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
0410d071 |
|
16-Aug-2020 |
Jiri Wiesner <jwiesner@suse.com> |
bonding: fix active-backup failover for current ARP slave When the ARP monitor is used for link detection, ARP replies are validated for all slaves (arp_validate=3) and fail_over_mac is set to active, two slaves of an active-backup bond may get stuck in a state where both of them are active and pass packets that they receive to the bond. This state makes IPv6 duplicate address detection fail. The state is reached thus: 1. The current active slave goes down because the ARP target is not reachable. 2. The current ARP slave is chosen and made active. 3. A new slave is enslaved. This new slave becomes the current active slave and can reach the ARP target. As a result, the current ARP slave stays active after the enslave action has finished and the log is littered with "PROBE BAD" messages: > bond0: PROBE: c_arp ens10 && cas ens11 BAD The workaround is to remove the slave with "going back" status from the bond and re-enslave it. This issue was encountered when DPDK PMD interfaces were being enslaved to an active-backup bond. I would be possible to fix the issue in bond_enslave() or bond_change_active_slave() but the ARP monitor was fixed instead to keep most of the actions changing the current ARP slave in the ARP monitor code. The current ARP slave is set as inactive and backup during the commit phase. A new state, BOND_LINK_FAIL, has been introduced for slaves in the context of the ARP monitor. This allows administrators to see how slaves are rotated for sending ARP requests and attempts are made to find a new active slave. Fixes: b2220cad583c9 ("bonding: refactor ARP active-backup monitor") Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
83270702 |
|
14-Aug-2020 |
Cong Wang <xiyou.wangcong@gmail.com> |
bonding: fix a potential double-unregister When we tear down a network namespace, we unregister all the netdevices within it. So we may queue a slave device and a bonding device together in the same unregister queue. If the only slave device is non-ethernet, it would automatically unregister the bonding device as well. Thus, we may end up unregistering the bonding device twice. Workaround this special case by checking reg_state. Fixes: 9b5e383c11b0 ("net: Introduce unregister_netdevice_many()") Reported-by: syzbot+af23e7f3e0a7e10c8b67@syzkaller.appspotmail.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45a1553b |
|
13-Aug-2020 |
Lee Jones <lee.jones@linaro.org> |
net: bonding: bond_main: Document 'proto' and rename 'new_active' parameters Fixes the following W=1 kernel build warning(s): drivers/net/bonding/bond_main.c:329: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_add_vid' drivers/net/bonding/bond_main.c:362: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_kill_vid' drivers/net/bonding/bond_main.c:964: warning: Function parameter or member 'new_active' not described in 'bond_change_active_slave' drivers/net/bonding/bond_main.c:964: warning: Excess function parameter 'new' description in 'bond_change_active_slave' Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ca0d9ac |
|
13-Aug-2020 |
Jarod Wilson <jarod@redhat.com> |
bonding: show saner speed for broadcast mode Broadcast mode bonds transmit a copy of all traffic simultaneously out of all interfaces, so the "speed" of the bond isn't really the aggregate of all interfaces, but rather, the speed of the slowest active interface. Also, the type of the speed field is u32, not unsigned long, so adjust that accordingly, as required to make min() function here without complaining about mismatching types. Fixes: bb5b052f751b ("bond: add support to read speed and duplex via ethtool") CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
544f287b |
|
18-Jul-2020 |
Taehee Yoo <ap420073@gmail.com> |
bonding: check error value of register_netdevice() immediately If register_netdevice() is failed, net_device should not be used because variables are uninitialized or freed. So, the routine should be stopped immediately. But, bond_create() doesn't check return value of register_netdevice() immediately. That will result in a panic because of using uninitialized or freed memory. Test commands: modprobe netdev-notifier-error-inject echo -22 > /sys/kernel/debug/notifier-error-inject/netdev/\ actions/NETDEV_REGISTER/error modprobe bonding max_bonds=3 Splat looks like: [ 375.028492][ T193] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [ 375.033207][ T193] CPU: 2 PID: 193 Comm: kworker/2:2 Not tainted 5.8.0-rc4+ #645 [ 375.036068][ T193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 375.039673][ T193] Workqueue: events linkwatch_event [ 375.041557][ T193] RIP: 0010:dev_activate+0x4a/0x340 [ 375.043381][ T193] Code: 40 a8 04 0f 85 db 00 00 00 8b 83 08 04 00 00 85 c0 0f 84 0d 01 00 00 31 d2 89 d0 48 8d 04 40 48 c1 e0 07 48 03 83 00 04 00 00 <48> 8b 48 10 f6 41 10 01 75 08 f0 80 a1 a0 01 00 00 fd 48 89 48 08 [ 375.050267][ T193] RSP: 0018:ffff9f8facfcfdd8 EFLAGS: 00010202 [ 375.052410][ T193] RAX: 6b6b6b6b6b6b6b6b RBX: ffff9f8fae6ea000 RCX: 0000000000000006 [ 375.055178][ T193] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9f8fae6ea000 [ 375.057762][ T193] RBP: ffff9f8fae6ea000 R08: 0000000000000000 R09: 0000000000000000 [ 375.059810][ T193] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9f8facfcfe08 [ 375.061892][ T193] R13: ffffffff883587e0 R14: 0000000000000000 R15: ffff9f8fae6ea580 [ 375.063931][ T193] FS: 0000000000000000(0000) GS:ffff9f8fbae00000(0000) knlGS:0000000000000000 [ 375.066239][ T193] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 375.067841][ T193] CR2: 00007f2f542167a0 CR3: 000000012cee6002 CR4: 00000000003606e0 [ 375.069657][ T193] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 375.071471][ T193] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 375.073269][ T193] Call Trace: [ 375.074005][ T193] linkwatch_do_dev+0x4d/0x50 [ 375.075052][ T193] __linkwatch_run_queue+0x10b/0x200 [ 375.076244][ T193] linkwatch_event+0x21/0x30 [ 375.077274][ T193] process_one_work+0x252/0x600 [ 375.078379][ T193] ? process_one_work+0x600/0x600 [ 375.079518][ T193] worker_thread+0x3c/0x380 [ 375.080534][ T193] ? process_one_work+0x600/0x600 [ 375.081668][ T193] kthread+0x139/0x150 [ 375.082567][ T193] ? kthread_park+0x90/0x90 [ 375.083567][ T193] ret_from_fork+0x22/0x30 Fixes: e826eafa65c6 ("bonding: Call netif_carrier_off after register_netdevice") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f548a476 |
|
08-Jul-2020 |
Jarod Wilson <jarod@redhat.com> |
bonding: don't need RTNL for ipsec helpers The bond_ipsec_* helpers don't need RTNL, and can potentially get called without it being held, so switch from rtnl_dereference() to rcu_dereference() to access bond struct data. Lightly tested with xfrm bonding, no problems found, should address the syzkaller bug referenced below. Reported-by: syzbot+582c98032903dcc04816@syzkaller.appspotmail.com CC: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5cd24cbe |
|
08-Jul-2020 |
Jarod Wilson <jarod@redhat.com> |
bonding: deal with xfrm state in all modes and add more error-checking It's possible that device removal happens when the bond is in non-AB mode, and addition happens in AB mode, so bond_ipsec_del_sa() never gets called, which leaves security associations in an odd state if bond_ipsec_add_sa() then gets called after switching the bond into AB. Just call add and delete universally for all modes to keep things consistent. However, it's also possible that this code gets called when the system is shutting down, and the xfrm subsystem has already been disconnected from the bond device, so we need to do some error-checking and bail, lest we hit a null ptr deref. Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") CC: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a3b658cf |
|
30-Jun-2020 |
Jarod Wilson <jarod@redhat.com> |
bonding: allow xfrm offload setup post-module-load At the moment, bonding xfrm crypto offload can only be set up if the bonding module is loaded with active-backup mode already set. We need to be able to make this work with bonds set to AB after the bonding driver has already been loaded. So what's done here is: 1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used by both bond_main.c and bond_options.c 2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than only when loading in AB mode 3) wire up xfrmdev_ops universally too 4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB 5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a performance hit from traversing into the underlying drivers 5) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call netdev_change_features() from bond_option_mode_set() In my local testing, I can change bonding modes back and forth on the fly, have hardware offload work when I'm in AB, and see no performance penalty to non-AB software encryption, despite having xfrm bits all wired up for all modes now. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Reported-by: Huy Nguyen <huyn@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
18c955b7 |
|
25-Jun-2020 |
Nathan Chancellor <nathan@kernel.org> |
bonding: Remove extraneous parentheses in bond_setup Clang warns: drivers/net/bonding/bond_main.c:4657:23: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)) ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/bonding/bond_main.c:4681:23: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)) ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ This warning occurs when a comparision has two sets of parentheses, which is usually the convention for doing an assignment within an if statement. Since equality comparisons do not need a second set of parentheses, remove them to fix the warning. Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Link: https://github.com/ClangBuiltLinux/linux/issues/1066 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kernelci.org bot <bot@kernelci.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdfd2d1f |
|
23-Jun-2020 |
Jarod Wilson <jarod@redhat.com> |
bonding/xfrm: use real_dev instead of slave_dev Rather than requiring every hw crypto capable NIC driver to do a check for slave_dev being set, set real_dev in the xfrm layer and xso init time, and then override it in the bonding driver as needed. Then NIC drivers can always use real_dev, and at the same time, we eliminate the use of a variable name that probably shouldn't have been used in the first place, particularly given recent current events. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
18cb261a |
|
19-Jun-2020 |
Jarod Wilson <jarod@redhat.com> |
bonding: support hardware encryption offload to slaves Currently, this support is limited to active-backup mode, as I'm not sure about the feasilibity of mapping an xfrm_state's offload handle to multiple hardware devices simultaneously, and we rely on being able to pass some hints to both the xfrm and NIC driver about whether or not they're operating on a slave device. I've tested this atop an Intel x520 device (ixgbe) using libreswan in transport mode, succesfully achieving ~4.3Gbps throughput with netperf (more or less identical to throughput on a bare NIC in this system), as well as successful failover and recovery mid-netperf. v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
845e0ebb |
|
08-Jun-2020 |
Cong Wang <xiyou.wangcong@gmail.com> |
net: change addr_list_lock back to static key The dynamic key update for addr_list_lock still causes troubles, for example the following race condition still exists: CPU 0: CPU 1: (RCU read lock) (RTNL lock) dev_mc_seq_show() netdev_update_lockdep_key() -> lockdep_unregister_key() -> netif_addr_lock_bh() because lockdep doesn't provide an API to update it atomically. Therefore, we have to move it back to static keys and use subclass for nest locking like before. In commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key changes"), I already reverted most parts of commit ab92d68fc22f ("net: core: add generic lockdep keys"). This patch reverts the rest and also part of commit f3b0a18bb6cb ("net: remove unnecessary variables and callback"). After this patch, addr_list_lock changes back to using static keys and subclasses to satisfy lockdep. Thanks to dev->lower_level, we do not have to change back to ->ndo_get_lock_subclass(). And hopefully this reduces some syzbot lockdep noises too. Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com Cc: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae46f184 |
|
07-May-2020 |
Eric Dumazet <edumazet@google.com> |
bonding: propagate transmit status Currently, bonding always returns NETDEV_TX_OK to its caller. It is worth trying to be more accurate : TCP for instance can have different recovery strategies if it can have more precise status, if packet was dropped by slave qdisc. This is especially important when host is under stress. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a33e10e |
|
02-May-2020 |
Cong Wang <xiyou.wangcong@gmail.com> |
net: partially revert dynamic lockdep key changes This patch reverts the folowing commits: commit 064ff66e2bef84f1153087612032b5b9eab005bd "bonding: add missing netdev_update_lockdep_key()" commit 53d374979ef147ab51f5d632dfe20b14aebeccd0 "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()" commit 1f26c0d3d24125992ab0026b0dab16c08df947c7 "net: fix kernel-doc warning in <linux/netdevice.h>" commit ab92d68fc22f9afab480153bd82a20f6e2533769 "net: core: add generic lockdep keys" but keeps the addr_list_lock_key because we still lock addr_list_lock nestedly on stack devices, unlikely xmit_lock this is safe because we don't take addr_list_lock on any fast path. Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7511f56 |
|
02-May-2020 |
Cong Wang <xiyou.wangcong@gmail.com> |
bonding: remove useless stats_lock_key After commit b3e80d44f5b1 ("bonding: fix lockdep warning in bond_get_stats()") the dynamic key is no longer necessary, as we compute nest level at run-time. So, we can just remove it to save some lockdep key entries. Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond0 master bond1 ip link set bond0 nomaster ip link set bond1 master bond0 Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
33720aaf |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
bonding: Implement ndo_get_xmit_slave Add implementation of ndo_get_xmit_slave. Find the slave by using the helper function according to the bond mode. If the caller set all_slaves to true, then it assumes that all slaves are available to transmit. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6b447e76 |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
bonding: Add array of all slaves Keep all slaves in array so it could be used to get the xmit slave assume all the slaves are active. The logic to add slave to the array is like the usable slaves, except that we also add slaves that currently can't transmit - not up or active. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5a19f1c1 |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
bonding: Add function to get the xmit slave in active-backup mode Add helper function to get the xmit slave in active-backup mode. It's only one line function that return the curr_active_slave, but it will used both in the xmit flow and by the new .ndo to get the xmit slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
29d5bbcc |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
bonding: Add helper function to get the xmit slave in rr mode Add helper function to get the xmit slave when bond is in round robin mode. Change bond_xmit_slave_id to bond_get_slave_by_id, then the logic for find the next slave for transmit could be used both by the xmit flow and the .ndo to get the xmit slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c071d91d |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
bonding: Add helper function to get the xmit slave based on hash Both xor and 802.3ad modes use bond_xmit_hash to get the xmit slave. Export the logic to helper function so it could be used in the following patches by the .ndo to get the xmit slave. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ed7d4f02 |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
bonding: Rename slave_arr to usable_slaves Rename slave_arr to usable_slaves, since we will have two arrays, one for the usable slaves and the other to all slaves. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
119d48fd |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
bonding: Export skip slave logic to function As a preparation for following change that add array of all slaves, extract code that skip slave to function. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2b526b56 |
|
24-Feb-2020 |
Leon Romanovsky <leon@kernel.org> |
net/bond: Delete driver and module versions The in-kernel code has already unique version, which is based on Linus's tag, update the bond driver to be consistent with that version. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e92a2d0 |
|
20-Feb-2020 |
Julian Wiedmann <jwi@linux.ibm.com> |
net: use netif_is_bridge_port() to check for IFF_BRIDGE_PORT Trivial cleanup, so that all bridge port-specific code can be found in one go. CC: Johannes Berg <johannes@sipsolutions.net> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3e80d44 |
|
15-Feb-2020 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix lockdep warning in bond_get_stats() In the "struct bonding", there is stats_lock. This lock protects "bond_stats" in the "struct bonding". bond_stats is updated in the bond_get_stats() and this function would be executed concurrently. So, the lock is needed. Bonding interfaces would be nested. So, either stats_lock should use dynamic lockdep class key or stats_lock should be used by spin_lock_nested(). In the current code, stats_lock is using a dynamic lockdep class key. But there is no updating stats_lock_key routine So, lockdep warning will occur. Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond0 master bond1 ip link set bond0 nomaster ip link set bond1 master bond0 Splat looks like: [ 38.420603][ T957] 5.5.0+ #394 Not tainted [ 38.421074][ T957] ------------------------------------------------------ [ 38.421837][ T957] ip/957 is trying to acquire lock: [ 38.422399][ T957] ffff888063262cd8 (&bond->stats_lock_key#2){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.423528][ T957] [ 38.423528][ T957] but task is already holding lock: [ 38.424526][ T957] ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.426075][ T957] [ 38.426075][ T957] which lock already depends on the new lock. [ 38.426075][ T957] [ 38.428536][ T957] [ 38.428536][ T957] the existing dependency chain (in reverse order) is: [ 38.429475][ T957] [ 38.429475][ T957] -> #1 (&bond->stats_lock_key){+.+.}: [ 38.430273][ T957] _raw_spin_lock+0x30/0x70 [ 38.430812][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ 38.431451][ T957] dev_get_stats+0x1ec/0x270 [ 38.432088][ T957] bond_get_stats+0x1a5/0x4d0 [bonding] [ 38.432767][ T957] dev_get_stats+0x1ec/0x270 [ 38.433322][ T957] rtnl_fill_stats+0x44/0xbe0 [ 38.433866][ T957] rtnl_fill_ifinfo+0xeb2/0x3720 [ 38.434474][ T957] rtmsg_ifinfo_build_skb+0xca/0x170 [ 38.435081][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0 [ 38.436848][ T957] rtnetlink_event+0xcd/0x120 [ 38.437455][ T957] notifier_call_chain+0x90/0x160 [ 38.438067][ T957] netdev_change_features+0x74/0xa0 [ 38.438708][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding] [ 38.439522][ T957] bond_enslave+0x3639/0x47b0 [bonding] [ 38.440225][ T957] do_setlink+0xaab/0x2ef0 [ 38.440786][ T957] __rtnl_newlink+0x9c5/0x1270 [ 38.441463][ T957] rtnl_newlink+0x65/0x90 [ 38.442075][ T957] rtnetlink_rcv_msg+0x4a8/0x890 [ 38.442774][ T957] netlink_rcv_skb+0x121/0x350 [ 38.443451][ T957] netlink_unicast+0x42e/0x610 [ 38.444282][ T957] netlink_sendmsg+0x65a/0xb90 [ 38.444992][ T957] ____sys_sendmsg+0x5ce/0x7a0 [ 38.445679][ T957] ___sys_sendmsg+0x10f/0x1b0 [ 38.446365][ T957] __sys_sendmsg+0xc6/0x150 [ 38.447007][ T957] do_syscall_64+0x99/0x4f0 [ 38.447668][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 38.448538][ T957] [ 38.448538][ T957] -> #0 (&bond->stats_lock_key#2){+.+.}: [ 38.449554][ T957] __lock_acquire+0x2d8d/0x3de0 [ 38.450148][ T957] lock_acquire+0x164/0x3b0 [ 38.450711][ T957] _raw_spin_lock+0x30/0x70 [ 38.451292][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ 38.451950][ T957] dev_get_stats+0x1ec/0x270 [ 38.452425][ T957] bond_get_stats+0x1a5/0x4d0 [bonding] [ 38.453362][ T957] dev_get_stats+0x1ec/0x270 [ 38.453825][ T957] rtnl_fill_stats+0x44/0xbe0 [ 38.454390][ T957] rtnl_fill_ifinfo+0xeb2/0x3720 [ 38.456257][ T957] rtmsg_ifinfo_build_skb+0xca/0x170 [ 38.456998][ T957] rtmsg_ifinfo_event.part.33+0x1b/0xb0 [ 38.459351][ T957] rtnetlink_event+0xcd/0x120 [ 38.460086][ T957] notifier_call_chain+0x90/0x160 [ 38.460829][ T957] netdev_change_features+0x74/0xa0 [ 38.461752][ T957] bond_compute_features.isra.45+0x4e6/0x6f0 [bonding] [ 38.462705][ T957] bond_enslave+0x3639/0x47b0 [bonding] [ 38.463476][ T957] do_setlink+0xaab/0x2ef0 [ 38.464141][ T957] __rtnl_newlink+0x9c5/0x1270 [ 38.464897][ T957] rtnl_newlink+0x65/0x90 [ 38.465522][ T957] rtnetlink_rcv_msg+0x4a8/0x890 [ 38.466215][ T957] netlink_rcv_skb+0x121/0x350 [ 38.466895][ T957] netlink_unicast+0x42e/0x610 [ 38.467583][ T957] netlink_sendmsg+0x65a/0xb90 [ 38.468285][ T957] ____sys_sendmsg+0x5ce/0x7a0 [ 38.469202][ T957] ___sys_sendmsg+0x10f/0x1b0 [ 38.469884][ T957] __sys_sendmsg+0xc6/0x150 [ 38.470587][ T957] do_syscall_64+0x99/0x4f0 [ 38.471245][ T957] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 38.472093][ T957] [ 38.472093][ T957] other info that might help us debug this: [ 38.472093][ T957] [ 38.473438][ T957] Possible unsafe locking scenario: [ 38.473438][ T957] [ 38.474898][ T957] CPU0 CPU1 [ 38.476234][ T957] ---- ---- [ 38.480171][ T957] lock(&bond->stats_lock_key); [ 38.480808][ T957] lock(&bond->stats_lock_key#2); [ 38.481791][ T957] lock(&bond->stats_lock_key); [ 38.482754][ T957] lock(&bond->stats_lock_key#2); [ 38.483416][ T957] [ 38.483416][ T957] *** DEADLOCK *** [ 38.483416][ T957] [ 38.484505][ T957] 3 locks held by ip/957: [ 38.485048][ T957] #0: ffffffffbccf6230 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x457/0x890 [ 38.486198][ T957] #1: ffff888065fd2cd8 (&bond->stats_lock_key){+.+.}, at: bond_get_stats+0x90/0x4d0 [bonding] [ 38.487625][ T957] #2: ffffffffbc9254c0 (rcu_read_lock){....}, at: bond_get_stats+0x5/0x4d0 [bonding] [ 38.488897][ T957] [ 38.488897][ T957] stack backtrace: [ 38.489646][ T957] CPU: 1 PID: 957 Comm: ip Not tainted 5.5.0+ #394 [ 38.490497][ T957] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 38.492810][ T957] Call Trace: [ 38.493219][ T957] dump_stack+0x96/0xdb [ 38.493709][ T957] check_noncircular+0x371/0x450 [ 38.494344][ T957] ? lookup_address+0x60/0x60 [ 38.494923][ T957] ? print_circular_bug.isra.35+0x310/0x310 [ 38.495699][ T957] ? hlock_class+0x130/0x130 [ 38.496334][ T957] ? __lock_acquire+0x2d8d/0x3de0 [ 38.496979][ T957] __lock_acquire+0x2d8d/0x3de0 [ 38.497607][ T957] ? register_lock_class+0x14d0/0x14d0 [ 38.498333][ T957] ? check_chain_key+0x236/0x5d0 [ 38.499003][ T957] lock_acquire+0x164/0x3b0 [ 38.499800][ T957] ? bond_get_stats+0x90/0x4d0 [bonding] [ 38.500706][ T957] _raw_spin_lock+0x30/0x70 [ 38.501435][ T957] ? bond_get_stats+0x90/0x4d0 [bonding] [ 38.502311][ T957] bond_get_stats+0x90/0x4d0 [bonding] [ ... ] But, there is another problem. The dynamic lockdep class key is protected by RTNL, but bond_get_stats() would be called outside of RTNL. So, it would use an invalid dynamic lockdep class key. In order to fix this issue, stats_lock uses spin_lock_nested() instead of a dynamic lockdep key. The bond_get_stats() calls bond_get_lowest_level_rcu() to get the correct nest level value, which will be used by spin_lock_nested(). The "dev->lower_level" indicates lower nest level value, but this value is invalid outside of RTNL. So, bond_get_lowest_level_rcu() returns valid lower nest level value in the RCU critical section. bond_get_lowest_level_rcu() will be work only when LOCKDEP is enabled. Fixes: 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
064ff66e |
|
15-Feb-2020 |
Taehee Yoo <ap420073@gmail.com> |
bonding: add missing netdev_update_lockdep_key() After bond_release(), netdev_update_lockdep_key() should be called. But both ioctl path and attribute path don't call netdev_update_lockdep_key(). This patch adds missing netdev_update_lockdep_key(). Test commands: ip link add bond0 type bond ip link add bond1 type bond ifenslave bond0 bond1 ifenslave -d bond0 bond1 ifenslave bond1 bond0 Splat looks like: [ 29.501182][ T1046] WARNING: possible circular locking dependency detected [ 29.501945][ T1039] hardirqs last disabled at (1962): [<ffffffffac6c807f>] handle_mm_fault+0x13f/0x700 [ 29.503442][ T1046] 5.5.0+ #322 Not tainted [ 29.503447][ T1046] ------------------------------------------------------ [ 29.504277][ T1039] softirqs last enabled at (1180): [<ffffffffade00678>] __do_softirq+0x678/0x981 [ 29.505443][ T1046] ifenslave/1046 is trying to acquire lock: [ 29.505886][ T1039] softirqs last disabled at (1169): [<ffffffffac19c18a>] irq_exit+0x17a/0x1a0 [ 29.509997][ T1046] ffff88805d5da280 (&dev->addr_list_lock_key#3){+...}, at: dev_mc_sync_multiple+0x95/0x120 [ 29.511243][ T1046] [ 29.511243][ T1046] but task is already holding lock: [ 29.512192][ T1046] ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.514124][ T1046] [ 29.514124][ T1046] which lock already depends on the new lock. [ 29.514124][ T1046] [ 29.517297][ T1046] [ 29.517297][ T1046] the existing dependency chain (in reverse order) is: [ 29.518231][ T1046] [ 29.518231][ T1046] -> #1 (&dev->addr_list_lock_key#4){+...}: [ 29.519076][ T1046] _raw_spin_lock+0x30/0x70 [ 29.519588][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.520208][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.520862][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.521640][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.522438][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.523251][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.524082][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.524959][ T1046] kernfs_fop_write+0x276/0x410 [ 29.525620][ T1046] vfs_write+0x197/0x4a0 [ 29.526218][ T1046] ksys_write+0x141/0x1d0 [ 29.526818][ T1046] do_syscall_64+0x99/0x4f0 [ 29.527430][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.528265][ T1046] [ 29.528265][ T1046] -> #0 (&dev->addr_list_lock_key#3){+...}: [ 29.529272][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.529935][ T1046] lock_acquire+0x164/0x3b0 [ 29.530638][ T1046] _raw_spin_lock+0x30/0x70 [ 29.531187][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.531790][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.532451][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ 29.533163][ T1046] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 29.533789][ T1046] __bond_opt_set_notify+0x2b/0xf0 [bonding] [ 29.534595][ T1046] bond_opt_tryset_rtnl+0x92/0xf0 [bonding] [ 29.535500][ T1046] bonding_sysfs_store_option+0x8a/0xf0 [bonding] [ 29.536379][ T1046] kernfs_fop_write+0x276/0x410 [ 29.537057][ T1046] vfs_write+0x197/0x4a0 [ 29.537640][ T1046] ksys_write+0x141/0x1d0 [ 29.538251][ T1046] do_syscall_64+0x99/0x4f0 [ 29.538870][ T1046] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.539659][ T1046] [ 29.539659][ T1046] other info that might help us debug this: [ 29.539659][ T1046] [ 29.540953][ T1046] Possible unsafe locking scenario: [ 29.540953][ T1046] [ 29.541883][ T1046] CPU0 CPU1 [ 29.542540][ T1046] ---- ---- [ 29.543209][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.543880][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.544873][ T1046] lock(&dev->addr_list_lock_key#4); [ 29.545863][ T1046] lock(&dev->addr_list_lock_key#3); [ 29.546525][ T1046] [ 29.546525][ T1046] *** DEADLOCK *** [ 29.546525][ T1046] [ 29.547542][ T1046] 5 locks held by ifenslave/1046: [ 29.548196][ T1046] #0: ffff88806044c478 (sb_writers#5){.+.+}, at: vfs_write+0x3bb/0x4a0 [ 29.549248][ T1046] #1: ffff88805af00890 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cf/0x410 [ 29.550343][ T1046] #2: ffff88805b8b54b0 (kn->count#157){.+.+}, at: kernfs_fop_write+0x1f2/0x410 [ 29.551575][ T1046] #3: ffffffffaecf4cf0 (rtnl_mutex){+.+.}, at: bond_opt_tryset_rtnl+0x5f/0xf0 [bonding] [ 29.552819][ T1046] #4: ffff8880460f2280 (&dev->addr_list_lock_key#4){+...}, at: bond_enslave+0x4482/0x47b0 [bonding] [ 29.554175][ T1046] [ 29.554175][ T1046] stack backtrace: [ 29.554907][ T1046] CPU: 0 PID: 1046 Comm: ifenslave Not tainted 5.5.0+ #322 [ 29.555854][ T1046] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 29.557064][ T1046] Call Trace: [ 29.557504][ T1046] dump_stack+0x96/0xdb [ 29.558054][ T1046] check_noncircular+0x371/0x450 [ 29.558723][ T1046] ? print_circular_bug.isra.35+0x310/0x310 [ 29.559486][ T1046] ? hlock_class+0x130/0x130 [ 29.560100][ T1046] ? __lock_acquire+0x2d8d/0x3de0 [ 29.560761][ T1046] __lock_acquire+0x2d8d/0x3de0 [ 29.561366][ T1046] ? register_lock_class+0x14d0/0x14d0 [ 29.562045][ T1046] ? find_held_lock+0x39/0x1d0 [ 29.562641][ T1046] lock_acquire+0x164/0x3b0 [ 29.563199][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.563872][ T1046] _raw_spin_lock+0x30/0x70 [ 29.564464][ T1046] ? dev_mc_sync_multiple+0x95/0x120 [ 29.565146][ T1046] dev_mc_sync_multiple+0x95/0x120 [ 29.565793][ T1046] bond_enslave+0x448d/0x47b0 [bonding] [ 29.566487][ T1046] ? bond_update_slave_arr+0x940/0x940 [bonding] [ 29.567279][ T1046] ? bstr_printf+0xc20/0xc20 [ 29.567857][ T1046] ? stack_trace_consume_entry+0x160/0x160 [ 29.568614][ T1046] ? deactivate_slab.isra.77+0x2c5/0x800 [ 29.569320][ T1046] ? check_chain_key+0x236/0x5d0 [ 29.569939][ T1046] ? sscanf+0x93/0xc0 [ 29.570442][ T1046] ? vsscanf+0x1e20/0x1e20 [ 29.571003][ T1046] bond_option_slaves_set+0x1a3/0x370 [bonding] [ ... ] Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d485ed8 |
|
06-Dec-2019 |
Mahesh Bandewar <maheshb@google.com> |
bonding: fix active-backup transition after link failure After the recent fix in commit 1899bb325149 ("bonding: fix state transition issue in link monitoring"), the active-backup mode with miimon initially come-up fine but after a link-failure, both members transition into backup state. Following steps to reproduce the scenario (eth1 and eth2 are the slaves of the bond): ip link set eth1 up ip link set eth2 down sleep 1 ip link set eth2 up ip link set eth1 down cat /sys/class/net/eth1/bonding_slave/state cat /sys/class/net/eth2/bonding_slave/state Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring") CC: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
#
9e99bfef |
|
07-Dec-2019 |
Eric Dumazet <edumazet@google.com> |
bonding: fix bond_neigh_init() 1) syzbot reported an uninit-value in bond_neigh_setup() [1] bond_neigh_setup() uses a temporary on-stack 'struct neigh_parms parms', but only clears parms.neigh_setup field. A stacked bonding device would then enter bond_neigh_setup() and read garbage from parms->dev. If we get really unlucky and garbage is matching @dev, then we could recurse and eventually crash. Let's make sure the whole structure is cleared to avoid surprises. 2) bond_neigh_setup() can be called while another cpu manipulates the master device, removing or adding a slave. We need at least rcu protection to prevent use-after-free. Note: Prior code does not support a stack of bonding devices, this patch does not attempt to fix this, and leave a comment instead. [1] BUG: KMSAN: uninit-value in bond_neigh_setup+0xa4/0x110 drivers/net/bonding/bond_main.c:3655 CPU: 0 PID: 11256 Comm: syz-executor.0 Not tainted 5.4.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108 __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245 bond_neigh_setup+0xa4/0x110 drivers/net/bonding/bond_main.c:3655 bond_neigh_init+0x216/0x4b0 drivers/net/bonding/bond_main.c:3626 ___neigh_create+0x169e/0x2c40 net/core/neighbour.c:613 __neigh_create+0xbd/0xd0 net/core/neighbour.c:674 ip6_finish_output2+0x149a/0x2670 net/ipv6/ip6_output.c:113 __ip6_finish_output+0x83d/0x8f0 net/ipv6/ip6_output.c:142 ip6_finish_output+0x2db/0x420 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip6_output+0x5d3/0x720 net/ipv6/ip6_output.c:175 dst_output include/net/dst.h:436 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] mld_sendpack+0xebd/0x13d0 net/ipv6/mcast.c:1682 mld_send_cr net/ipv6/mcast.c:1978 [inline] mld_ifc_timer_expire+0x116b/0x1680 net/ipv6/mcast.c:2477 call_timer_fn+0x232/0x530 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers+0xd60/0x1270 kernel/time/timer.c:1773 run_timer_softirq+0x2d/0x50 kernel/time/timer.c:1786 __do_softirq+0x4a1/0x83a kernel/softirq.c:293 invoke_softirq kernel/softirq.c:375 [inline] irq_exit+0x230/0x280 kernel/softirq.c:416 exiting_irq+0xe/0x10 arch/x86/include/asm/apic.h:536 smp_apic_timer_interrupt+0x48/0x70 arch/x86/kernel/apic/apic.c:1138 apic_timer_interrupt+0x2e/0x40 arch/x86/entry/entry_64.S:835 </IRQ> RIP: 0010:kmsan_free_page+0x18d/0x1c0 mm/kmsan/kmsan_shadow.c:439 Code: 4c 89 ff 44 89 f6 e8 82 0d ee ff 65 ff 0d 9f 26 3b 60 65 8b 05 98 26 3b 60 85 c0 75 24 e8 5b f6 35 ff 4c 89 6d d0 ff 75 d0 9d <48> 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 0f 0b 0f 0b 0f RSP: 0018:ffffb328034af818 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000000 RBX: ffffe2d7471f8360 RCX: 0000000000000000 RDX: ffffffffadea7000 RSI: 0000000000000004 RDI: ffff93496fcda104 RBP: ffffb328034af850 R08: ffff934a47e86d00 R09: ffff93496fc41900 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000246 R14: 0000000000000000 R15: ffffe2d7472225c0 free_pages_prepare mm/page_alloc.c:1138 [inline] free_pcp_prepare mm/page_alloc.c:1230 [inline] free_unref_page_prepare+0x1d9/0x770 mm/page_alloc.c:3025 free_unref_page mm/page_alloc.c:3074 [inline] free_the_page mm/page_alloc.c:4832 [inline] __free_pages+0x154/0x230 mm/page_alloc.c:4840 __vunmap+0xdac/0xf20 mm/vmalloc.c:2277 __vfree mm/vmalloc.c:2325 [inline] vfree+0x7c/0x170 mm/vmalloc.c:2355 copy_entries_to_user net/ipv6/netfilter/ip6_tables.c:883 [inline] get_entries net/ipv6/netfilter/ip6_tables.c:1041 [inline] do_ip6t_get_ctl+0xfa4/0x1030 net/ipv6/netfilter/ip6_tables.c:1709 nf_sockopt net/netfilter/nf_sockopt.c:104 [inline] nf_getsockopt+0x481/0x4e0 net/netfilter/nf_sockopt.c:122 ipv6_getsockopt+0x264/0x510 net/ipv6/ipv6_sockglue.c:1400 tcp_getsockopt+0x1c6/0x1f0 net/ipv4/tcp.c:3688 sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3110 __sys_getsockopt+0x533/0x7b0 net/socket.c:2129 __do_sys_getsockopt net/socket.c:2144 [inline] __se_sys_getsockopt+0xe1/0x100 net/socket.c:2141 __x64_sys_getsockopt+0x62/0x80 net/socket.c:2141 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45d20a Code: b8 34 01 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 8d 8b fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 6a 8b fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:0000000000a6f618 EFLAGS: 00000212 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 0000000000a6f640 RCX: 000000000045d20a RDX: 0000000000000041 RSI: 0000000000000029 RDI: 0000000000000003 RBP: 0000000000717cc0 R08: 0000000000a6f63c R09: 0000000000004000 R10: 0000000000a6f740 R11: 0000000000000212 R12: 0000000000000003 R13: 0000000000000000 R14: 0000000000000029 R15: 0000000000715b00 Local variable description: ----parms@bond_neigh_init Variable was created at: bond_neigh_init+0x8c/0x4b0 drivers/net/bonding/bond_main.c:3617 bond_neigh_init+0x8c/0x4b0 drivers/net/bonding/bond_main.c:3617 Fixes: 9918d5bf329d ("bonding: modify only neigh_parms owned by us") Fixes: 234bcf8a499e ("net/bonding: correctly proxy slave neigh param setup ndo function") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f394722f |
|
07-Dec-2019 |
Eric Dumazet <edumazet@google.com> |
neighbour: remove neigh_cleanup() method neigh_cleanup() has not been used for seven years, and was a wrong design. Messing with shared pointer in bond_neigh_init() without proper memory barriers would at least trigger syzbot complains eventually. It is time to remove this stuff. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df98be06 |
|
14-Nov-2019 |
Matteo Croce <mcroce@redhat.com> |
bonding: symmetric ICMP transmit A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports to balance packets between slaves. With some network errors, we receive an ICMP error packet by the remote host or a router. If sent by a router, the source IP can differ from the remote host one. Additionally the ICMP protocol has no port numbers, so a layer3+4 bonding will get a different hash than the previous one. These two conditions could let the packet go through a different interface than the other packets of the same flow: # tcpdump -qltnni veth0 |sed 's/^/0: /' & # tcpdump -qltnni veth1 |sed 's/^/1: /' & # hping3 -2 192.168.0.2 -p 9 0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 An ICMP error packet contains the header of the packet which caused the network error, so inspect it and match the flow against it, so we can send the ICMP via the same interface of the previous packet in the flow. Move the IP and port dissect code into a generic function bond_flow_ip() and if we are dissecting an ICMP error packet, call it again with the adjusted offset. # hping3 -2 192.168.0.2 -p 9 1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0 1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0 0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0 0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36 Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1899bb32 |
|
01-Nov-2019 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
bonding: fix state transition issue in link monitoring Since de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring"), the bonding driver has utilized two separate variables to indicate the next link state a particular slave should transition to. Each is used to communicate to a different portion of the link state change commit logic; one to the bond_miimon_commit function itself, and another to the state transition logic. Unfortunately, the two variables can become unsynchronized, resulting in incorrect link state transitions within bonding. This can cause slaves to become stuck in an incorrect link state until a subsequent carrier state transition. The issue occurs when a special case in bond_slave_netdev_event sets slave->link directly to BOND_LINK_FAIL. On the next pass through bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL case will set the proposed next state (link_new_state) to BOND_LINK_UP, but the new_link to BOND_LINK_DOWN. The setting of the final link state from new_link comes after that from link_new_state, and so the slave will end up incorrectly in _DOWN state. Resolve this by combining the two variables into one. Reported-by: Aleksei Zakharov <zakharov.a.g@yandex.ru> Reported-by: Sha Zhang <zhangsha.zhang@huawei.com> Cc: Mahesh Bandewar <maheshb@google.com> Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58deb77c |
|
29-Oct-2019 |
Matteo Croce <mcroce@redhat.com> |
bonding: balance ICMP echoes in layer3+4 mode The bonding uses the L4 ports to balance flows between slaves. As the ICMP protocol has no ports, those packets are sent all to the same device: # tcpdump -qltnni veth0 ip |sed 's/^/0: /' & # tcpdump -qltnni veth1 ip |sed 's/^/1: /' & # ping -qc1 192.168.0.2 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 315, seq 1, length 64 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 315, seq 1, length 64 # ping -qc1 192.168.0.2 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 316, seq 1, length 64 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 316, seq 1, length 64 # ping -qc1 192.168.0.2 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 317, seq 1, length 64 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 317, seq 1, length 64 But some ICMP packets have an Identifier field which is used to match packets within sessions, let's use this value in the hash function to balance these packets between bond slaves: # ping -qc1 192.168.0.2 0: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 303, seq 1, length 64 0: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 303, seq 1, length 64 # ping -qc1 192.168.0.2 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 304, seq 1, length 64 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 304, seq 1, length 64 Aso, let's use a flow_dissector_key which defines FLOW_DISSECTOR_KEY_ICMP, so we can balance pings encapsulated in a tunnel when using mode encap3+4: # ping -q 192.168.1.2 -c1 0: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 585, seq 1, length 64 0: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 585, seq 1, length 64 # ping -q 192.168.1.2 -c1 1: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 586, seq 1, length 64 1: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 586, seq 1, length 64 Signed-off-by: Matteo Croce <mcroce@redhat.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad9bd8da |
|
29-Oct-2019 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix using uninitialized mode_lock When a bonding interface is being created, it setups its mode and options. At that moment, it uses mode_lock so mode_lock should be initialized before that moment. rtnl_newlink() rtnl_create_link() alloc_netdev_mqs() ->setup() //bond_setup() ->newlink //bond_newlink bond_changelink() register_netdevice() ->ndo_init() //bond_init() After commit 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass"), mode_lock is initialized in bond_init(). So in the bond_changelink(), un-initialized mode_lock can be used. mode_lock should be initialized in bond_setup(). This patch partially reverts commit 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass") Test command: ip link add bond0 type bond mode 802.3ad lacp_rate 0 Splat looks like: [ 60.615127] INFO: trying to register non-static key. [ 60.615900] the code is fine but needs lockdep annotation. [ 60.616697] turning off the locking correctness validator. [ 60.617490] CPU: 1 PID: 957 Comm: ip Not tainted 5.4.0-rc3+ #109 [ 60.618350] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 60.619481] Call Trace: [ 60.619918] dump_stack+0x7c/0xbb [ 60.620453] register_lock_class+0x1215/0x14d0 [ 60.621131] ? alloc_netdev_mqs+0x7b3/0xcc0 [ 60.621771] ? is_bpf_text_address+0x86/0xf0 [ 60.622416] ? is_dynamic_key+0x230/0x230 [ 60.623032] ? unwind_get_return_address+0x5f/0xa0 [ 60.623757] ? create_prof_cpu_mask+0x20/0x20 [ 60.624408] ? arch_stack_walk+0x83/0xb0 [ 60.625023] __lock_acquire+0xd8/0x3de0 [ 60.625616] ? stack_trace_save+0x82/0xb0 [ 60.626225] ? stack_trace_consume_entry+0x160/0x160 [ 60.626957] ? deactivate_slab.isra.80+0x2c5/0x800 [ 60.627668] ? register_lock_class+0x14d0/0x14d0 [ 60.628380] ? alloc_netdev_mqs+0x7b3/0xcc0 [ 60.629020] ? save_stack+0x69/0x80 [ 60.629574] ? save_stack+0x19/0x80 [ 60.630121] ? __kasan_kmalloc.constprop.4+0xa0/0xd0 [ 60.630859] ? __kmalloc_node+0x16f/0x480 [ 60.631472] ? alloc_netdev_mqs+0x7b3/0xcc0 [ 60.632121] ? rtnl_create_link+0x2ed/0xad0 [ 60.634388] ? __rtnl_newlink+0xad4/0x11b0 [ 60.635024] lock_acquire+0x164/0x3b0 [ 60.635608] ? bond_3ad_update_lacp_rate+0x91/0x200 [bonding] [ 60.636463] _raw_spin_lock_bh+0x38/0x70 [ 60.637084] ? bond_3ad_update_lacp_rate+0x91/0x200 [bonding] [ 60.637930] bond_3ad_update_lacp_rate+0x91/0x200 [bonding] [ 60.638753] ? bond_3ad_lacpdu_recv+0xb30/0xb30 [bonding] [ 60.639552] ? bond_opt_get_val+0x180/0x180 [bonding] [ 60.640307] ? ___slab_alloc+0x5aa/0x610 [ 60.640925] bond_option_lacp_rate_set+0x71/0x140 [bonding] [ 60.641751] __bond_opt_set+0x1ff/0xbb0 [bonding] [ 60.643217] ? kasan_unpoison_shadow+0x30/0x40 [ 60.643924] bond_changelink+0x9a4/0x1700 [bonding] [ 60.644653] ? memset+0x1f/0x40 [ 60.742941] ? bond_slave_changelink+0x1a0/0x1a0 [bonding] [ 60.752694] ? alloc_netdev_mqs+0x8ea/0xcc0 [ 60.753330] ? rtnl_create_link+0x2ed/0xad0 [ 60.753964] bond_newlink+0x1e/0x60 [bonding] [ 60.754612] __rtnl_newlink+0xb9f/0x11b0 [ ... ] Reported-by: syzbot+8da67f407bcba2c72e6e@syzkaller.appspotmail.com Reported-by: syzbot+0d083911ab18b710da71@syzkaller.appspotmail.com Fixes: 089bca2caed0 ("bonding: use dynamic lockdep key instead of subclass") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3b0a18b |
|
21-Oct-2019 |
Taehee Yoo <ap420073@gmail.com> |
net: remove unnecessary variables and callback This patch removes variables and callback these are related to the nested device structure. devices that can be nested have their own nest_level variable that represents the depth of nested devices. In the previous patch, new {lower/upper}_level variables are added and they replace old private nest_level variable. So, this patch removes all 'nest_level' variables. In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added to get lockdep subclass value, which is actually lower nested depth value. But now, they use the dynamic lockdep key to avoid lockdep warning instead of the subclass. So, this patch removes ->ndo_get_lock_subclass() callback. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
089bca2c |
|
21-Oct-2019 |
Taehee Yoo <ap420073@gmail.com> |
bonding: use dynamic lockdep key instead of subclass All bonding device has same lockdep key and subclass is initialized with nest_level. But actual nest_level value can be changed when a lower device is attached. And at this moment, the subclass should be updated but it seems to be unsafe. So this patch makes bonding use dynamic lockdep key instead of the subclass. Test commands: ip link add bond0 type bond for i in {1..5} do let A=$i-1 ip link add bond$i type bond ip link set bond$i master bond$A done ip link set bond5 master bond0 Splat looks like: [ 307.992912] WARNING: possible recursive locking detected [ 307.993656] 5.4.0-rc3+ #96 Tainted: G W [ 307.994367] -------------------------------------------- [ 307.995092] ip/761 is trying to acquire lock: [ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.997045] but task is already holding lock: [ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.999215] other info that might help us debug this: [ 308.000251] Possible unsafe locking scenario: [ 308.001137] CPU0 [ 308.001533] ---- [ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.003302] *** DEADLOCK *** [ 308.004310] May be due to missing lock nesting notation [ 308.005319] 3 locks held by ip/761: [ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding] [ 308.009422] stack backtrace: [ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96 [ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 308.012179] Call Trace: [ 308.012601] dump_stack+0x7c/0xbb [ 308.013089] __lock_acquire+0x269d/0x3de0 [ 308.013669] ? register_lock_class+0x14d0/0x14d0 [ 308.014318] lock_acquire+0x164/0x3b0 [ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.015520] _raw_spin_lock_nested+0x2e/0x60 [ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.017215] bond_get_stats+0xb8/0x500 [bonding] [ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding] [ 308.019710] ? rcu_read_lock_held+0x90/0xa0 [ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding] [ 308.021953] dev_get_stats+0x1ec/0x270 [ 308.022508] bond_get_stats+0x1d1/0x500 [bonding] Fixes: d3fff6c443fe ("net: add netdev_lockdep_set_classes() helper") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65de65d9 |
|
21-Oct-2019 |
Taehee Yoo <ap420073@gmail.com> |
bonding: fix unexpected IFF_BONDING bit unset The IFF_BONDING means bonding master or bonding slave device. ->ndo_add_slave() sets IFF_BONDING flag and ->ndo_del_slave() unsets IFF_BONDING flag. bond0<--bond1 Both bond0 and bond1 are bonding device and these should keep having IFF_BONDING flag until they are removed. But bond1 would lose IFF_BONDING at ->ndo_del_slave() because that routine do not check whether the slave device is the bonding type or not. This patch adds the interface type check routine before removing IFF_BONDING flag. Test commands: ip link add bond0 type bond ip link add bond1 type bond ip link set bond1 master bond0 ip link set bond1 nomaster ip link del bond1 type bond ip link add bond1 type bond Splat looks like: [ 226.665555] proc_dir_entry 'bonding/bond1' already registered [ 226.666440] WARNING: CPU: 0 PID: 737 at fs/proc/generic.c:361 proc_register+0x2a9/0x3e0 [ 226.667571] Modules linked in: bonding af_packet sch_fq_codel ip_tables x_tables unix [ 226.668662] CPU: 0 PID: 737 Comm: ip Not tainted 5.4.0-rc3+ #96 [ 226.669508] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 226.670652] RIP: 0010:proc_register+0x2a9/0x3e0 [ 226.671612] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 39 01 00 00 48 8b 04 24 48 89 ea 48 c7 c7 a0 0b 14 9f 48 8b b0 e 0 00 00 00 e8 07 e7 88 ff <0f> 0b 48 c7 c7 40 2d a5 9f e8 59 d6 23 01 48 8b 4c 24 10 48 b8 00 [ 226.675007] RSP: 0018:ffff888050e17078 EFLAGS: 00010282 [ 226.675761] RAX: dffffc0000000008 RBX: ffff88805fdd0f10 RCX: ffffffff9dd344e2 [ 226.676757] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f6b8c [ 226.677751] RBP: ffff8880507160f3 R08: ffffed100d940019 R09: ffffed100d940019 [ 226.678761] R10: 0000000000000001 R11: ffffed100d940018 R12: ffff888050716008 [ 226.679757] R13: ffff8880507160f2 R14: dffffc0000000000 R15: ffffed100a0e2c1e [ 226.680758] FS: 00007fdc217cc0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 226.681886] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 226.682719] CR2: 00007f49313424d0 CR3: 0000000050e46001 CR4: 00000000000606f0 [ 226.683727] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 226.684725] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 226.685681] Call Trace: [ 226.687089] proc_create_seq_private+0xb3/0xf0 [ 226.687778] bond_create_proc_entry+0x1b3/0x3f0 [bonding] [ 226.691458] bond_netdev_event+0x433/0x970 [bonding] [ 226.692139] ? __module_text_address+0x13/0x140 [ 226.692779] notifier_call_chain+0x90/0x160 [ 226.693401] register_netdevice+0x9b3/0xd80 [ 226.694010] ? alloc_netdev_mqs+0x854/0xc10 [ 226.694629] ? netdev_change_features+0xa0/0xa0 [ 226.695278] ? rtnl_create_link+0x2ed/0xad0 [ 226.695849] bond_newlink+0x2a/0x60 [bonding] [ 226.696422] __rtnl_newlink+0xb9f/0x11b0 [ 226.696968] ? rtnl_link_unregister+0x220/0x220 [ ... ] Fixes: 0b680e753724 ("[PATCH] bonding: Add priv_flag to avoid event mishandling") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab92d68f |
|
21-Oct-2019 |
Taehee Yoo <ap420073@gmail.com> |
net: core: add generic lockdep keys Some interface types could be nested. (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..) These interface types should set lockdep class because, without lockdep class key, lockdep always warn about unexisting circular locking. In the current code, these interfaces have their own lockdep class keys and these manage itself. So that there are so many duplicate code around the /driver/net and /net/. This patch adds new generic lockdep keys and some helper functions for it. This patch does below changes. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep key. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being free'd llockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper function - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code of each interfaces. After this patch, each interface modules don't need to maintain their lockdep keys. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7137534 |
|
07-Oct-2019 |
Eric Dumazet <edumazet@google.com> |
bonding: fix potential NULL deref in bond_update_slave_arr syzbot got a NULL dereference in bond_update_slave_arr() [1], happening after a failure to allocate bond->slave_arr A workqueue (bond_slave_arr_handler) is supposed to retry the allocation later, but if the slave is removed before the workqueue had a chance to complete, bond->slave_arr can still be NULL. [1] Failed to build slave-array. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI Modules linked in: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:bond_update_slave_arr.cold+0xc6/0x198 drivers/net/bonding/bond_main.c:4039 RSP: 0018:ffff88018fe33678 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000290b000 RDX: 0000000000000000 RSI: ffffffff82b63037 RDI: ffff88019745ea20 RBP: ffff88018fe33760 R08: ffff880170754280 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88019745ea00 R14: 0000000000000000 R15: ffff88018fe338b0 FS: 00007febd837d700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004540a0 CR3: 00000001c242e005 CR4: 00000000001626f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff82b5b45e>] __bond_release_one+0x43e/0x500 drivers/net/bonding/bond_main.c:1923 [<ffffffff82b5b966>] bond_release drivers/net/bonding/bond_main.c:2039 [inline] [<ffffffff82b5b966>] bond_do_ioctl+0x416/0x870 drivers/net/bonding/bond_main.c:3562 [<ffffffff83ae25f4>] dev_ifsioc+0x6f4/0x940 net/core/dev_ioctl.c:328 [<ffffffff83ae2e58>] dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:495 [<ffffffff83995ffd>] sock_do_ioctl+0x1bd/0x300 net/socket.c:1088 [<ffffffff83996a80>] sock_ioctl+0x300/0x5d0 net/socket.c:1196 [<ffffffff81b124db>] vfs_ioctl fs/ioctl.c:47 [inline] [<ffffffff81b124db>] file_ioctl fs/ioctl.c:501 [inline] [<ffffffff81b124db>] do_vfs_ioctl+0xacb/0x1300 fs/ioctl.c:688 [<ffffffff81b12dc6>] SYSC_ioctl fs/ioctl.c:705 [inline] [<ffffffff81b12dc6>] SyS_ioctl+0xb6/0xe0 fs/ioctl.c:696 [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305 [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: ee6377147409 ("bonding: Simplify the xmit function for modes that use xmit_hash") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
#
d595b03d |
|
06-Aug-2019 |
YueHaibing <yuehaibing@huawei.com> |
bonding: Add vlan tx offload to hw_enc_features As commit 30d8177e8ac7 ("bonding: Always enable vlan tx offload") said, we should always enable bonding's vlan tx offload, pass the vlan packets to the slave devices with vlan tci, let them to handle vlan implementation. Now if encapsulation protocols like VXLAN is used, skb->encapsulation may be set, then the packet is passed to vlan device which based on bonding device. However in netif_skb_features(), the check of hw_enc_features: if (skb->encapsulation) features &= dev->hw_enc_features; clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results in same issue in commit 30d8177e8ac7 like this: vlan_dev_hard_start_xmit -->dev_queue_xmit -->validate_xmit_skb -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared -->validate_xmit_vlan -->__vlan_hwaccel_push_inside //skb->tci is cleared ... --> bond_start_xmit --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34 --> __skb_flow_dissect // nhoff point to IP header --> case htons(ETH_P_8021Q) // skb_vlan_tag_present is false, so vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan), //vlan point to ip header wrongly Fixes: b2a103e6d0af ("bonding: convert to ndo_fix_features") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
12185dfe |
|
16-Jul-2019 |
Thomas Falcon <tlfalcon@linux.ibm.com> |
bonding: Force slave speed check after link state recovery for 802.3ad The following scenario was encountered during testing of logical partition mobility on pseries partitions with bonded ibmvnic adapters in LACP mode. 1. Driver receives a signal that the device has been swapped, and it needs to reset to initialize the new device. 2. Driver reports loss of carrier and begins initialization. 3. Bonding driver receives NETDEV_CHANGE notifier and checks the slave's current speed and duplex settings. Because these are unknown at the time, the bond sets its link state to BOND_LINK_FAIL and handles the speed update, clearing AD_PORT_LACP_ENABLE. 4. Driver finishes recovery and reports that the carrier is on. 5. Bond receives a new notification and checks the speed again. The speeds are valid but miimon has not altered the link state yet. AD_PORT_LACP_ENABLE remains off. Because the slave's link state is still BOND_LINK_FAIL, no further port checks are made when it recovers. Though the slave devices are operational and have valid speed and duplex settings, the bond will not send LACPDU's. The simplest fix I can see is to force another speed check in bond_miimon_commit. This way the bond will update AD_PORT_LACP_ENABLE if needed when transitioning from BOND_LINK_FAIL to BOND_LINK_UP. CC: Jarod Wilson <jarod@redhat.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07a4ddec |
|
02-Jul-2019 |
Vincent Bernat <vincent@bernat.ch> |
bonding: add an option to specify a delay between peer notifications Currently, gratuitous ARP/ND packets are sent every `miimon' milliseconds. This commit allows a user to specify a custom delay through a new option, `peer_notif_delay'. Like for `updelay' and `downdelay', this delay should be a multiple of `miimon' to avoid managing an additional work queue. The configuration logic is copied from `updelay' and `downdelay'. However, the default value cannot be set using a module parameter: Netlink or sysfs should be used to configure this feature. When setting `miimon' to 100 and `peer_notif_delay' to 500, we can observe the 500 ms delay is respected: 20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28 In bond_mii_monitor(), I have tried to keep the lock logic readable. The change is due to the fact we cannot rely on a notification to lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is only triggered once every N times, while we need to decrement the counter each time. iproute2 also needs to be updated to be able to specify this new attribute through `ip link'. Signed-off-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d1bc24b |
|
01-Jul-2019 |
Cong Wang <xiyou.wangcong@gmail.com> |
bonding: validate ip header before check IPPROTO_IGMP bond_xmit_roundrobin() checks for IGMP packets but it parses the IP header even before checking skb->protocol. We should validate the IP header with pskb_may_pull() before using iph->protocol. Reported-and-tested-by: syzbot+e5be16aa39ad6e755391@syzkaller.appspotmail.com Fixes: a2fd940f4cff ("bonding: fix broken multicast with round-robin mode") Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b8bd72d3 |
|
01-Jul-2019 |
Eric Dumazet <edumazet@google.com> |
bonding/main: fix NULL dereference in bond_select_active_slave() A bonding master can be up while best_slave is NULL. [12105.636318] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [12105.638204] mlx4_en: eth1: Linkstate event 1 -> 1 [12105.648984] IP: bond_select_active_slave+0x125/0x250 [12105.653977] PGD 0 P4D 0 [12105.656572] Oops: 0000 [#1] SMP PTI [12105.660487] gsmi: Log Shutdown Reason 0x03 [12105.664620] Modules linked in: kvm_intel loop act_mirred uhaul vfat fat stg_standard_ftl stg_megablocks stg_idt stg_hdi stg elephant_dev_num stg_idt_eeprom w1_therm wire i2c_mux_pca954x i2c_mux mlx4_i2c i2c_usb cdc_acm ehci_pci ehci_hcd i2c_iimc mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core [last unloaded: kvm_intel] [12105.685686] mlx4_core 0000:03:00.0: dispatching link up event for port 2 [12105.685700] mlx4_en: eth2: Linkstate event 2 -> 1 [12105.685700] mlx4_en: eth2: Link Up (linkstate) [12105.724452] Workqueue: bond0 bond_mii_monitor [12105.728854] RIP: 0010:bond_select_active_slave+0x125/0x250 [12105.734355] RSP: 0018:ffffaf146a81fd88 EFLAGS: 00010246 [12105.739637] RAX: 0000000000000003 RBX: ffff8c62b03c6900 RCX: 0000000000000000 [12105.746838] RDX: 0000000000000000 RSI: ffffaf146a81fd08 RDI: ffff8c62b03c6000 [12105.754054] RBP: ffffaf146a81fdb8 R08: 0000000000000001 R09: ffff8c517d387600 [12105.761299] R10: 00000000001075d9 R11: ffffffffaceba92f R12: 0000000000000000 [12105.768553] R13: ffff8c8240ae4800 R14: 0000000000000000 R15: 0000000000000000 [12105.775748] FS: 0000000000000000(0000) GS:ffff8c62bfa40000(0000) knlGS:0000000000000000 [12105.783892] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [12105.789716] CR2: 0000000000000000 CR3: 0000000d0520e001 CR4: 00000000001626f0 [12105.796976] Call Trace: [12105.799446] [<ffffffffac31d387>] bond_mii_monitor+0x497/0x6f0 [12105.805317] [<ffffffffabd42643>] process_one_work+0x143/0x370 [12105.811225] [<ffffffffabd42c7a>] worker_thread+0x4a/0x360 [12105.816761] [<ffffffffabd48bc5>] kthread+0x105/0x140 [12105.821865] [<ffffffffabd42c30>] ? rescuer_thread+0x380/0x380 [12105.827757] [<ffffffffabd48ac0>] ? kthread_associate_blkcg+0xc0/0xc0 [12105.834266] [<ffffffffac600241>] ret_from_fork+0x51/0x60 Fixes: e2a7420df2e0 ("bonding/main: convert to using slave printk macros") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: John Sperbeck <jsperbeck@google.com> Cc: Jarod Wilson <jarod@redhat.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30d8177e |
|
26-Jun-2019 |
YueHaibing <yuehaibing@huawei.com> |
bonding: Always enable vlan tx offload We build vlan on top of bonding interface, which vlan offload is off, bond mode is 802.3ad (LACP) and xmit_hash_policy is BOND_XMIT_POLICY_ENCAP34. Because vlan tx offload is off, vlan tci is cleared and skb push the vlan header in validate_xmit_vlan() while sending from vlan devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to get information from protocol headers encapsulated within vlan, because 'nhoff' is points to IP header, so bond hashing is based on layer 2 info, which fails to distribute packets across slaves. This patch always enable bonding's vlan tx offload, pass the vlan packets to the slave devices with vlan tci, let them to handle vlan implementation. Fixes: 278339a42a1b ("bonding: propogate vlan_features to bonding master") Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e2a7420d |
|
07-Jun-2019 |
Jarod Wilson <jarod@redhat.com> |
bonding/main: convert to using slave printk macros All of these printk instances benefit from having both master and slave device information included, so convert to using a standardized macro format and remove redundant information. Suggested-by: Joe Perches <joe@perches.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f43b6530 |
|
07-Jun-2019 |
Jarod Wilson <jarod@redhat.com> |
bonding: fix error messages in bond_do_fail_over_mac Passing the bond name again to debug output when referencing slave is wrong. We're trying to set the bond's MAC to that of the new_active slave, so adjust the error message slightly and pass in the slave's name, not the bond's. Then we're trying to set the MAC on the old active slave, but putting the new active slave's name in the output. While we're at it, clarify the error messages so you know which one actually triggered. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75466dce |
|
07-Jun-2019 |
Jarod Wilson <jarod@redhat.com> |
bonding: improve event debug usability Seeing bonding debug log data along the lines of "event: 5" is a bit spartan, and often requires a lookup table if you don't remember what every event is. Make use of netdev_cmd_to_name for an improved debugging experience, so for the prior example, you'll see: "bond_netdev_event received NETDEV_REGISTER" instead (both are prefixed with the device for which the event pertains). CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e770b50 |
|
03-Jun-2019 |
Ariel Levkovich <lariel@mellanox.com> |
net: bonding: Inherit MPLS features from slave devices When setting the bonding interface net device features, the kernel code doesn't address the slaves' MPLS features and doesn't inherit them. Therefore, HW offloads that enhance performance such as checksumming and TSO are disabled for MPLS tagged traffic flowing via the bonding interface. The patch add the inheritance of the MPLS features from the slave devices with a similar logic to setting the bonding device's VLAN and encapsulation features. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
33403121 |
|
24-May-2019 |
Jarod Wilson <jarod@redhat.com> |
bonding/802.3ad: fix slave link initialization transition states Once in a while, with just the right timing, 802.3ad slaves will fail to properly initialize, winding up in a weird state, with a partner system mac address of 00:00:00:00:00:00. This started happening after a fix to properly track link_failure_count tracking, where an 802.3ad slave that reported itself as link up in the miimon code, but wasn't able to get a valid speed/duplex, started getting set to BOND_LINK_FAIL instead of BOND_LINK_DOWN. That was the proper thing to do for the general "my link went down" case, but has created a link initialization race that can put the interface in this odd state. The simple fix is to instead set the slave link to BOND_LINK_DOWN again, if the link has never been up (last_link_up == 0), so the link state doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed in this case, it simply hasn't been up yet, and this prevents the unnecessary state change from DOWN to FAIL and getting stuck in an init failure w/o a partner mac. Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking") CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Tested-by: Heesoon Kim <Heesoon.Kim@stratus.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92480b39 |
|
12-Apr-2019 |
Sabrina Dubroca <sd@queasysnail.net> |
bonding: fix event handling for stacked bonds When a bond is enslaved to another bond, bond_netdev_event() only handles the event as if the bond is a master, and skips treating the bond as a slave. This leads to a refcount leak on the slave, since we don't remove the adjacency to its master and the master holds a reference on the slave. Reproducer: ip link add bondL type bond ip link add bondU type bond ip link set bondL master bondU ip link del bondL No "Fixes:" tag, this code is older than git history. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a350ecce |
|
20-Mar-2019 |
Paolo Abeni <pabeni@redhat.com> |
net: remove 'fallback' argument from dev->ndo_select_queue() After the previous patch, all the callers of ndo_select_queue() provide as a 'fallback' argument netdev_pick_tx. The only exceptions are nested calls to ndo_select_queue(), which pass down the 'fallback' available in the current scope - still netdev_pick_tx. We can drop such argument and replace fallback() invocation with netdev_pick_tx(). This avoids an indirect call per xmit packet in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen) with device drivers implementing such ndo. It also clean the code a bit. Tested with ixgbe and CONFIG_FCOE=m With pktgen using queue xmit: threads vanilla patched (kpps) (kpps) 1 2334 2428 2 4166 4278 4 7895 8100 v1 -> v2: - rebased after helper's name change Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47f70626 |
|
22-Feb-2019 |
Florian Fainelli <f.fainelli@gmail.com> |
net: Remove switchdev.h inclusion from team/bond/vlan This is no longer necessary after eca59f691566 ("net: Remove support for bridge bypass ndos from stacked devices") Suggested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c963a33 |
|
18-Feb-2019 |
Michal Soltys <soltys@ziu.info> |
bonding: fix PACKET_ORIGDEV regression This patch fixes a subtle PACKET_ORIGDEV regression which was a side effect of fixes introduced by: 6a9e461f6fe4 bonding: pass link-local packets to bonding master also. ... to: b89f04c61efe bonding: deliver link-local packets with skb->dev set to link that packets arrived on While 6a9e461f6fe4 restored pre-b89f04c61efe presence of link-local packets on bonding masters (which is required e.g. by linux bridges participating in spanning tree or needed for lab-like setups created with group_fwd_mask) it also caused the originating device information to be lost due to cloning. Maciej Żenczykowski proposed another solution that doesn't require packet cloning and retains original device information - instead of returning RX_HANDLER_PASS for all link-local packets it's now limited only to packets from inactive slaves. At the same time, packets passed to bonding masters retain correct information about the originating device and PACKET_ORIGDEV can be used to determine it. This elegantly solves all issues so far: - link-local packets that were removed from bonding masters - LLDP daemons being forced to explicitly bind to slave interfaces - PACKET_ORIGDEV having no effect on bond interfaces Fixes: 6a9e461f6fe4 (bonding: pass link-local packets to bonding master also.) Reported-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
001e465f |
|
07-Jan-2019 |
Willem de Bruijn <willemb@google.com> |
bonding: update nest level on unlink A network device stack with multiple layers of bonding devices can trigger a false positive lockdep warning. Adding lockdep nest levels fixes this. Update the level on both enslave and unlink, to avoid the following series of events .. ip netns add test ip netns exec test bash ip link set dev lo addr 00:11:22:33:44:55 ip link set dev lo down ip link add dev bond1 type bond ip link add dev bond2 type bond ip link set dev lo master bond1 ip link set dev bond1 master bond2 ip link set dev bond1 nomaster ip link set dev bond2 master bond1 .. from still generating a splat: [ 193.652127] ====================================================== [ 193.658231] WARNING: possible circular locking dependency detected [ 193.664350] 4.20.0 #8 Not tainted [ 193.668310] ------------------------------------------------------ [ 193.674417] ip/15577 is trying to acquire lock: [ 193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290 [ 193.687851] but task is already holding lock: [ 193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290 [..] [ 193.851092] lock_acquire+0xa7/0x190 [ 193.855138] _raw_spin_lock_nested+0x2d/0x40 [ 193.859878] bond_get_stats+0x58/0x290 [ 193.864093] dev_get_stats+0x5a/0xc0 [ 193.868140] bond_get_stats+0x105/0x290 [ 193.872444] dev_get_stats+0x5a/0xc0 [ 193.876493] rtnl_fill_stats+0x40/0x130 [ 193.880797] rtnl_fill_ifinfo+0x6c5/0xdc0 [ 193.885271] rtmsg_ifinfo_build_skb+0x86/0xe0 [ 193.890091] rtnetlink_event+0x5b/0xa0 [ 193.894320] raw_notifier_call_chain+0x43/0x60 [ 193.899225] netdev_change_features+0x50/0xa0 [ 193.904044] bond_compute_features.isra.46+0x1ab/0x270 [ 193.909640] bond_enslave+0x141d/0x15b0 [ 193.913946] do_set_master+0x89/0xa0 [ 193.918016] do_setlink+0x37c/0xda0 [ 193.921980] __rtnl_newlink+0x499/0x890 [ 193.926281] rtnl_newlink+0x48/0x70 [ 193.930238] rtnetlink_rcv_msg+0x171/0x4b0 [ 193.934801] netlink_rcv_skb+0xd1/0x110 [ 193.939103] rtnetlink_rcv+0x15/0x20 [ 193.943151] netlink_unicast+0x3b5/0x520 [ 193.947544] netlink_sendmsg+0x2fd/0x3f0 [ 193.951942] sock_sendmsg+0x38/0x50 [ 193.955899] ___sys_sendmsg+0x2ba/0x2d0 [ 193.960205] __x64_sys_sendmsg+0xad/0x100 [ 193.964687] do_syscall_64+0x5a/0x460 [ 193.968823] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 7e2556e40026 ("bonding: avoid lockdep confusion in bond_get_stats()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1caf40de |
|
13-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: bonding: Issue NETDEV_PRE_CHANGEADDR Give interested parties an opportunity to veto an impending HW address change. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9245914 |
|
13-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: bonding: Give bond_set_dev_addr() a return value Before NETDEV_CHANGEADDR, bond driver should emit NETDEV_PRE_CHANGEADDR, and allow consumers to veto the address change. To propagate further the return code from NETDEV_PRE_CHANGEADDR, give the function that implements address change a return value. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3a37a963 |
|
13-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: dev: Add extack argument to dev_set_mac_address() A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which allows vetoing of MAC address changes. One prominent path to that notification is through dev_set_mac_address(). Therefore give this function an extack argument, so that it can be packed together with the notification. Thus a textual reason for rejection (or a warning) can be communicated back to the user. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00f54e68 |
|
06-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: core: dev: Add extack argument to dev_open() In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_open(). Therefore extend dev_open() with and extra extack argument and update all users. Most of the calls end up just encoding NULL, but bond and team drivers have the extack readily available. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea53abfa |
|
04-Nov-2018 |
Jarod Wilson <jarod@redhat.com> |
bonding/802.3ad: fix link_failure_count tracking Commit 4d2c0cda07448ea6980f00102dc3964eb25e241c set slave->link to BOND_LINK_DOWN for 802.3ad bonds whenever invalid speed/duplex values were read, to fix a problem with slaves getting into weird states, but in the process, broke tracking of link failures, as going straight to BOND_LINK_DOWN when a link is indeed down (cable pulled, switch rebooted) means we broke out of bond_miimon_inspect()'s BOND_LINK_DOWN case because !link_state was already true, we never incremented commit, and never got a chance to call bond_miimon_commit(), where slave->link_failure_count would be incremented. I believe the simple fix here is to mark the slave as BOND_LINK_FAIL, and let bond_miimon_inspect() transition the link from _FAIL to either _UP or _DOWN, and in the latter case, we now get proper incrementing of link_failure_count again. Fixes: 4d2c0cda0744 ("bonding: speed/duplex update at NETDEV_UP event") CC: Mahesh Bandewar <maheshb@google.com> CC: David S. Miller <davem@davemloft.net> CC: netdev@vger.kernel.org CC: stable@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c9fbd71f |
|
18-Oct-2018 |
Debabrata Banerjee <dbanerje@akamai.com> |
netpoll: allow cleanup to be synchronous This fixes a problem introduced by: commit 2cde6acd49da ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock") When using netconsole on a bond, __netpoll_cleanup can asynchronously recurse multiple times, each __netpoll_free_async call can result in more __netpoll_free_async's. This means there is now a race between cleanup_work queues on multiple netpoll_info's on multiple devices and the configuration of a new netpoll. For example if a netconsole is set to enable 0, reconfigured, and enable 1 immediately, this netconsole will likely not work. Given the reason for __netpoll_free_async is it can be called when rtnl is not locked, if it is locked, we should be able to execute synchronously. It appears to be locked everywhere it's called from. Generalize the design pattern from the teaming driver for current callers of __netpoll_free_async. CC: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0f3b914c |
|
02-Oct-2018 |
Mahesh Bandewar <maheshb@google.com> |
bonding: fix warning message RX queue config for bonding master could be different from its slave device(s). With the commit 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also."), the packet is reinjected into stack with skb->dev as bonding master. This potentially triggers the message: "bondX received packet on queue Y, but number of RX queues is Z" whenever the queue that packet is received on is higher than the numrxqueues on bonding master (Y > Z). Fixes: 6a9e461f6fe4 ("bonding: pass link-local packets to bonding master also.") Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d4859d74 |
|
24-Sep-2018 |
Mahesh Bandewar <maheshb@google.com> |
bonding: avoid possible dead-lock Syzkaller reported this on a slightly older kernel but it's still applicable to the current kernel - ====================================================== WARNING: possible circular locking dependency detected 4.18.0-next-20180823+ #46 Not tainted ------------------------------------------------------ syz-executor4/26841 is trying to acquire lock: 00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652 but task is already holding lock: 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline] 00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:925 [inline] __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline] bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320 process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}: process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129 worker_thread+0x189/0x13c0 kernel/workqueue.c:2296 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415 -> #0 ((wq_completion)bond_dev->name){+.+.}: lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock((work_completion)(&(&nnw->work)->work)); lock(rtnl_mutex); lock((wq_completion)bond_dev->name); *** DEADLOCK *** 1 lock held by syz-executor4/26841: stack backtrace: CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222 check_prev_add kernel/locking/lockdep.c:1862 [inline] check_prevs_add kernel/locking/lockdep.c:1975 [inline] validate_chain kernel/locking/lockdep.c:2416 [inline] __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412 lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734 register_netdevice+0x337/0x1100 net/core/dev.c:8410 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:632 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115 __sys_sendmsg+0x11d/0x290 net/socket.c:2153 __do_sys_sendmsg net/socket.c:2162 [inline] __se_sys_sendmsg net/socket.c:2160 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457089 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6a9e461f |
|
24-Sep-2018 |
Mahesh Bandewar <maheshb@google.com> |
bonding: pass link-local packets to bonding master also. Commit b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") changed the behavior of how link-local-multicast packets are processed. The change in the behavior broke some legacy use cases where these packets are expected to arrive on bonding master device also. This patch passes the packet to the stack with the link it arrived on as well as passes to the bonding-master device to preserve the legacy use case. Fixes: b89f04c61efe ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on") Reported-by: Michal Soltys <soltys@ziu.info> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93f62ad5 |
|
21-Sep-2018 |
Eric Dumazet <edumazet@google.com> |
bonding: use netpoll_poll_dev() helper We want to allow NAPI drivers to no longer provide ndo_poll_controller() method, as it has been proven problematic. team driver must not look at its presence, but instead call netpoll_poll_dev() which factorize the needed actions. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e2556e4 |
|
31-Jul-2018 |
Eric Dumazet <edumazet@google.com> |
bonding: avoid lockdep confusion in bond_get_stats() syzbot found that the following sequence produces a LOCKDEP splat [1] ip link add bond10 type bond ip link add bond11 type bond ip link set bond11 master bond10 To fix this, we can use the already provided nest_level. This patch also provides correct nesting for dev->addr_list_lock [1] WARNING: possible recursive locking detected 4.18.0-rc6+ #167 Not tainted -------------------------------------------- syz-executor751/4439 is trying to acquire lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 but task is already holding lock: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&bond->stats_lock)->rlock); lock(&(&bond->stats_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor751/4439: #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77 #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: spin_lock include/linux/spinlock.h:310 [inline] #1: (____ptrval____) (&(&bond->stats_lock)->rlock){+.+.}, at: bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 #2: (____ptrval____) (rcu_read_lock){....}, at: bond_get_stats+0x0/0x560 include/linux/compiler.h:215 stack backtrace: CPU: 0 PID: 4439 Comm: syz-executor751 Not tainted 4.18.0-rc6+ #167 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline] check_deadlock kernel/locking/lockdep.c:1809 [inline] validate_chain kernel/locking/lockdep.c:2405 [inline] __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144 spin_lock include/linux/spinlock.h:310 [inline] bond_get_stats+0xb4/0x560 drivers/net/bonding/bond_main.c:3426 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 bond_get_stats+0x232/0x560 drivers/net/bonding/bond_main.c:3432 dev_get_stats+0x10f/0x470 net/core/dev.c:8316 rtnl_fill_stats+0x4d/0xac0 net/core/rtnetlink.c:1169 rtnl_fill_ifinfo+0x1aa6/0x3fb0 net/core/rtnetlink.c:1611 rtmsg_ifinfo_build_skb+0xc8/0x190 net/core/rtnetlink.c:3268 rtmsg_ifinfo_event.part.30+0x45/0xe0 net/core/rtnetlink.c:3300 rtmsg_ifinfo_event net/core/rtnetlink.c:3297 [inline] rtnetlink_event+0x144/0x170 net/core/rtnetlink.c:4716 notifier_call_chain+0x180/0x390 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735 call_netdevice_notifiers net/core/dev.c:1753 [inline] netdev_features_change net/core/dev.c:1321 [inline] netdev_change_features+0xb3/0x110 net/core/dev.c:7759 bond_compute_features.isra.47+0x585/0xa50 drivers/net/bonding/bond_main.c:1120 bond_enslave+0x1b25/0x5da0 drivers/net/bonding/bond_main.c:1755 bond_do_ioctl+0x7cb/0xae0 drivers/net/bonding/bond_main.c:3528 dev_ifsioc+0x43c/0xb30 net/core/dev_ioctl.c:327 dev_ioctl+0x1b5/0xcc0 net/core/dev_ioctl.c:493 sock_do_ioctl+0x1d3/0x3e0 net/socket.c:992 sock_ioctl+0x30d/0x680 net/socket.c:1093 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440859 Code: e8 2c af 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc51a92878 EFLAGS: 00000213 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440859 RDX: 0000000020000040 RSI: 0000000000008990 RDI: 0000000000000003 RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000022d5880 R11: 0000000000000213 R12: 0000000000007390 R13: 0000000000401db0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f49dec9 |
|
08-Jul-2018 |
Alexander Duyck <alexander.h.duyck@intel.com> |
net: allow ndo_select_queue to pass netdev This patch makes it so that instead of passing a void pointer as the accel_priv we instead pass a net_device pointer as sb_dev. Making this change allows us to pass the subordinate device through to the fallback function eventually so that we can keep the actual code in the ndo_select_queue call as focused on possible on the exception cases. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
#
6396bb22 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
f44aa9ef |
|
23-May-2018 |
John Hurley <john.hurley@netronome.com> |
net: include hash policy in LAG changeupper info LAG upper event notifiers contain the tx type used by the LAG device. Extend this to also include the hash policy used for tx types that utilize hashing. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8eea1ca8 |
|
22-May-2018 |
Willem de Bruijn <willemb@google.com> |
gso: limit udp gso to egress-only virtual devices Until the udp receive stack supports large packets (UDP GRO), GSO packets must not loop from the egress to the ingress path. Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual devices through NETIF_F_GSO_ENCAP_ALL as this included devices that may loop packets, such as veth and macvlan. Instead add it to specific devices that forward to another device's egress path, bonding and team. Fixes: 83aa025f535f ("udp: add gso support to virtual devices") CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e878b60 |
|
16-May-2018 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
bonding: introduce link change helper Introduce an new common helper to avoid redundancy. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3c898e2 |
|
16-May-2018 |
Debabrata Banerjee <dbanerje@akamai.com> |
Revert "bonding: allow carrier and link status to determine link state" This reverts commit 1386c36b30388f46a95100924bfcae75160db715. We don't want to encourage drivers to not report carrier status correctly, therefore remove this commit. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1386c36b |
|
14-May-2018 |
Debabrata Banerjee <dbanerje@akamai.com> |
bonding: allow carrier and link status to determine link state In a mixed environment it may be difficult to tell if your hardware support carrier, if it does not it can always report true. With a new use_carrier option of 2, we can check both carrier and link status sequentially, instead of one or the other Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e79c1055 |
|
14-May-2018 |
Debabrata Banerjee <dbanerje@akamai.com> |
bonding: allow use of tx hashing in balance-alb The rx load balancing provided by balance-alb is not mutually exclusive with using hashing for tx selection, and should provide a decent speed increase because this eliminates spinlocks and cache contention. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae35c6f7 |
|
11-May-2018 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
bonding: use the skb_get/set_queue_mapping Use the skb_get_queue_mapping, skb_set_queue_mapping and skb_rx_queue_recorded for skb queue_mapping in bonding driver, but not use it directly. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dbdc8a21 |
|
11-May-2018 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
bonding: replace the return value type The method ndo_start_xmit is defined as returning a netdev_tx_t, which is a typedef for an enum type, but the implementation in this driver returns an int. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21706ee8 |
|
09-May-2018 |
Debabrata Banerjee <dbanerje@akamai.com> |
bonding: send learning packets for vlans on slave There was a regression at some point from the intended functionality of commit f60c3704e87d ("bonding: Fix alb mode to only use first level vlans.") Given the return value vlan_get_encap_level() we need to store the nest level of the bond device, and then compare the vlan's encap level to this. Without this, this check always fails and learning packets are never sent. In addition, this same commit caused a regression in the behavior of balance_alb, which requires learning packets be sent for all interfaces using the slave's mac in order to load balance properly. For vlan's that have not set a user mac, we can send after checking one bit. Otherwise we need send the set mac, albeit defeating rx load balancing for that vlan. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ddea788c |
|
22-Apr-2018 |
Xin Long <lucien.xin@gmail.com> |
bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set. However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo. One way to reproduce it: # modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole The netpoll won't really work. This patch is to remove that slave_dev npinfo setting in bond_enslave(). Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f635cee |
|
27-Mar-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Drop pernet_operations::async Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f5a90c1 |
|
25-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
bonding: process the err returned by dev_set_allmulti properly in bond_enslave When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails, dev_set_promiscuity(-1) should be done before going to the err path. Otherwise, dev->promiscuity will leak. Fixes: 7e1a1ac1fbaa ("bonding: Check return of dev_set_promiscuity/allmulti") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae42cc62 |
|
25-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave Beniamino found a crash when adding vlan as slave of bond which is also the parent link: ip link add bond1 type bond ip link set bond1 up ip link add link bond1 vlan1 type vlan id 80 ip link set vlan1 master bond1 The call trace is as below: [<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf [<ffffffffa8515680>] _raw_spin_lock+0x20/0x30 [<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80 [<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q] [<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0 [<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80 [<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding] [<ffffffffa8401909>] do_setlink+0x9c9/0xe50 [<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880 [<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260 [<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0 [<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30 [<ffffffffa8424850>] netlink_unicast+0x170/0x210 [<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420 [<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0 This is actually a dead lock caused by sync slave hwaddr from master when the master is the slave's 'slave'. This dead loop check is actually done by netdev_master_upper_dev_link. However, Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") moved it after dev_mc_sync. This patch is to fix it by moving dev_mc_sync after master_upper_dev_link, so that this loop check would be earlier than dev_mc_sync. It also moves if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an improvement. Note team driver also has this issue, I will fix it in another patch. Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") Reported-by: Beniamino Galvani <bgalvani@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c78f6bf |
|
25-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
bonding: fix the err path for dev hwaddr sync in bond_enslave vlan_vids_add_by_dev is called right after dev hwaddr sync, so on the err path it should unsync dev hwaddr. Otherwise, the slave dev's hwaddr will never be unsync when this err happens. Fixes: 1ff412ad7714 ("bonding: change the bond's vlan syncing functions with the standard ones") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6963ad69 |
|
26-Feb-2018 |
Kirill Tkhai <ktkhai@virtuozzo.com> |
net: Convert bond_net_ops These pernet_operations populate/depopulate /proc and /sys entries. Exit method unregisters all net bond devices, and it seems another pernet_operations are not interested in foreign net bond list. So, it's possible to mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
055db695 |
|
07-Nov-2017 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
bonding: fix slave stuck in BOND_LINK_FAIL state The bonding miimon logic has a flaw, in that a failure of the rtnl_trylock can cause a slave to become permanently stuck in BOND_LINK_FAIL state. The sequence of events to cause this is as follows: 1) bond_miimon_inspect finds that a slave's link is down, and so calls bond_propose_link_state, setting slave->new_link_state to BOND_LINK_FAIL, then sets slave->new_link to BOND_LINK_DOWN and returns non-zero. 2) In bond_mii_monitor, the rtnl_trylock fails, and the timer is rescheduled. No change is committed. 3) bond_miimon_inspect is called again, but this time the slave from step 1 has recovered. slave->new_link is reset to NOCHANGE, and, as slave->link was never changed, the switch enters the BOND_LINK_UP case, and does nothing. The pending BOND_LINK_FAIL state from step 1 remains pending, as new_link_state is not reset. 4) The state from step 3 persists until another slave changes link state and causes bond_miimon_inspect to return non-zero. At this point, the BOND_LINK_FAIL state change on the slave from steps 1-3 is committed, and the slave will remain stuck in BOND_LINK_FAIL state even though it is actually link up. The remedy for this is to initialize new_link_state on each entry to bond_miimon_inspect, as is already done with new_link. Fixes: fb9eb899a6dc ("bonding: handle link transition from FAIL to UP correctly") Reported-by: Alex Sidorenko <alexandre.sidorenko@hpe.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5f86218 |
|
05-Nov-2017 |
Hangbin Liu <liuhangbin@gmail.com> |
bonding: discard lowest hash bit for 802.3ad layer3+4 After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()"), we will try to use even ports for connect(). Then if an application (seen clearly with iperf) opens multiple streams to the same destination IP and port, each stream will be given an even source port. So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing will always hash all these streams to the same interface. And the total throughput will limited to a single slave. Change the tcp code will impact the whole tcp behavior, only for bonding usage. Paolo Abeni suggested fix this by changing the bonding code only, which should be more reasonable, and less impact. Fix this by discarding the lowest hash bit because it contains little entropy. After the fix we can re-balance between slaves. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6aa7de05 |
|
23-Oct-2017 |
Mark Rutland <mark.rutland@arm.com> |
locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
4597efe3 |
|
23-Oct-2017 |
Xin Long <lucien.xin@gmail.com> |
bonding: remove rtmsg_ifinfo called in bond_master_upper_dev_link Since commit 42e52bf9e3ae ("net: add netnotifier event for upper device change"), netdev_master_upper_dev_link has generated NETDEV_CHANGEUPPER event which would send a notification to userspace in rtnetlink_event. There's no need to call rtmsg_ifinfo to send the notification any more. So this patch is to remove it from bond_master_upper_dev_link as well as bond_upper_dev_unlink to avoid the redundant notifications. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
759088bd |
|
04-Oct-2017 |
David Ahern <dsahern@gmail.com> |
net: bonding: Add extack messages for some enslave failures A number of bond_enslave errors are logged using the netdev_err API. Return those messages to userspace via the extack facility. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
42ab19ee |
|
04-Oct-2017 |
David Ahern <dsahern@gmail.com> |
net: Add extack to upper device linking Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
33eaf2a6 |
|
04-Oct-2017 |
David Ahern <dsahern@gmail.com> |
net: Add extack to ndo_add_slave Pass extack to do_set_master and down to ndo_add_slave Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d2c0cda |
|
27-Sep-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: speed/duplex update at NETDEV_UP event Some NIC drivers don't have correct speed/duplex settings at the time they send NETDEV_UP notification and that messes up the bonding state. Especially 802.3ad mode which is very sensitive to these settings. In the current implementation we invoke bond_update_speed_duplex() when we receive NETDEV_UP, however, ignore the return value. If the values we get are invalid (UNKNOWN), then slave gets removed from the aggregator with speed and duplex set to UNKNOWN while link is still marked as UP. This patch fixes this scenario. Also 802.3ad mode is sensitive to these conditions while other modes are not, so making sure that it doesn't change the behavior for other modes. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ade97da |
|
26-Sep-2017 |
Alexey Dobriyan <adobriyan@gmail.com> |
arp: make arp_hdr_len() return unsigned int Negative ARP header length are not a thing. Constify arguments while I'm at it. Space savings: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-3 (-3) function old new delta arpt_do_table 1163 1160 -3 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f13ad104 |
|
12-Sep-2017 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
net: bonding: fix tlb_dynamic_lb default value Commit 8b426dc54cf4 ("bonding: remove hardcoded value") changed the default value for tlb_dynamic_lb which lead to either broken ALB mode (since tlb_dynamic_lb can be changed only in TLB) or setting TLB mode with tlb_dynamic_lb equal to 0. The first issue was recently fixed by setting tlb_dynamic_lb to 1 always when switching to ALB mode, but the default value is still wrong and we'll enter TLB mode with tlb_dynamic_lb equal to 0 if the mode is changed via netlink or sysfs. In order to restore the previous behaviour and default value simply remove the mode check around the default param initialization for tlb_dynamic_lb which will always set it to 1 as before. Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
11e9d782 |
|
11-Aug-2017 |
Andreas Born <futur.andy@googlemail.com> |
bonding: ratelimit failed speed/duplex update warning bond_miimon_commit() handles the UP transition for each slave of a bond in the case of MII. It is triggered 10 times per second for the default MII Polling interval of 100ms. For device drivers that do not implement __ethtool_get_link_ksettings() the call to bond_update_speed_duplex() fails persistently while the MII status could remain UP. That is, in this and other cases where the speed/duplex update keeps failing over a longer period of time while the MII state is UP, a warning is printed every MII polling interval. To address these excessive warnings net_ratelimit() should be used. Printing a warning once would not be sufficient since the call to bond_update_speed_duplex() could recover to succeed and fail again later. In that case there would be no new indication what went wrong. Fixes: b5bf0f5b16b9c (bonding: correctly update link status during mii-commit phase) Signed-off-by: Andreas Born <futur.andy@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad729bc9 |
|
09-Aug-2017 |
Andreas Born <futur.andy@googlemail.com> |
bonding: require speed/duplex only for 802.3ad, alb and tlb The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") puts the link state to down if bond_update_speed_duplex() cannot retrieve speed and duplex settings. Assumably the patch was written with 802.3ad mode in mind which relies on link speed/duplex settings. For other modes like active-backup these settings are not required. Thus, only for these other modes, this patch reintroduces support for slaves that do not support reporting speed or duplex such as wireless devices. This fixes the regression reported in bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547). Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Andreas Born <futur.andy@googlemail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d94708a5 |
|
25-Jul-2017 |
WANG Cong <xiyou.wangcong@gmail.com> |
bonding: commit link status change after propose Commit de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") moves link status commitment into bond_mii_monitor(), but it still relies on the return value of bond_miimon_inspect() as the hint. We need to return non-zero as long as we propose a link status change. Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cbf5ecb3 |
|
19-Jul-2017 |
Kosuke Tatsukawa <tatsu@ab.jp.nec.com> |
net: bonding: Fix transmit load balancing in balance-alb mode balance-alb mode used to have transmit dynamic load balancing feature enabled by default. However, transmit dynamic load balancing no longer works in balance-alb after commit 8b426dc54cf4 ("bonding: remove hardcoded value"). Both balance-tlb and balance-alb use the function bond_do_alb_xmit() to send packets. This function uses the parameter tlb_dynamic_lb. tlb_dynamic_lb used to have the default value of 1 for balance-alb, but now the value is set to 0 except in balance-tlb. Re-enable transmit dyanmic load balancing by initializing tlb_dynamic_lb for balance-alb similar to balance-tlb. Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value") Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f51048c3 |
|
06-Jul-2017 |
WANG Cong <xiyou.wangcong@gmail.com> |
bonding: avoid NETDEV_CHANGEMTU event when unregistering slave As Hongjun/Nicolas summarized in their original patch: " When a device changes from one netns to another, it's first unregistered, then the netns reference is updated and the dev is registered in the new netns. Thus, when a slave moves to another netns, it is first unregistered. This triggers a NETDEV_UNREGISTER event which is caught by the bonding driver. The driver calls bond_release(), which calls dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in the old netns). " This is a very special case, because the device is being unregistered no one should still care about the NETDEV_CHANGEMTU event triggered at this point, we can avoid broadcasting this event on this path, and avoid touching inetdev_event()/addrconf_notify() path. It requires to export __dev_set_mtu() to bonding driver. Reported-by: Hongjun Li <hongjun.li@6wind.com> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eca59f69 |
|
08-Jun-2017 |
Arkadi Sharshevsky <arkadis@mellanox.com> |
net: Remove support for bridge bypass ndos from stacked devices Remove support for bridge bypass ndos from stacked devices. At this point no driver which supports stack device behavior offload supports operation with SELF flag. The case for upper device is already taken care of in both of the following cases: 1. FDB add/del - driver should check at the notification cb if the stacked device contains his ports. 2. Port attribute - calls switchdev code directly which checks for case of stack device. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf124db5 |
|
07-May-2017 |
David S. Miller <davem@davemloft.net> |
net: Fix inconsistent teardown and release of private netdev state. Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a7e96e0 |
|
27-May-2017 |
Vlad Yasevich <vyasevich@gmail.com> |
bonding: Prevent duplicate userspace notification Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA notificatin is generated which results in a rtnelink message to be sent. While runnig 'ip monitor', we can actually see 2 messages, one a result of the event, and the other a result of state change that is generated bo netdev_state_change(). However, this is not always the case. If bonding changes were done via sysfs or ifenslave (old ioctl interface), then only 1 message is seen. This patch removes duplicate messages in the case of using netlink to configure bonding. It introduceds a separte function that triggers a netdev event and uses that function in the syfs and ioctl cases. This was discovered while auditing all the different envents and continues the effort of cleaning up duplicated netlink messages. CC: David Ahern <dsa@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
797a9364 |
|
24-May-2017 |
Nithin Sujir <nsujir@tintri.com> |
bonding: Don't update slave->link until ready to commit In the loadbalance arp monitoring scheme, when a slave link change is detected, the slave->link is immediately updated and slave_state_changed is set. Later down the function, the rtnl_lock is acquired and the changes are committed, updating the bond link state. However, the acquisition of the rtnl_lock can fail. The next time the monitor runs, since slave->link is already updated, it determines that link is unchanged. This results in the bond link state permanently out of sync with the slave link. This patch modifies bond_loadbalance_arp_mon() to handle link changes identical to bond_ab_arp_{inspect/commit}(). The new link state is maintained in slave->new_link until we're ready to commit at which point it's copied into slave->link. NOTE: miimon_{inspect/commit}() has a more complex state machine requiring the use of the bond_{propose,commit}_link_state() functions which maintains the intermediate state in slave->link_new_state. The arp monitors don't require that. Testing: This bug is very easy to reproduce with the following steps. 1. In a loop, toggle a slave link of a bond slave interface. 2. In a separate loop, do ifconfig up/down of an unrelated interface to create contention for rtnl_lock. Within a few iterations, the bond link goes out of sync with the slave link. Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
72ccc471 |
|
19-May-2017 |
Jarod Wilson <jarod@redhat.com> |
bonding: fix randomly populated arp target array In commit dc9c4d0fe023, the arp_target array moved from a static global to a local variable. By the nature of static globals, the array used to be initialized to all 0. At present, it's full of random data, which that gets interpreted as arp_target values, when none have actually been specified. Systems end up booting with spew along these lines: [ 32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready [ 32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready [ 32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0 [ 32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready [ 32.204892] lacp0: Setting MII monitoring interval to 100 [ 32.211071] lacp0: Removing ARP target 216.124.228.17 [ 32.216824] lacp0: Removing ARP target 218.160.255.255 [ 32.222646] lacp0: Removing ARP target 185.170.136.184 [ 32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal [ 32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255) [ 32.243987] lacp0: Removing ARP target 56.125.228.17 [ 32.249625] lacp0: Removing ARP target 218.160.255.255 [ 32.255432] lacp0: Removing ARP target 15.157.233.184 [ 32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal [ 32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255) [ 32.276632] lacp0: Removing ARP target 16.0.0.0 [ 32.281755] lacp0: Removing ARP target 218.160.255.255 [ 32.287567] lacp0: Removing ARP target 72.125.228.17 [ 32.293165] lacp0: Removing ARP target 218.160.255.255 [ 32.298970] lacp0: Removing ARP target 8.125.228.17 [ 32.304458] lacp0: Removing ARP target 218.160.255.255 None of these were actually specified as ARP targets, and the driver does seem to clean up the mess okay, but it's rather noisy and confusing, leaks values to userspace, and the 255.255.255.255 spew shows up even when debug prints are disabled. The fix: just zero out arp_target at init time. While we're in here, init arp_all_targets_value in the right place. Fixes: dc9c4d0fe023 ("bonding: reduce scope of some global variables") CC: Mahesh Bandewar <maheshb@google.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org CC: stable@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
19cdead3 |
|
27-Apr-2017 |
Paolo Abeni <pabeni@redhat.com> |
bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal On slave list updates, the bonding driver computes its hard_header_len as the maximum of all enslaved devices's hard_header_len. If the slave list is empty, e.g. on last enslaved device removal, ETH_HLEN is used. Since the bonding header_ops are set only when the first enslaved device is attached, the above can lead to header_ops->create() being called with the wrong skb headroom in place. If bond0 is configured on top of ipoib devices, with the following commands: ifup bond0 for slave in $BOND_SLAVES_LIST; do ip link set dev $slave nomaster done ping -c 1 <ip on bond0 subnet> we will obtain a skb_under_panic() with a similar call trace: skb_push+0x3d/0x40 push_pseudo_header+0x17/0x30 [ib_ipoib] ipoib_hard_header+0x4e/0x80 [ib_ipoib] arp_create+0x12f/0x220 arp_send_dst.part.19+0x28/0x50 arp_solicit+0x115/0x290 neigh_probe+0x4d/0x70 __neigh_event_send+0xa7/0x230 neigh_resolve_output+0x12e/0x1c0 ip_finish_output2+0x14b/0x390 ip_finish_output+0x136/0x1e0 ip_output+0x76/0xe0 ip_local_out+0x35/0x40 ip_send_skb+0x19/0x40 ip_push_pending_frames+0x33/0x40 raw_sendmsg+0x7d3/0xb50 inet_sendmsg+0x31/0xb0 sock_sendmsg+0x38/0x50 SYSC_sendto+0x102/0x190 SyS_sendto+0xe/0x10 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 This change addresses the issue avoiding updating the bonding device hard_header_len when the slaves list become empty, forbidding to shrink it below the value used by header_ops->create(). The bug is there since commit 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len") but the panic can be triggered only since commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard header"). Reported-by: Norbert P <noe@physik.uzh.ch> Fixes: 54ef31371407 ("[PATCH] bonding: Handle large hard_header_len") Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea8ffc08 |
|
20-Apr-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: fix wq initialization for links created via netlink Earlier patch 4493b81bea ("bonding: initialize work-queues during creation of bond") moved the work-queue initialization from bond_open() to bond_create(). However this caused the link those are created using netlink 'create bond option' (ip link add bondX type bond); create the new trunk without initializing work-queues. Prior to the above mentioned change, ndo_open was in both paths and things worked correctly. The consequence is visible in the report shared by Joe Stringer - I've noticed that this patch breaks bonding within namespaces if you're not careful to perform device cleanup correctly. Here's my repro script, you can run on any net-next with this patch and you'll start seeing some weird behaviour: ip netns add foo ip li add veth0 type veth peer name veth0+ netns foo ip li add veth1 type veth peer name veth1+ netns foo ip netns exec foo ip li add bond0 type bond ip netns exec foo ip li set dev veth0+ master bond0 ip netns exec foo ip li set dev veth1+ master bond0 ip netns exec foo ip addr add dev bond0 192.168.0.1/24 ip netns exec foo ip li set dev bond0 up ip li del dev veth0 ip li del dev veth1 The second to last command segfaults, last command hangs. rtnl is now permanently locked. It's not a problem if you take bond0 down before deleting veths, or delete bond0 before deleting veths. If you delete either end of the veth pair as per above, either inside or outside the namespace, it hits this problem. Here's some kernel logs: [ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link [ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link [ 1281.193863] bond0: Releasing backup interface veth0+ [ 1281.193866] bond0: the permanent HWaddr of veth0+ - 16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of veth0+ to a different address to avoid conflicts [ 1281.193867] ------------[ cut here ]------------ [ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511 __queue_delayed_work+0x13f/0x150 [ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid hid mptspi mptscsih e1000 mptbase ahci libahci [ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G W 4.10.0-bisect-bond-v0.14 #37 [ 1281.193906] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 [ 1281.193906] Call Trace: [ 1281.193912] dump_stack+0x63/0x89 [ 1281.193915] __warn+0xd1/0xf0 [ 1281.193917] warn_slowpath_null+0x1d/0x20 [ 1281.193918] __queue_delayed_work+0x13f/0x150 [ 1281.193920] queue_delayed_work_on+0x27/0x40 [ 1281.193929] bond_change_active_slave+0x25b/0x670 [bonding] [ 1281.193932] ? synchronize_rcu_expedited+0x27/0x30 [ 1281.193935] __bond_release_one+0x489/0x510 [bonding] [ 1281.193939] ? addrconf_notify+0x1b7/0xab0 [ 1281.193942] bond_netdev_event+0x2c5/0x2e0 [bonding] [ 1281.193944] ? netconsole_netdev_event+0x124/0x190 [netconsole] [ 1281.193947] notifier_call_chain+0x49/0x70 [ 1281.193948] raw_notifier_call_chain+0x16/0x20 [ 1281.193950] call_netdevice_notifiers_info+0x35/0x60 [ 1281.193951] rollback_registered_many+0x23b/0x3e0 [ 1281.193953] unregister_netdevice_many+0x24/0xd0 [ 1281.193955] rtnl_delete_link+0x3c/0x50 [ 1281.193956] rtnl_dellink+0x8d/0x1b0 [ 1281.193960] rtnetlink_rcv_msg+0x95/0x220 [ 1281.193962] ? __kmalloc_node_track_caller+0x35/0x280 [ 1281.193964] ? __netlink_lookup+0xf1/0x110 [ 1281.193966] ? rtnl_newlink+0x830/0x830 [ 1281.193967] netlink_rcv_skb+0xa7/0xc0 [ 1281.193969] rtnetlink_rcv+0x28/0x30 [ 1281.193970] netlink_unicast+0x15b/0x210 [ 1281.193971] netlink_sendmsg+0x319/0x390 [ 1281.193974] sock_sendmsg+0x38/0x50 [ 1281.193975] ___sys_sendmsg+0x25c/0x270 [ 1281.193978] ? mem_cgroup_commit_charge+0x76/0xf0 [ 1281.193981] ? page_add_new_anon_rmap+0x89/0xc0 [ 1281.193984] ? lru_cache_add_active_or_unevictable+0x35/0xb0 [ 1281.193985] ? __handle_mm_fault+0x4e9/0x1170 [ 1281.193987] __sys_sendmsg+0x45/0x80 [ 1281.193989] SyS_sendmsg+0x12/0x20 [ 1281.193991] do_syscall_64+0x6e/0x180 [ 1281.193993] entry_SYSCALL64_slow_path+0x25/0x25 [ 1281.193995] RIP: 0033:0x7f6ec122f5a0 [ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0 [ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003 [ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003 [ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f [ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450 [ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]--- Fixes: 4493b81bea ("bonding: initialize work-queues during creation of bond") Reported-by: Joe Stringer <joe@ovn.org> Tested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b89f04c6 |
|
16-Apr-2017 |
Chonggang Li <chonggangli@google.com> |
bonding: deliver link-local packets with skb->dev set to link that packets arrived on Bonding driver changes the skb->dev to the bonding-master before passing the packet to stack for further processing. This, however does not make sense for the link-local packets and it loses "the link info" once its skb->dev is changed to bonding-master. This patch changes this behavior for link-local packets by not changing the skb->dev to the bonding-master and maintaining it as it is, i.e. the link on which the packet arrived. Signed-off-by: Chonggang Li <chonggangli@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb9eb899 |
|
11-Apr-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: handle link transition from FAIL to UP correctly When link transitions from LINK_FAIL to LINK_UP, the commit phase is not called. This leads to an erroneous state causing slave-link state to get stuck in "going down" state while its speed and duplex are perfectly fine. This issue is a side-effect of splitting link-set into propose and commit phases introduced by de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") This patch fixes these issues by calling commit phase whenever link state change is proposed. Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
faeeb317 |
|
04-Apr-2017 |
Jarod Wilson <jarod@redhat.com> |
bonding: attempt to better support longer hw addresses People are using bonding over Infiniband IPoIB connections, and who knows what else. Infiniband has a hardware address length of 20 octets (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32. Various places in the bonding code are currently hard-wired to 6 octets (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides, only alb is currently possible on Infiniband links right now anyway, due to commit 1533e7731522, so the alb code is where most of the changes are. One major component of this change is the addition of a bond_hw_addr_copy function that takes a length argument, instead of using ether_addr_copy everywhere that hardware addresses need to be copied about. The other major component of this change is converting the bonding code from using struct sockaddr for address storage to struct sockaddr_storage, as the former has an address storage space of only 14, while the latter is 128 minus a few, which is necessary to support bonding over device with up to MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes up some memory corruption issues with the current code, where it's possible to write an infiniband hardware address into a sockaddr declared on the stack. Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet hardware address now: $ cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active) Primary Slave: mlx4_ib0 (primary_reselect always) Currently Active Slave: mlx4_ib0 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 100 Down Delay (ms): 100 Slave Interface: mlx4_ib0 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01 Slave queue ID: 0 Slave Interface: mlx4_ib1 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: 80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02 Slave queue ID: 0 Also tested with a standard 1Gbps NIC bonding setup (with a mix of e1000 and e1000e cards), running LNST's bonding tests. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f3c278c |
|
03-Apr-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: fix active-backup transition Earlier patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") made an attempt to keep slave state consistent with speed and duplex settings. Unfortunately link-state transition is used to change the active link especially when used in conjunction with mii-mon. The above mentioned patch broke that logic. Also when speed and duplex settings for a link are updated during a link-event, the link-status should not be changed to invoke correct transition logic. This patch fixes this issue by moving the link-state update outside of the bond_update_speed_duplex() fn and to the places where this fn is called and update link-state selectively. Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
142c6594 |
|
29-Mar-2017 |
Eric Dumazet <edumazet@google.com> |
bonding: refine bond_fold_stats() wrap detection Some device drivers reset their stats at down/up events, possibly fooling bonding stats, since they operate with relative deltas. It is nearly not possible to fix drivers, since some of them compute the tx/rx counters based on per rx/tx queue stats, and the queues can be reconfigured (ethtool -L) between the down/up sequence. Lets avoid accumulating 'negative' values that render bonding stats useless. It is better to lose small deltas, assuming the bonding stats are fetched at a reasonable frequency. Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5bf0f5b |
|
27-Mar-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: correctly update link status during mii-commit phase bond_miimon_commit() marks the link UP after attempting to get the speed and duplex settings for the link. There is a possibility that bond_update_speed_duplex() could fail. This is another place where it could result into an inconsistent bonding link state. With this patch the link will be marked UP only if the speed and duplex values retrieved have sane values and processed further. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4adfc82 |
|
27-Mar-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: make speed, duplex setting consistent with link state bond_update_speed_duplex() retrieves speed and duplex settings. There is a possibility of failure in retrieving these values but caller has to assume it's always successful. This leads to having inconsistent slave link settings. If these (speed, duplex) values cannot be retrieved, then keeping the link UP causes problems. The updated bond_update_speed_duplex() returns 0 on success if it retrieves sane values for speed and duplex. On failure it returns 1 and marks the link down. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de77ecd4 |
|
27-Mar-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: improve link-status update in mii-monitoring The primary issue is that mii-inspect phase updates link-state and expects changes to be committed during the mii-commit phase. After the inspect phase if it fails to acquire rtnl-mutex, the commit phase (bond_mii_commit) doesn't get to run. This partially updated state stays and makes the internal-state inconsistent. e.g. setup bond0 => slaves: eth1, eth2 eth1 goes DOWN -> UP mii_monitor() mii-inspect() bond_set_slave_link_state(eth1, UP, DontNotify) rtnl_trylock() <- fails! Next mii-monitor round eth1: No change mii_monitor() mii-inspect() eth1->link == current-status (ethtool_ops->get_link) no-change-detected End result: eth1: Link = BOND_LINK_UP Speed = 0xfffff [SpeedUnknown] Duplex = 0xff [DuplexUnknown] This doesn't always happen but for some unlucky machines in a large set of machines it creates problems. The fix for this is to avoid making changes during inspect phase and postpone them until acquiring the rtnl-mutex / invoking commit phase. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc9c4d0f |
|
08-Mar-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: reduce scope of some global variables Many of the bond param variables are declared global while it's not really necessary for these variables to be global. So moving them to the location these are used. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b426dc5 |
|
08-Mar-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: remove hardcoded value Eliminate hard-coded value and use the default that is set. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4493b81b |
|
08-Mar-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: initialize work-queues during creation of bond Initializing work-queues every time ifup operation performed is unnecessary and can be performed only once when the port is created. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5e73f7b |
|
08-Mar-2017 |
Mahesh Bandewar <maheshb@google.com> |
bonding: restructure arp-monitor In preparation to move the work-queue initialization to port creation from current port_open phase. Work-queue initialization does not make sense every time we do 'ifup/ifdown'. So moving to port creation phase. Arp monitoring work depends on the bonding mode and that is not tied to the port creation and can change anytime during the life after port creation. So this restructuring allows us to move the initialization at creation without losing the ability to arm the correct work for the mode user has selected. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31c05415 |
|
02-Mar-2017 |
WANG Cong <xiyou.wangcong@gmail.com> |
bonding: use ETH_MAX_MTU as max mtu This restores the ability of setting bond device's mtu to 9000. Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Reported-by: daznis@gmail.com Reported-by: Brad Campbell <lists2009@fnarfbargle.com> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8eca326 |
|
06-Feb-2017 |
Ido Schimmel <idosch@mellanox.com> |
net: remove ndo_neigh_{construct, destroy} from stacked devices In commit 18bfb924f000 ("net: introduce default neigh_construct/destroy ndo calls for L2 upper devices") we added these ndos to stacked devices such as team and bond, so that calls will be propagated to mlxsw. However, previous commit removed the reliance on these ndos and no new users of these ndos have appeared since above mentioned commit. We can therefore safely remove this dead code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d67576d |
|
02-Feb-2017 |
Zhu Yanjun <yanjun.zhu@oracle.com> |
bonding: Remove unnecessary returned value check The function bond_info_query alwarys returns 0. As such, in the function bond_do_ioctl, it is not necessary to check the returned value. So the interface type of the function bond_info_query is changed to void. The redundant check is removed. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc1f4470 |
|
06-Jan-2017 |
stephen hemminger <stephen@networkplumber.org> |
net: make ndo_get_stats64 a void function The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7d03a00 |
|
16-Nov-2016 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns: make struct pernet_operations::id unsigned int Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d46b6349 |
|
25-Oct-2016 |
Philippe Reynes <tremyfr@gmail.com> |
net: bonding: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3208b20 |
|
17-Oct-2016 |
David Ahern <dsa@cumulusnetworks.com> |
net: bonding: Flip to the new dev walk API Convert alb_send_learning_packets and bond_has_this_ip to use the new netdev_walk_all_upper_dev_rcu API. In both cases this is just a code conversion; no functional change is intended. v2 - removed typecast of data and simplified bond_upper_dev_walk Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ad41c1e |
|
03-Sep-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
bonding: quit messing with IOCTL The only remaining users are issuing SIOCGMIIPHY and SIOCGMIIREG, neither of which deals with userland pointers. Simply calling ->ndo_do_ioctl() is fine; no messing with set_fs() is needed. It used to mess with SIOCETHTOOL, which would've needed set_fs(), but that has been killed in "[NET] ethtool ops are the only way" 9 years ago... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
24b27fc4 |
|
01-Sep-2016 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Fix bonding crash Following few steps will crash kernel - (a) Create bonding master > modprobe bonding miimon=50 (b) Create macvlan bridge on eth2 > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ type macvlan (c) Now try adding eth2 into the bond > echo +eth2 > /sys/class/net/bond0/bonding/slaves <crash> Bonding does lots of things before checking if the device enslaved is busy or not. In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave. This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9f225eb |
|
30-Aug-2016 |
Bhaktipriya Shridhar <bhaktipriya96@gmail.com> |
bonding: Remove deprecated create_singlethread_workqueue alloc_ordered_workqueue() with WQ_MEM_RECLAIM set, replaces deprecated create_singlethread_workqueue(). This is the identity conversion. The workqueue "wq" queues multiple work items viz &bond->mcast_work, &nnw->work, &bond->mii_work, &bond->arp_work, &bond->alb_work, &bond->mii_work, &bond->ad_work, &bond->slave_arr_work which require strict execution ordering. Hence, an ordered dedicated workqueue has been used. Since, it is a network driver, WQ_MEM_RECLAIM has been set to ensure forward progress under memory pressure. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0d039f33 |
|
09-Aug-2016 |
Zhu Yanjun <zyjzyj2000@gmail.com> |
bonding: fix the typo The message "803.ad" should be "802.3ad". Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1533e773 |
|
21-Jul-2016 |
Mark Bloch <markb@mellanox.com> |
net/bonding: Enforce active-backup policy for IPoIB bonds When using an IPoIB bond currently only active-backup mode is a valid use case and this commit strengthens it. Since commit 2ab82852a270 ("net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()") was introduced till 4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the fail over mac policy always applied to IPoIB bonds. With the introduction of commit 492a7e67ff83 ("IB/IPoIB: Allow setting the device address"), that doesn't hold and practically IPoIB bonds are broken as of that. To fix it, lets go to fail over mac if the device doesn't support the ndo OR this is IPoIB device. As a by-product, this commit also prevents a stack corruption which occurred when trying to copy 20 bytes (IPoIB) device address to a sockaddr struct that has only 16 bytes of storage. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a30b0168 |
|
04-Jul-2016 |
Aviv Heller <avivh@mellanox.com> |
bonding: fix enslavement slave link notifications Currently, link notifications are not sent by bond_set_slave_link_state() upon enslavement if the slave is enslaved when up. This happens because slave->link default init value is 0, which is the same as BOND_LINK_UP, resulting in bond_set_slave_link_state() ignoring this transition. This patch sets the default value of slave->link to BOND_LINK_NOCHANGE, assuring it will count as a state transition and thus trigger notification logic. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
18bfb924 |
|
05-Jul-2016 |
Jiri Pirko <jiri@mellanox.com> |
net: introduce default neigh_construct/destroy ndo calls for L2 upper devices L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3fff6c4 |
|
09-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
net: add netdev_lockdep_set_classes() helper It is time to add netdev_lockdep_set_classes() helper so that lockdep annotations per device type are easier to manage. This removes a lot of copies and missing annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9eb8aea |
|
06-Jun-2016 |
Eric Dumazet <edumazet@google.com> |
net_sched: transform qdisc running bit into a seqcount Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe30937b |
|
17-Mar-2016 |
Eric Dumazet <edumazet@google.com> |
bonding: fix bond_get_stats() bond_get_stats() can be called from rtnetlink (with RTNL held) or from /proc/net/dev seq handler (with RCU held) The logic added in commit 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") kind of assumed only one cpu could run there. If multiple threads are reading /proc/net/dev, stats can be really messed up after a while. A second problem is that some fields are 32bit, so we need to properly handle the wrap around problem. Given that RTNL is not always held, we need to use bond_for_each_slave_rcu(). Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1098cee6 |
|
16-Mar-2016 |
Zhang Shengju <zhangshengju@cmss.chinamobile.com> |
bonding: remove duplicate set of flag IFF_MULTICAST Remove unnecessary set of flag IFF_MULTICAST, since ether_setup already does this. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9856909c |
|
24-Feb-2016 |
David Decotigny <decot@googlers.com> |
net: bonding: use __ethtool_get_ksettings Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
266b495f |
|
08-Feb-2016 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
bonding: don't use stale speed and duplex information There is presently a race condition between the bonding periodic link monitor and the updating of a slave's speed and duplex. The former occurs on a periodic basis, and the latter in response to a driver's calling of netif_carrier_on. It is possible for the periodic monitor to run between the driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE event that causes bonding to update the slave's speed and duplex. This manifests most notably as a report that a slave is up and "0 Mbps full duplex" after enslavement, but in principle could report an incorrect speed and duplex after any link up event if the device comes up with a different speed or duplex. This affects the 802.3ad aggregator selection, as the speed and duplex are selection criteria. This is fixed by updating the speed and duplex in the periodic monitor, prior to using that information. This was done historically in bonding, but the call to bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks"), as it might sleep under lock. Later, the locking was changed to only hold RTNL, and so after commit 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks") this call is again safe. Tested-by: "Tantilov, Emil S" <emil.s.tantilov@intel.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: dingtianhong <dingtianhong@huawei.com> Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21a75f09 |
|
02-Feb-2016 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
bonding: Fix ARP monitor validation The current logic in bond_arp_rcv will accept an incoming ARP for validation if (a) the receiving slave is either "active" (which includes the currently active slave, or the current ARP slave) or, (b) there is a currently active slave, and it has received an ARP since it became active. For case (b), the receiving slave isn't the currently active slave, and is receiving the original broadcast ARP request, not an ARP reply from the target. This logic can fail if there is no currently active slave. In this situation, the ARP probe logic cycles through all slaves, assigning each in turn as the "current_arp_slave" for one arp_interval, then setting that one as "active," and sending an ARP probe from that slave. The current logic expects the ARP reply to arrive on the sending current_arp_slave, however, due to switch FDB updating delays, the reply may be directed to another slave. This can arise if the bonding slaves and switch are working, but the ARP target is not responding. When the ARP target recovers, a condition may result wherein the ARP target host replies faster than the switch can update its forwarding table, causing each ARP reply to be sent to the previous current_arp_slave. This will never pass the logic in bond_arp_rcv, as neither of the above conditions (a) or (b) are met. Some experimentation on a LAN shows ARP reply round trips in the 200 usec range, but my available switches never update their FDB in less than 4000 usec. This patch changes the logic in bond_arp_rcv to additionally accept an ARP reply for validation on any slave if there is a current ARP slave and it sent an ARP probe during the previous arp_interval. Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works") Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e2a8868 |
|
09-Feb-2016 |
Zhang Shengju <zhangshengju@cmss.chinamobile.com> |
bonding: use return instead of goto Replace 'goto' with 'return' to remove unnecessary check at label: err_undo_flags. The reason is that 'err_undo_flags' do two things for the first slave device: 1.revert bond mac address if it is set by the slave device. 2.revert bond device type if it's not ARPHRD_ETHER. It's not necessary for the following three places, they changed neither bond mac address nor type. It's straightforward to return directly. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d66bd905 |
|
02-Feb-2016 |
Zhang Shengju <zhangshengju@cmss.chinamobile.com> |
bonding: trivial: style fixes remove some redudant brackets, use sizeof(*) instead of sizeof(struct x). Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6140a29 |
|
02-Feb-2016 |
Zhang Shengju <zhangshengju@cmss.chinamobile.com> |
bonding: add slave device name for debug netdev_dbg() will add bond device name, it will be helpful if we print slave device name. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f344b0d9 |
|
01-Feb-2016 |
Jarod Wilson <jarod@redhat.com> |
bond: track sum of rx_nohandler for all slaves Sample output with this set applied for an active-backup bond: $ cat /sys/devices/virtual/net/bond0/lower_p7p1/statistics/rx_nohandler 16568 $ cat /sys/devices/virtual/net/bond0/lower_p5p2/statistics/rx_nohandler 16583 $ cat /sys/devices/virtual/net/bond0/statistics/rx_nohandler 33151 CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03d84a5f |
|
11-Jan-2016 |
Karl Heiss <kheiss@gmail.com> |
bonding: Prevent IPv6 link local address on enslaved devices Commit 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") undoes the fix provided by commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master") by effectively setting the slave flag after the slave has been opened. If the slave comes up quickly enough, it will go through the IPv6 addrconf before the slave flag has been set and will get a link local IPv6 address. In order to ensure that addrconf knows to ignore the slave devices on state change, set IFF_SLAVE before dev_open() during bonding enslavement. Fixes: 1f718f0f4f97 ("bonding: populate neighbour's private on enslave") Signed-off-by: Karl Heiss <kheiss@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a188222b |
|
14-Dec-2015 |
Tom Herbert <tom@herbertland.com> |
net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the set of features for offloading all checksums. This is a mask of the checksum offload related features bits. It is incorrect to set both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for features of a device. This patch: - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where NETIF_F_ALL_CSUM is being used as a mask). - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ce3ea1c7 |
|
03-Dec-2015 |
yzhu1 <Yanjun.Zhu@windriver.com> |
net: bonding: remove redudant brackets It is not necessary to use two brackets. As such, the redudant brackets are removed. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
57beaca8 |
|
02-Dec-2015 |
Jiri Pirko <jiri@mellanox.com> |
bonding: set inactive flags on release Be correct and symmetric to enslave and set inactive flags during release. That gives LAG offload drivers - lower state change listeners - possibility to do proper cleanup. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7c7eb7f |
|
02-Dec-2015 |
Jiri Pirko <jiri@mellanox.com> |
bonding: implement lower state change propagation Let netdev notifier listeners know about link and slave state change. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d397061 |
|
02-Dec-2015 |
Jiri Pirko <jiri@mellanox.com> |
bonding: allow notifications for bond_set_slave_link_state Similar to state notifications. We allow caller to indicate if the notification should happen now or later, depending on if he holds rtnl mutex or not. Introduce bond_slave_link_notify function (similar to bond_slave_state_notify) which is later on called with rtnl mutex and goes over slaves and executes delayed notification. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
41f0b049 |
|
02-Dec-2015 |
Jiri Pirko <jiri@mellanox.com> |
bonding: fill-up LAG changeupper info struct and pass it along Initialize netdev_lag_upper_info structure by TX type according to current bonding mode and pass it along via netdev_master_upper_dev_link. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29bf24af |
|
02-Dec-2015 |
Jiri Pirko <jiri@mellanox.com> |
net: add possibility to pass information about upper device via notifier Sometimes the drivers and other code would find it handy to know some internal information about upper device being changed. So allow upper-code to pass information down to notifier listeners during linking. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6dffb044 |
|
02-Dec-2015 |
Jiri Pirko <jiri@mellanox.com> |
net: propagate upper priv via netdev_master_upper_dev_link Eliminate netdev_master_upper_dev_link_private and pass priv directly as a parameter of netdev_master_upper_dev_link. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40baec22 |
|
06-Nov-2015 |
Jay Vosburgh <jay.vosburgh@canonical.com> |
bonding: fix panic on non-ARPHRD_ETHER enslave failure Since commit 7d5cd2ce529b, when bond_enslave fails on devices that are not ARPHRD_ETHER, if needed, it resets the bonding device back to ARPHRD_ETHER by calling ether_setup. Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP if the bond device is up, leaving it in a quasi-down state without having actually gone through dev_close. For bonding, if any periodic work queue items are active (miimon, arp_interval, etc), those will remain running, as they are stopped by bond_close. At this point, if the bonding module is unloaded or the bond is deleted, the system will panic when the work function is called. This panic is resolved by calling dev_close on the bond itself prior to calling ether_setup. Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52bc6716 |
|
31-Oct-2015 |
Mahesh Bandewar <maheshb@google.com> |
bonding: simplify / unify event handling code for 3ad mode. Old logic of updating state-machine is not required since ad_update_actor_keys() does it implicitly. The only loss is the notification differentiation between speed vs. duplex change. Now only one unified notification is printed. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e87eb405 |
|
15-Oct-2015 |
Eric Dumazet <edumazet@google.com> |
bonding: support encapsulated ipv6 TSO If using a sixtofour device on top of a bonding device, skb segmentation of TCP traffic is done right before calling bonding xmit, because bonding only enables TSO for IPv4. This patch improves single flow performance by about 120 % on my hosts, because segmentation is deferred right before calling slave xmit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b1b865e |
|
15-Sep-2015 |
Eric Dumazet <edumazet@google.com> |
bonding: use l4 hash if available If skb carries a l4 hash, no need to perform a flow dissection. Performance is slightly better : lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39012e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39393e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.39988e+06 After patch : lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.43579e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.44304e+06 lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100 2.44312e+06 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd79a238 |
|
01-Sep-2015 |
Tom Herbert <tom@herbertland.com> |
flow_dissector: Add flags argument to skb_flow_dissector functions The flags argument will allow control of the dissection process (for instance whether to parse beyond L3). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0d4943e |
|
28-Aug-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bonding: fix bond_poll_controller bh_enable warning The problem is rcu_read_unlock_bh() which triggers a warning when irqs are disabled. ndo_poll_controller should run with irqs disabled always so we can drop the rcu_read_lock_bh. [ 98.502922] bond0: making interface eth1 the new active one [ 98.503039] ------------[ cut here ]------------ [ 98.503039] WARNING: CPU: 0 PID: 1744 at kernel/softirq.c:150 __local_bh_enable_ip+0x96/0xc0() [ 98.503039] Modules linked in: bonding(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole ppdev joydev parport_pc serio_raw parport i2c_piix4 video acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_net e1000 ata_generic pcnet32 mii virtio_pci virtio_ring virtio pata_acpi [ 98.503039] CPU: 0 PID: 1744 Comm: ifenslave Tainted: G OE 4.2.0-rc7+ #56 [ 98.503039] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 98.503039] 0000000000000000 00000000e96ba230 ffff880020c236b8 ffffffff8183f105 [ 98.503039] 0000000000000000 0000000000000000 ffff880020c236f8 ffffffff810a9496 [ 98.503039] ffff88002ea99e08 0000000000000200 ffffffffa02a8e06 ffff88002ea99e08 [ 98.503039] Call Trace: [ 98.503039] [<ffffffff8183f105>] dump_stack+0x4c/0x65 [ 98.503039] [<ffffffff810a9496>] warn_slowpath_common+0x86/0xc0 [ 98.503039] [<ffffffffa02a8e06>] ? bond_poll_controller+0x146/0x250 [bonding] [ 98.503039] [<ffffffff810a95ca>] warn_slowpath_null+0x1a/0x20 [ 98.503039] [<ffffffff810ae376>] __local_bh_enable_ip+0x96/0xc0 [ 98.503039] [<ffffffffa02a8e2f>] bond_poll_controller+0x16f/0x250 [bonding] [ 98.503039] [<ffffffffa02a8cf3>] ? bond_poll_controller+0x33/0x250 [bonding] [ 98.503039] [<ffffffff810feaed>] ? trace_hardirqs_off+0xd/0x10 [ 98.503039] [<ffffffff81848afb>] ? _raw_spin_unlock_irqrestore+0x5b/0x60 [ 98.503039] [<ffffffff816ec48e>] netpoll_poll_dev+0x6e/0x350 [ 98.503039] [<ffffffff816eb977>] ? netpoll_start_xmit+0x137/0x1d0 [ 98.503039] [<ffffffff816b2e8b>] ? __alloc_skb+0x5b/0x210 [ 98.503039] [<ffffffff816ec89d>] netpoll_send_skb_on_dev+0x12d/0x2a0 [ 98.503039] [<ffffffff816eccde>] netpoll_send_udp+0x2ce/0x430 [ 98.503039] [<ffffffffa0190850>] write_msg+0xb0/0xf0 [netconsole] [ 98.503039] [<ffffffff81116b63>] call_console_drivers.constprop.25+0x133/0x260 [ 98.503039] [<ffffffff81117934>] console_unlock+0x2f4/0x580 [ 98.503039] [<ffffffff81117ea5>] ? vprintk_emit+0x2e5/0x630 [ 98.503039] [<ffffffff81117ee5>] vprintk_emit+0x325/0x630 [ 98.503039] [<ffffffff81118379>] vprintk_default+0x29/0x40 [ 98.503039] [<ffffffff8183de4f>] printk+0x55/0x6b [ 98.503039] [<ffffffff816c754c>] __netdev_printk+0x16c/0x260 [ 98.503039] [<ffffffff816c7a12>] netdev_info+0x62/0x80 [ 98.503039] [<ffffffffa02ab464>] bond_change_active_slave+0x134/0x6a0 [bonding] [ 98.503039] [<ffffffffa02aba95>] bond_select_active_slave+0xc5/0x310 [bonding] [ 98.503039] [<ffffffffa02aeb78>] bond_enslave+0x1088/0x10c0 [bonding] [ 98.503039] [<ffffffffa02af46b>] bond_do_ioctl+0x37b/0x400 [bonding] [ 98.503039] [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10 [ 98.503039] [<ffffffff816dc437>] ? rtnl_lock+0x17/0x20 [ 98.503039] [<ffffffff816e5fd1>] dev_ifsioc+0x331/0x3e0 [ 98.503039] [<ffffffff816e62dc>] dev_ioctl+0xec/0x6c0 [ 98.503039] [<ffffffff816a6c6a>] sock_do_ioctl+0x4a/0x60 [ 98.503039] [<ffffffff816a7300>] sock_ioctl+0x1c0/0x250 [ 98.503039] [<ffffffff81271bfe>] do_vfs_ioctl+0x2ee/0x540 [ 98.503039] [<ffffffff810fd943>] ? up_read+0x23/0x40 [ 98.503039] [<ffffffff81070993>] ? __do_page_fault+0x1d3/0x420 [ 98.503039] [<ffffffff8127e246>] ? __fget_light+0x66/0x90 [ 98.503039] [<ffffffff81271ec9>] SyS_ioctl+0x79/0x90 [ 98.503039] [<ffffffff8184936e>] entry_SYSCALL_64_fastpath+0x12/0x76 [ 98.503039] ---[ end trace 00cfa804b0670051 ]--- Fixes: 616f45416ca0 ("bonding: implement bond_poll_controller()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e6f20ca |
|
18-Aug-2015 |
Phil Sutter <phil@nwl.cc> |
net: bonding: convert to using IFF_NO_QUEUE Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b02e3e94 |
|
11-Aug-2015 |
Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> |
bonding: Gratuitous ARP gets dropped when first slave added When the first slave is added (such as during bootup) the first gratuitous ARP gets dropped. We don't see this drop during a failover. The packet gets dropped in qdisc (noop_enqueue). The fix is to delay the sending of gratuitous ARPs till the bond dev's carrier is present. It can also be worked around by setting num_grat_arp to more than 1. Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a951bc1e |
|
16-Jul-2015 |
dingtianhong <dingtianhong@huawei.com> |
bonding: correct the MAC address for "follow" fail_over_mac policy The "follow" fail_over_mac policy is useful for multiport devices that either become confused or incur a performance penalty when multiple ports are programmed with the same MAC address, but the same MAC address still may happened by this steps for this policy: 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves bond0 has the same mac address with eth0, it is MAC1. 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves eth1 is backup, eth1 has MAC2. 3) ifconfig eth0 down eth1 became active slave, bond will swap MAC for eth0 and eth1, so eth1 has MAC1, and eth0 has MAC2. 4) ifconfig eth1 down there is no active slave, and eth1 still has MAC1, eth2 has MAC2. 5) ifconfig eth0 up the eth0 became active slave again, the bond set eth0 to MAC1. Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same MAC address, it will break this policy for ACTIVE_BACKUP mode. This patch will fix this problem by finding the old active slave and swap them MAC address before change active slave. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d5cd2ce |
|
15-Jul-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bonding: correctly handle bonding type change on enslave failure If the bond is enslaving a device with different type it will be setup by it, but if after being setup the enslave fails the bond doesn't switch back its type and also keeps pointers to foreign structures that can be long gone. Thus revert back any type changes if the enslave failed and the bond had to change its type. Example: Before patch: $ echo lo > bond0/bonding/slaves -bash: echo: write error: Cannot assign requested address $ ip l sh bond0 20: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default link/loopback 16:54:78:34:bd:41 brd 00:00:00:00:00:00 $ echo +eth1 > bond0/bonding/slaves $ ip l sh bond0 20: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff (notice the MASTER flag is gone) After patch: $ echo lo > bond0/bonding/slaves -bash: echo: write error: Cannot assign requested address $ ip l sh bond0 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 6e:66:94:f6:07:fc brd ff:ff:ff:ff:ff:ff $ echo +eth1 > bond0/bonding/slaves $ ip l sh bond0 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06f6d109 |
|
15-Jul-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bonding: fix destruction of bond with devices different from arphrd_ether When the bonding is being unloaded and the netdevice notifier is unregistered it executes NETDEV_UNREGISTER for each device which should remove the bond's proc entry but if the device enslaved is not of ARPHRD_ETHER type and is in front of the bonding, it may execute bond_release_and_destroy() first which would release the last slave and destroy the bond device leaving the proc entry and thus we will get the following error (with dynamic debug on for bond_netdev_event to see the events order): [ 908.963051] eql: event: 9 [ 908.963052] eql: IFF_SLAVE [ 908.963054] eql: event: 2 [ 908.963056] eql: IFF_SLAVE [ 908.963058] eql: event: 6 [ 908.963059] eql: IFF_SLAVE [ 908.963110] bond0: Releasing active interface eql [ 908.976168] bond0: Destroying bond bond0 [ 908.976266] bond0 (unregistering): Released all slaves [ 908.984097] ------------[ cut here ]------------ [ 908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575 remove_proc_entry+0x112/0x160() [ 908.984110] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0' [ 908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore usb_common [last unloaded: bonding] [ 908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G W O 4.2.0-rc2+ #8 [ 908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 908.984172] 0000000000000000 ffffffff81732d41 ffffffff81525b34 ffff8800358dfda8 [ 908.984175] ffffffff8106c521 ffff88003595af78 ffff88003595af40 ffff88003e3a4280 [ 908.984178] ffffffffa058d040 0000000000000000 ffffffff8106c59a ffffffff8172ebd0 [ 908.984181] Call Trace: [ 908.984188] [<ffffffff81525b34>] ? dump_stack+0x40/0x50 [ 908.984193] [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0 [ 908.984196] [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50 [ 908.984199] [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160 [ 908.984205] [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30 [bonding] [ 908.984208] [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding] [ 908.984217] [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70 [ 908.984225] [<ffffffff8142f52d>] ? unregister_pernet_operations+0x8d/0xd0 [ 908.984228] [<ffffffff8142f58d>] ? unregister_pernet_subsys+0x1d/0x30 [ 908.984232] [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding] [ 908.984236] [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250 [ 908.984241] [<ffffffff81086f99>] ? task_work_run+0x89/0xc0 [ 908.984244] [<ffffffff8152b732>] ? entry_SYSCALL_64_fastpath+0x16/0x75 [ 908.984247] ---[ end trace 7c006ed4abbef24b ]--- Thus remove the proc entry manually if bond_release_and_destroy() is used. Because of the checks in bond_remove_proc_entry() it's not a problem for a bond device to change namespaces (the bug fixed by the Fixes commit) but since commit f9399814927ad ("bonding: Don't allow bond devices to change network namespaces.") that can't happen anyway. Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from the netdev events") Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22f94e62 |
|
15-Jul-2015 |
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> |
bonding: trivial: remove unused variables Get rid of these: drivers/net/bonding//bond_main.c: In function ‘bond_update_slave_arr’: drivers/net/bonding//bond_main.c:3754:6: warning: variable ‘slaves_in_agg’ set but not used [-Wunused-but-set-variable] int slaves_in_agg; ^ CC [M] drivers/net/bonding//bond_3ad.o drivers/net/bonding//bond_3ad.c: In function ‘ad_marker_response_received’: drivers/net/bonding//bond_3ad.c:1870:61: warning: parameter ‘marker’ set but not used [-Wunused-but-set-parameter] static void ad_marker_response_received(struct bond_marker *marker, ^ drivers/net/bonding//bond_3ad.c:1871:19: warning: parameter ‘port’ set but not used [-Wunused-but-set-parameter] struct port *port) ^ Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5a983f3 |
|
07-Jul-2015 |
Mazhar Rana <mazhar.rana@cyberoam.com> |
bonding: "primary_reselect" with "failure" is not working properly When "primary_reselect" is set to "failure", primary interface should not become active until current active slave is down. But if we set first member of bond device as a "primary" interface and "primary_reselect" is set to "failure" then whenever primary interface's link get back(up) it become active slave even if current active slave is still up. With this patch, "bond_find_best_slave" will not traverse members if primary interface is not candidate for failover/reselection and current active slave is still up. Signed-off-by: Mazhar Rana <mazhar.rana@cyberoam.com> Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c3f83241 |
|
04-Jun-2015 |
Tom Herbert <tom@herbertland.com> |
net: Add full IPv6 addresses to flow_keys This patch adds full IPv6 addresses into flow_keys and uses them as input to the flow hash function. The implementation supports either IPv4 or IPv6 addresses in a union, and selector is used to determine how may words to input to jhash2. We also add flow_get_u32_dst and flow_get_u32_src functions which are used to get a u32 representation of the source and destination addresses. For IPv6, ipv6_addr_hash is called. These functions retain getting the legacy values of src and dst in flow_keys. With this patch, Ethertype and IP protocol are now included in the flow hash input. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45d4122c |
|
13-May-2015 |
Samudrala, Sridhar <sridhar.samudrala@intel.com> |
switchdev: add support for fdb add/del/dump via switchdev_port_obj ops. - introduce port fdb obj and generic switchdev_port_fdb_add/del/dump() - use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops. - add support for fdb obj in switchdev_port_obj_add/del/dump() - switch rocker to implement fdb ops via switchdev_ops v3: updated to sync with named union changes. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06635a35 |
|
12-May-2015 |
Jiri Pirko <jiri@resnulli.us> |
flow_dissect: use programable dissector in skb_flow_dissect and friends Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1bd758eb |
|
12-May-2015 |
Jiri Pirko <jiri@resnulli.us> |
net: change name of flow_dissector header to match the .c file name add couple of empty lines on the way. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7889cbee |
|
10-May-2015 |
Scott Feldman <sfeldma@gmail.com> |
switchdev: remove NETIF_F_HW_SWITCH_OFFLOAD feature flag Roopa said remove the feature flag for this series and she'll work on bringing it back if needed at a later date. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85fdb956 |
|
10-May-2015 |
Scott Feldman <sfeldma@gmail.com> |
switchdev: cut over to new switchdev_port_bridge_getlink Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
54ba5a0b |
|
10-May-2015 |
Scott Feldman <sfeldma@gmail.com> |
switchdev: cut over to new switchdev_port_bridge_dellink Rocker, bonding and team and switch over to the new switchdev_port_bridge_dellink to avoid duplicating code in each driver. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc8f40d8 |
|
10-May-2015 |
Scott Feldman <sfeldma@gmail.com> |
switchdev: cut over to new switchdev_port_bridge_setlink Rocker, bonding, and team can now use the switchdev bridge setlink to parse raw netlink; no need to duplicate this code in each driver. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ebb9a03a |
|
10-May-2015 |
Jiri Pirko <jiri@resnulli.us> |
switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/ Turned out that "switchdev" sticks. So just unify all related terms to use this prefix. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d22a5fc0 |
|
09-May-2015 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Implement user key part of port_key in an AD system. The port key has three components - user-key, speed-part, and duplex-part. The LSBit is for the duplex-part, next 5 bits are for the speed while the remaining 10 bits are the user defined key bits. Get these 10 bits from the user-space (through the SysFs interface) and use it to form the admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If it is not provided then use zero for the user-key-bits (default). It can set using following example code - # modprobe bonding mode=4 # usr_port_key=$(( RANDOM & 0x3FF )) # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: * fixed up style issues reported by checkpatch * fixed up context from change in ad_actor_sys_prio patch] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
74514957 |
|
09-May-2015 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Allow userspace to set actors' macaddr in an AD-system. In an AD system, the communication between actor and partner is the business between these two entities. In the current setup anyone on the same L2 can "guess" the LACPDU contents and then possibly send the spoofed LACPDUs and trick the partner causing connectivity issues for the AD system. This patch allows to use a random mac-address obscuring it's identity making it harder for someone in the L2 is do the same thing. This patch allows user-space to choose the mac-address for the AD-system. This mac-address can not be NULL or a Multicast. If the mac-address is set from user-space; kernel will honor it and will not overwrite it. In the absence (value from user space); the logic will default to using the masters' mac as the mac-address for the AD-system. It can be set using example code below - # modprobe bonding mode=4 # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \ $(( (RANDOM & 0xFE) | 0x02 )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF )) \ $(( RANDOM & 0xFF ))) # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: fixed up style issues reported by checkpatch] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6791e466 |
|
09-May-2015 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Allow userspace to set actors' system_priority in AD system This patch allows user to randomize the system-priority in an ad-system. The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user does not specify this value, the system defaults to 0xFFFF, which is what it was before this patch. Following example code could set the value - # modprobe bonding mode=4 # sys_prio=$(( 1 + RANDOM + RANDOM )) # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio # echo +eth1 > /sys/class/net/bond0/bonding/slaves ... # ip link set bond0 up Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> [jt: * fixed up style issues reported by checkpatch * changed how the default value is set in bond_check_params(), this makes the default consistent between what gets set for a new bond and what the default is claimed to be in the bonding options.] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e913fb27 |
|
29-Apr-2015 |
Pai <vpai@akamai.com> |
net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table This patch fixes a Kernel Panic in bonding driver debugfs file: rlb_hash_table. $> modprobe bonding mode=6 $> cat /sys/kernel/debug/bonding/bond0/rlb_hash_table This will crash the kernel. The struct alb_bond_info is initialized only when the bonding interface is initialized (ip link set bond0 up) and not at the time it is allocated. If we try to read the table before that, it'll result in a kernel panic. The patch applies against both net and net-next Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
73b5a6f2 |
|
26-Apr-2015 |
Matan Barak <matanb@mellanox.com> |
net/bonding: Make DRV macros private The bonding modules currently defines four macros with general names that pollute the global namespace: DRV_VERSION DRV_RELDATE DRV_NAME DRV_DESCRIPTION Fixing that by defining a private bonding_priv.h header files which includes those defines. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5e2dc5d |
|
29-Mar-2015 |
Anton Nayshtut <anton@swortex.com> |
bonding: Bonding Overriding Configuration logic restored. Before commit 3900f29021f0bc7fe9815aa32f1a993b7dfdd402 ("bonding: slight optimizztion for bond_slave_override()") the override logic was to send packets with non-zero queue_id through the slave with corresponding queue_id, under two conditions only - if the slave can transmit and it's up. The above mentioned commit changed this logic by introducing an additional condition - whether the bond is active (indirectly, using the slave_can_tx and later - bond_is_active_slave), that prevents the user from implementing more complex policies according to the Documentation/networking/bonding.txt. Signed-off-by: Anton Nayshtut <anton@swortex.com> Signed-off-by: Alexey Bogoslavsky <alexey@swortex.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4847f049 |
|
26-Mar-2015 |
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> |
bonding: Don't segment multiple tagged packets on bonding device Bonding devices don't need to segment multiple tagged packets since their slaves can segment them. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
616f4541 |
|
04-Mar-2015 |
Mahesh Bandewar <maheshb@google.com> |
bonding: implement bond_poll_controller() This patches implements the poll_controller support for all bonding driver. If the slaves have poll_controller net_op defined, this implementation calls them. This is mode agnostic implementation and iterates through all slaves (based on mode) and calls respective handler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
950ddcb1 |
|
19-Feb-2015 |
Mahesh Bandewar <maheshb@google.com> |
bonding: simple code refactor Remove duplicate code. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92e584fe |
|
08-Feb-2015 |
Moni Shoua <monis@mellanox.com> |
net/bonding: Fix potential bad memory access during bonding events When queuing work to send the NETDEV_BONDING_INFO netdev event, it's possible that when the work is executed, the pointer to the slave becomes invalid. This can happen if between queuing the event and the execution of the work, the net-device was un-ensvaled and re-enslaved. Fix that by queuing a work with the data of the slave instead of the slave structure. Fixes: 69e6113343cf ('net/bonding: Notify state change on slaves') Reported-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69e61133 |
|
03-Feb-2015 |
Moni Shoua <monis@mellanox.com> |
net/bonding: Notify state change on slaves Use notifier chain to dispatch an event upon a change in slave state. Event is dispatched with slave specific info. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69a2338e |
|
03-Feb-2015 |
Moni Shoua <monis@mellanox.com> |
net/bonding: Move slave state changes to a helper function Move slave state changes to a helper function, this is a pre-step for adding functionality of dispatching an event when this helper is called. This commit doesn't add new functionality. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c158cba3 |
|
29-Jan-2015 |
Roopa Prabhu <roopa@cumulusnetworks.com> |
bonding: handle NETIF_F_HW_SWITCH_OFFLOAD flag and add ndo_bridge_setlink/dellink handlers We want bond to pick up the offload flag if any of its slaves have it. NETIF_F_HW_SWITCH_OFFLOAD flag is added to the mask, so that netdev_increment_features does not ignore it. This also adds ndo_bridge_setlink and ndo_bridge_dellink handlers. These currently point to the default handlers provided by the switchdev api. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30369104 |
|
25-Jan-2015 |
Jonathan Toppins <jtoppins@cumulusnetworks.com> |
bonding: cleanup and remove dead code fix sparse warning about non-static function drivers/net/bonding/bond_main.c:3737:5: warning: symbol 'bond_3ad_xor_xmit' was not declared. Should it be static? Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8bbe71a5 |
|
25-Jan-2015 |
Wilson Kok <wkok@cumulusnetworks.com> |
bonding: fix bond_open() don't always set slave active flag Mode 802.3ad, fix incorrect bond slave active state when slave is not in active aggregator. During bond_open(), the bonding driver always sets the slave active flag to true if the bond is not in active-backup, alb, or tlb modes. Bonding should let the aggregator selection logic set the active flag when in 802.3ad mode. Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2477bc9a |
|
25-Jan-2015 |
Jonathan Toppins <jtoppins@cumulusnetworks.com> |
bonding: update bond carrier state when min_links option changes Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
24f87d4c |
|
26-Jan-2015 |
Eric Dumazet <edumazet@google.com> |
bonding: handle more gso types In commit 5a7baa78851b ("bonding: Advertize vxlan offload features when supported"), Or Gerlitz added support conditional vxlan offload. In this patch I also add support for all kind of tunnels, but we allow a bonding device to not require segmentation, as it is always better to make this segmentation at the very last stage, if a particular slave device requires it. Tested: Setup a GRE tunnel, on a physical NIC not having tx-gre-segmentation. Results on bnx2x are even better, as we no longer have to segment in software. ethtool -K bond0 tx-gre-segmentation off super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15 7538.32 ethtool -K bond0 tx-gre-segmentation on super_netperf 50 --google-pacing-rate 30000000 -H 10.7.8.152 -l 15 10200.5 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a22a9e41 |
|
22-Dec-2014 |
Wengang Wang <wen.gang.wang@oracle.com> |
bonding: change error message to debug message in __bond_release_one() In __bond_release_one(), when the interface is not a slave or not a slave of "this" master, it log error message. The message actually should be a debug message matching what bond_enslave() does. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62749e2c |
|
19-Nov-2014 |
Jiri Pirko <jiri@resnulli.us> |
vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4bef1b5 |
|
19-Nov-2014 |
Jiri Pirko <jiri@resnulli.us> |
vlan: kill vlan_put_tag helper Since both tx and rx paths work with skb->vlan_tci, there's no need for this function anymore. Switch users directly to __vlan_hwaccel_put_tag. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b8e4500f |
|
18-Nov-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: fix curr_active_slave/carrier with loadbalance arp monitoring Since commit 6fde8f037e60 ("bonding: fix locking in bond_loadbalance_arp_mon()") we can have a stale bond carrier state and stale curr_active_slave when using arp monitoring in loadbalance modes. The reason is that in bond_loadbalance_arp_mon() we can't have do_failover == true but slave_state_changed == false, whenever do_failover is true then slave_state_changed is also true. Then the following piece from bond_loadbalance_arp_mon(): if (slave_state_changed) { bond_slave_state_change(bond); if (BOND_MODE(bond) == BOND_MODE_XOR) bond_update_slave_arr(bond, NULL); } else if (do_failover) { block_netpoll_tx(); bond_select_active_slave(bond); unblock_netpoll_tx(); } will execute only the first branch, always and regardless of do_failover. Since these two events aren't related in such way, we need to decouple and consider them separately. For example this issue could lead to the following result: Bonding Mode: load balancing (round-robin) *MII Status: down* MII Polling Interval (ms): 0 Up Delay (ms): 0 Down Delay (ms): 0 ARP Polling Interval (ms): 100 ARP IP target/s (n.n.n.n form): 192.168.9.2 Slave Interface: ens12 *MII Status: up* Speed: 10000 Mbps Duplex: full Link Failure Count: 2 Permanent HW addr: 00:0f:53:01:42:2c Slave queue ID: 0 Slave Interface: eth1 *MII Status: up* Speed: Unknown Duplex: Unknown Link Failure Count: 70 Permanent HW addr: 52:54:00:2f:0f:8e Slave queue ID: 0 Since some interfaces are up, then the status of the bond should also be up, but it will never change unless something invokes bond_set_carrier() (i.e. enslave, bond_select_active_slave etc). Now, if I force the calling of bond_select_active_slave via for example changing primary_reselect (it can change in any mode), then the MII status goes to "up" because it calls bond_select_active_slave() which should've been done from bond_loadbalance_arp_mon() itself. CC: Veaceslav Falico <vfalico@gmail.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Ding Tianhong <dingtianhong@huawei.com> Fixes: 6fde8f037e60 ("bonding: fix locking in bond_loadbalance_arp_mon()") Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@gmail.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbe168ba |
|
12-Nov-2014 |
Michal Kubeček <mkubecek@suse.cz> |
net: generic dev_disable_lro() stacked device handling Large receive offloading is known to cause problems if received packets are passed to other host. Therefore the kernel disables it by calling dev_disable_lro() whenever a network device is enslaved in a bridge or forwarding is enabled for it (or globally). For virtual devices we need to disable LRO on the underlying physical device (which is actually receiving the packets). Current dev_disable_lro() code handles this propagation for a vlan (including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan. It doesn't handle other stacked devices and their combinations, in particular propagation from a bond to its slaves which often causes problems in virtualization setups. As we now have generic data structures describing the upper-lower device relationship, dev_disable_lro() can be generalized to disable LRO also for all lower devices (if any) once it is disabled for the device itself. For bonding and teaming devices, it is necessary to disable LRO not only on current slaves at the moment when dev_disable_lro() is called but also on any slave (port) added later. v2: use lower device links for all devices (including vlan and macvlan) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ef8019b |
|
10-Nov-2014 |
David S. Miller <davem@davemloft.net> |
net: Move bonding headers under include/net This ways drivers like cxgb4 don't need to do ugly relative includes. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31aa860e |
|
31-Oct-2014 |
Eric Dumazet <edumazet@google.com> |
bonding: add bond_tx_drop() helper Because bonding stats are usually sum of slave stats, it was not easy to account for tx drops at bonding layer. We can use dev->tx_dropped for this, as this counter is later added to the device stats (in dev_get_stats()) This extends the idea we had in commit ee6377147409a ("bonding: Simplify the xmit function for modes that use xmit_hash") for bond_3ad_xor_xmit() to other bonding modes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02875878 |
|
05-Oct-2014 |
Eric Dumazet <edumazet@google.com> |
net: better IFF_XMIT_DST_RELEASE support Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee637714 |
|
04-Oct-2014 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Simplify the xmit function for modes that use xmit_hash Earlier change to use usable slave array for TLB mode had an additional performance advantage. So extending the same logic to all other modes that use xmit-hash for slave selection (viz 802.3AD, and XOR modes). Also consolidating this with the earlier TLB change. The main idea is to build the usable slaves array in the control path and use that array for slave selection during xmit operation. Measured performance in a setup with a bond of 4x1G NICs with 200 instances of netperf for the modes involved (3ad, xor, tlb) cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5 Mode TPS-Before TPS-After 802.3ad : 468,694 493,101 TLB (lb=0): 392,583 392,965 XOR : 475,696 484,517 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f0c5f73 |
|
28-Sep-2014 |
Andy Gospodarek <gospo@cumulusnetworks.com> |
bonding: make global bonding stats more reliable As the code stands today, bonding stats are based simply on the stats from the member interfaces. If a member was to be removed from a bond, the stats would instantly drop. This would be confusing to an admin would would suddonly see interface stats drop while traffic is still flowing. In addition to preventing the stats drops mentioned above, new members will now be added to the bond and only traffic received after the member was added to the bond will be counted as part of bonding stats. Bonding counters will also be updated when any slaves are dropped to make sure the reported stats are reliable. v2: Changes suggested by Nik to properly allocate/free stats memory. v3: Properly destroy workqueue and fix netlink configuration path. v4: Moved cached stats into bonding and slave structs as there does not seem to be a complexity/performance benefit to using alloc'd memory vs in-struct memory. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37ab7ddf |
|
19-Sep-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove the unnecessary notes for bond_xmit_broadcast() Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a64d044e |
|
19-Sep-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: slight optimization for bond_xmit_roundrobin() When the slave is the curr_active_slave, no need to check whether the slave is active or not, it is always active. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0974585 |
|
15-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: consolidate ASSERT_RTNL()s and remove the unnecessary Consolidate the calls to ASSERT_RTNL() before bond_select_active_slave() inside bond_select_active_slave() itself and remove the ASSERT_RTNL() from bond_hw_addr_swap() as it's not exported and its only caller - bond_change_active_slave() already has an ASSERT_RTNL(). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
547942ca |
|
15-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: trivial: style and comment fixes First adjust a couple of locking comments that were left inaccurate, then adjust comments to use the netdev styling and remove extra new lines where necessary and add a couple of new lines between declarations and code. These are all trivial styling changes, no functional change. Also removed a couple of outdated or obvious comments. This patch is by no means a complete fix of all netdev style violations but it gets the bonding closer. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a72c2da |
|
12-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: fix div by zero while enslaving and transmitting The problem is that the slave is first linked and slave_cnt is incremented afterwards leading to a div by zero in the modes that use it as a modulus. What happens is that in bond_start_xmit() bond_has_slaves() is used to evaluate further transmission and it becomes true after the slave is linked in, but when slave_cnt is used in the xmit path it is still 0, so fetch it once and transmit based on that. Since it is used only in round-robin and XOR modes, the fix is only for them. Thanks to Eric Dumazet for pointing out the fault in my first try to fix this. Call trace (took it out of net-next kernel, but it's the same with net): [46934.330038] divide error: 0000 [#1] SMP [46934.330041] Modules linked in: bonding(O) 9p fscache snd_hda_codec_generic crct10dif_pclmul [46934.330041] bond0: Enslaving eth1 as an active interface with an up link [46934.330051] ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic pata_acpi floppy [last unloaded: bonding] [46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G O 3.17.0-rc4+ #27 [46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti: ffff88005b728000 [46934.330059] RIP: 0010:[<ffffffffa0198c33>] [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450 [bonding] [46934.330060] RSP: 0018:ffff88005b72b7f8 EFLAGS: 00010246 [46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX: 000000000000002a [46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI: ffff88004b077940 [46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09: ffff88004a83e000 [46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12: ffff88004b3f0500 [46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15: ffff88004b077940 [46934.330063] FS: 00007fbd91a4c740(0000) GS:ffff88005f080000(0000) knlGS:0000000000000000 [46934.330064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4: 00000000000406e0 [46934.330069] Stack: [46934.330071] ffffffff811e6169 00000000e772fa05 ffff88004b077000 ffff88004b3f0500 [46934.330072] ffffffff81d17d18 000000000000002a 0000000000000000 ffff88005b72b8a0 [46934.330073] ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4 ffff88005b302000 [46934.330073] Call Trace: [46934.330077] [<ffffffff811e6169>] ? __kmalloc_node_track_caller+0x119/0x300 [46934.330084] [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410 [46934.330086] [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90 [46934.330088] [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590 [46934.330089] [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20 [46934.330090] [<ffffffff8168f022>] arp_xmit+0x22/0x60 [46934.330091] [<ffffffff8168f090>] arp_send.part.16+0x30/0x40 [46934.330092] [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0 [46934.330094] [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0 [46934.330096] [<ffffffff8162875a>] neigh_probe+0x4a/0x70 [46934.330097] [<ffffffff8162979c>] __neigh_event_send+0xac/0x230 [46934.330098] [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220 [46934.330100] [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0 [46934.330101] [<ffffffff81660478>] ip_finish_output+0x1f8/0x860 [46934.330102] [<ffffffff81661f08>] ip_output+0x58/0x90 [46934.330103] [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0 [46934.330104] [<ffffffff81661640>] ip_local_out_sk+0x30/0x40 [46934.330105] [<ffffffff81662a66>] ip_send_skb+0x16/0x50 [46934.330106] [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40 [46934.330107] [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30 [46934.330110] [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60 [46934.330111] [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0 [46934.330113] [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0 [46934.330114] [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0 [46934.330115] bond0: Adding slave eth2 [46934.330116] [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0 [46934.330118] [<ffffffff81603248>] ? move_addr_to_kernel.part.20+0x28/0x80 [46934.330121] [<ffffffff811b4477>] ? might_fault+0x47/0x50 [46934.330122] [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0 [46934.330125] [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530 [46934.330127] [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50 [46934.330129] [<ffffffff81242b38>] ? fsnotify+0x238/0x310 [46934.330130] [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90 [46934.330131] [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20 [46934.330134] [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b [46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9 1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c 89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00 [46934.330146] RIP [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450 [bonding] [46934.330146] RSP <ffff88005b72b7f8> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> Fixes: 278b208375 ("bonding: initial RCU conversion") Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c0bc550 |
|
11-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: adjust locking comments Now that locks have been removed, remove some unnecessary comments and adjust others to reflect reality. Also add a comment to "mode_lock" to describe its current users and give a brief summary why they need it. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e470259f |
|
11-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: 3ad: convert to bond->mode_lock Now that we have bond->mode_lock, we can remove the state_machine_lock and use it in its place. There're no fast paths requiring the per-port spinlocks so it should be okay to consolidate them into mode_lock. Also move it inside the unbinding function as we don't want to expose mode_lock outside of the specific modes. Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4bab16d7 |
|
11-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: alb: convert to bond->mode_lock The ALB/TLB specific spinlocks are no longer necessary as we now have bond->mode_lock for this purpose, so convert them and remove them from struct alb_bond_info. Also remove the unneeded lock/unlock functions and use spin_lock/unlock directly. Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7435628 |
|
11-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert curr_slave_lock to a spinlock and rename it curr_slave_lock is now a misleading name, a much better name is mode_lock as it'll be used for each mode's purposes and it's no longer necessary to use a rwlock, a simple spinlock is enough. Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c72cfdc9 |
|
11-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: clean curr_slave_lock use Mostly all users of curr_slave_lock already have RTNL as we've discussed previously so there's no point in using it, the one case where the lock must stay is the 3ad code, in fact it's the only one. It's okay to remove it from bond_do_fail_over_mac() as it's called with RTNL and drops the curr_slave_lock anyway. bond_change_active_slave() is one of the main places where curr_slave_lock was used, it's okay to remove it as all callers use RTNL these days before calling it, that's why we move the ASSERT_RTNL() in the beginning to catch any potential offenders to this rule. The RTNL argument actually applies to all of the places where curr_slave_lock has been removed from in this patch. Also remove the unnecessary bond_deref_active_protected() macro and use rtnl_dereference() instead. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37b7021d |
|
09-Sep-2014 |
Masanari Iida <standby24x7@gmail.com> |
net:bonding: Add missing space in bonding driver parameter description This patch adds missing space between "interface" and "by" in bonding module parameter description. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87163ef9 |
|
09-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: remove last users of bond->lock and bond->lock itself The usage of bond->lock in bond_main.c was completely unnecessary as it didn't help to sync with anything, most of the spots already had RTNL. Since there're no more users of bond->lock, remove it. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
059b47e8 |
|
09-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert primary_slave to use RCU This is necessary mainly for two bonding call sites: procfs and sysfs as it was dereferenced without any real protection. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdbc5f13 |
|
09-Sep-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: 3ad: use curr_slave_lock instead of bond->lock In 3ad mode the only syncing needed by bond->lock is for the wq and the recv handler, so change them to use curr_slave_lock. There're no locking dependencies here as 3ad doesn't use curr_slave_lock at all. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a67eed57 |
|
25-Jul-2014 |
Dan Carpenter <dan.carpenter@oracle.com> |
bonding: fix a memory leak in bond_arp_send_all() This test is reversed so the memory is always leaked. It's better style to remove the test anyway. Fixes: 3e403a77779f ('bonding: make it possible to have unlimited nested upper vlans') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e403a77 |
|
17-Jul-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: make it possible to have unlimited nested upper vlans Currently we're limited by a constant level of vlan nestings, and fail to find anything beyound that level (currently 2). To fix this - remove the limit of nestings when going through device tree, and when the end device is found - allocate the needed amount of vlan tags and return them, instead of found/not found. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23fa5c2c |
|
16-Jul-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: destroy proc directory only after all bonds are gone Currently we might arrive to bond_net_exit() with some bonds left (that were created while the module is unloading). We take care of that by destroying sysfs (the last possibility to add new bonds) and then destroying all the remaining bonds. However, we destroy the /proc/net/bonding directory before destroying those last bonds, and get a warning that we're trying to destroy a non-empty proc directory (containing /proc/net/bonding/bondX). Fix this by moving bond_destroy_proc_dir() after all the bonds are destroyed, so that we're sure that no bonds exist. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14056e79 |
|
16-Jul-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: use rtnl_deref in bond_change_rx_flags() As it's always called with RTNL held, via dev_set_allmulti/promiscuity. Also, remove the wrong comment. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ce04d635 |
|
17-Jul-2014 |
Jianhua Xie <jianhua.xie@freescale.com> |
bonding: enhance L2 hash helper with packet type Current L2 hash helper calculates destination eth addr and source ether addr as L2 hash factors. This patch is adding packet type ID field into L2 hash factors. While one of BOND_XMIT_POLICY_LAYER2 or BOND_XMIT_POLICY_{LAYER|ENCAP}23 is applied, for the 2nd level hash, enhanced hash method can help to distribute different types of packets like IPv4/IPv6 packets to different slave devices. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> CC: Pan Jiafei <Jiafei.Pan@freescale.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3385323 |
|
15-Jul-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: remove pr_fmt from bond_main.c To maintain the same message structure as netdev_* functions print. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76444f50 |
|
15-Jul-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: convert bond_main.c to use netdev_printk instead of pr_ Converted only the parts where we've had a valid net_device, skipping the init/deinit and options verification. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5442441 |
|
15-Jul-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: permit enslaving interfaces without set_mac support Currently we exit if the slave isn't the first slave, doesn't support mac address setting and fail_over_mac isn't FOM_ACTIVE. It's wrong because we only require ndo_set_mac_address in case bonding is in active-backup mode and FOM isn't FOM_ACTIVE. To fix this - only exit with an error if we're in a/b mode and have fail_over_mac != FOM_ACTIVE. Also, maintain current behaviour on the first slave (forcibly change fom to FOM_ACTIVE) to not break anyone's configuration. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85741718 |
|
15-Jul-2014 |
Eric Dumazet <edumazet@google.com> |
bonding: add proper __rcu annotation for current_arp_slave Using __rcu annotation actually helps to spot all accesses to bond->current_arp_slave are correctly protected, with LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4740d638 |
|
15-Jul-2014 |
Eric Dumazet <edumazet@google.com> |
bonding: add proper __rcu annotation for curr_active_slave RCU was added to bonding in linux-3.12 but lacked proper sparse annotations. Using __rcu annotation actually helps to spot all accesses to bond->curr_active_slave are correctly protected, with LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Veaceslav Falico <vfalico@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c835a677 |
|
14-Jul-2014 |
Tom Gundersen <teg@jklm.no> |
net: set name_assign_type in alloc_netdev() Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
548d28bd |
|
13-Jul-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: fix ad_select module param check Obvious copy/paste error when I converted the ad_select to the new option API. "lacp_rate" there should be "ad_select" so we can get the proper value. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option API") Reported-by: Karim Scheik <karim.scheik@prisma-solutions.at> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e721f87d |
|
02-Jul-2014 |
Jiri Pirko <jiri@resnulli.us> |
bonding: remove no longer relevant vlan warnings These warnings are no longer relevant. Even when last slave is removed, there is a valid address assigned to bond (random). The correct functionality of vlans is ensured by maintaining unicast list in vlan_sync_address(). Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
763e0ecd |
|
27-Jun-2014 |
Jiri Pirko <jiri@resnulli.us> |
bonding: allow to add vlans on top of empty bond This limitation maybe had some reason in the past, but now there is not one -> removing this. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a7baa78 |
|
17-Jun-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
bonding: Advertize vxlan offload features when supported When the underlying device supports TCP offloads for VXLAN/UDP encapulated traffic, we need to reflect that through the hw_enc_features field of the bonding net-device. This will cause the xmit path in the core networking stack to provide bonding with encapsulated GSO frames to offload into the HW etc. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14af9963 |
|
04-Jun-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
bonding: Support macvlans on top of tlb/rlb mode bonds To make TLB mode work, the patch allows learning packets to be sent using mac addresses assigned to macvlan devices, also taking into an account vlans that may be between the bond and macvlan device. To make RLB work, all we have to do is accept ARP packets for addresses added to the bond dev->uc list. Since RLB mode will take care to update the peers directly with correct mac addresses, learning packets for these addresses do not have be send to switch. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c565b488 |
|
04-Jun-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
bonding: Turn on IFF_UNICAST_FLT on bond devices Bonding devices manage the unicast filters of the underlying interfaces, but do not turn on IFF_UNICAST_FLT flag. Thus anytime a unicast address is added to the bond, the bond is places in promiscuous mode. Turn on IFF_UNICAST_FLT on the bond device so that the bond does not go into promiscuous mode needlesly. If an underlying device does not support unicast filtering, that device will automaticall enter promiscuous mode already. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dc73c41f |
|
21-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: populate essential new_slave->bond/dev early The new bond_free_slave() needs new_slave->bond to verify if additional structures were allocated, so populate it early so that, in case of failure in bond_enslave(), we would be able to get it. Also populate the new_slave->dev field, as it's too one of the most needed things to assign early. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a9b3ace4 |
|
20-May-2014 |
Michal Kubeček <mkubecek@suse.cz> |
bonding: fix vlan_features computing bond_compute_features() uses netdev_increment_features() to combine vlan_features of slaves into vlan_features of the bond. As netdev_increment_features() only adds most features and we start with BOND_VLAN_FEATURES, we can end up with features none of the slaves provided. If there is at least one slave, initialize vlan_features only with the flags in NETIF_F_ALL_FOR_ALL. Right now there is none in BOND_VLAN_FEATURES but stating it explicitely will make the code more future proof. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
44a40855 |
|
16-May-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
bonding: Fix stacked device detection in arp monitoring Prior to commit fbd929f2dce460456807a51e18d623db3db9f077 bonding: support QinQ for bond arp interval the arp monitoring code allowed for proper detection of devices stacked on top of vlans. Since the above commit, the code can still detect a device stacked on top of single vlan, but not a device stacked on top of Q-in-Q configuration. The search will only set the inner vlan tag if the route device is the vlan device. However, this is not always the case, as it is possible to extend the stacked configuration. With this patch it is possible to provision devices on top Q-in-Q vlan configuration that should be used as a source of ARP monitoring information. For example: ip link add link bond0 vlan10 type vlan proto 802.1q id 10 ip link add link vlan10 vlan100 type vlan proto 802.1q id 100 ip link add link vlan100 type macvlan Note: This patch limites the number of stacked VLANs to 2, just like before. The original, however had another issue in that if we had more then 2 levels of VLANs, we would end up generating incorrectly tagged traffic. This is no longer possible. Fixes: fbd929f2dce460456807a51e18d623db3db9f077 (bonding: support QinQ for bond arp interval) CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Ding Tianhong <dingtianhong@huawei.com> CC: Patric McHardy <kaber@trash.net> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8557cd74 |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: replace SLAVE_IS_OK() with bond_slave_can_tx() They're verifying the same thing (except of IFF_UP, which is implied for netif_running(), which is also a prerequisite). CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
891ab54d |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: rename {, bond_}slave_can_tx and clean it up CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6adc610 |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: convert IS_UP(slave->dev) to inline function Also, remove the IFF_UP verification cause we can't be netif_running() with being also IFF_UP. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2807a9fe |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: make IS_IP_TARGET_UNUSABLE_ADDRESS an inline function Also, use standard IP primitives to check the address. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01844098 |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: create a macro for bond mode and use it CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec0865a9 |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: make USES_PRIMARY inline functions Change the name a bit to better reflect its scope, and update some comments. Two functions added - one which takes bond as a param and the other which takes the mode. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
267bed77 |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: make BOND_NO_USES_ARP an inline function Also, change its name to better reflect its scope, and skip the "no" part. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1e2e5cd |
|
15-May-2014 |
Veaceslav Falico <vfalico@gmail.com> |
bonding: make TX_QUEUE_OVERRIDE() macro an inline function Also, make it accept bonding as a parameter and change the name a bit to better reflect its scope. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3fdddd85 |
|
12-May-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: alloc the structure ad_info dynamically in per slave The struct ad_slave_info is very huge, and only be used for 802.3ad mode, so alloc the structure dynamically could save 356 Bits for every slave in non 802.3ad mode. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bedabf90 |
|
07-May-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: simplify the slave_do_arp_validate_only() The argument slave is not used for slave_do_arp_validate_only(), so no need to keep it, make the function more simple. Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e9f0fb88 |
|
22-Apr-2014 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Add tlb_dynamic_lb parameter for tlb mode The aggresive load balancing causes packet re-ordering as active flows are moved from a slave to another within the group. Sometime this aggresive lb is not necessary if the preference is for less re-ordering. This parameter if used with value "0" disables this dynamic flow shuffling minimizing packet re-ordering. Of course the side effect is that it has to live with the static load balancing that the hashing distribution provides. This impact is less severe if the correct xmit-hashing-policy is used for the tlb setup. The default value of the parameter is set to "1" mimicing the earlier behavior. Ran the netperf test with 200 stream for 1 min between two hosts with 4x1G trunk (xmit-lb mode with xmit-policy L3+4) before and after these changes. Following was the command used for those 200 instances - netperf -t TCP_RR -l 60 -s 5 -H <host> -- -r81920,81920 Transactions per second: Before change: 1,367.11 After change: 1,470.65 Change-Id: Ie3f75c77282cf602e83a6e833c6eb164e72a0990 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f05b42ea |
|
22-Apr-2014 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Added bond_tlb_xmit() for tlb mode. Re-organized the xmit function for the lb mode separating tlb xmit from the alb mode. This will enable use of the hashing policies like 802.3ad mode. Also extended use of xmit-hash-policy to tlb mode. Now the tlb-mode defaults to BOND_XMIT_POLICY_LAYER2 if the xmit policy module parameter is not set (just like 802.3ad, or Xor mode). Change-Id: I140257403d272df75f477b380207338d0f04963e Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee62e868 |
|
22-Apr-2014 |
Mahesh Bandewar <maheshb@google.com> |
bonding: Changed hashing function to just provide hash Modified the hash function to return just hash separating from the modulo operation that can be performed by the caller. This is to make way for the tlb mode to use the same hashing policies that are used in the 802.3ad and Xor mode. Change-Id: I276609e87e0ca213c4d1b17b79c5e0b0f3d0dd6f Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db298686 |
|
08-Apr-2014 |
Thomas Richter <tmricht@linux.vnet.ibm.com> |
bonding: Remove debug_fs files when module init fails Remove the bonding debug_fs entries when the module initialization fails. The debug_fs entries should be removed together with all other already allocated resources. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7db8df02 |
|
01-Apr-2014 |
zheng.li <zheng.x.li@oracle.com> |
bonding: Inactive slaves should keep inactive flag's value bond_open is not setting the inactive flag correctly for some modes (alb and tlb), resulting in error behavior if the bond has been administratively set down and then back up. This effect should not occur when slaves are added while the bond is up; it's something that only happens after a down/up bounce of the bond. For example, in bond tlb or alb mode, domu send some ARP request which go out from dom0 bond's active slave, then the ARP broadcast request packets go back to inactive slave from switch, because the inactive slave's inactive flag is zero, kernel will receive the packets and pass them to bridge that cause dom0's bridge map domu's MAC address to port of bond, bridge should map domu's MAC to port of vif. Signed-off-by: Zheng Li <zheng.x.li@oracle.com> Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8779ec1 |
|
27-Mar-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
netpoll: Remove gfp parameter from __netpoll_setup The gfp parameter was added in: commit 47be03a28cc6c80e3aa2b3e8ed6d960ff0c5c0af Author: Amerigo Wang <amwang@redhat.com> Date: Fri Aug 10 01:24:37 2012 +0000 netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> The reason for the gfp parameter was removed in: commit c4cdef9b7183159c23c7302aaf270d64c549f557 Author: dingtianhong <dingtianhong@huawei.com> Date: Tue Jul 23 15:25:27 2013 +0800 bonding: don't call slave_xxx_netpoll under spinlocks The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Nothing else that calls __netpoll_setup or ndo_netpoll_setup requires a gfp paramter, so remove the gfp parameter from both of these functions making the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4873ac3c |
|
25-Mar-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: add net_ratelimt to avoid spam in arp interval Remove the unnecessary log and add net_ratelimit to the others, in order to avoid spam the log. Cc: Joe Perches <joe@perches.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbd929f2 |
|
25-Mar-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: support QinQ for bond arp interval The bond send arp request to indicate that the slave is active, and if the bond dev is a vlan dev, it will set the vlan tag in skb to notice the vlan group, but the bond could only send a skb with 802.1q proto, not support for QinQ. So add outer tag for lower vlan tag and inner tag for upper vlan tag to support QinQ, The new skb will be consist of two vlan tag just like this: dst mac | src mac | outer vlan tag | inner vlan tag | data | ..... If We don't need QinQ, the inner vlan tag could be set to 0 and use outer vlan tag as a normal vlan group. Using "ip link" to configure the bond for QinQ and add test log: ip link add link bond0 bond0.20 type vlan proto 802.1ad id 20 ip link add link bond0.20 bond0.20.200 type vlan proto 802.1q id 200 ifconfig bond0.20 11.11.20.36/24 ifconfig bond0.20.200 11.11.200.36/24 echo +11.11.200.37 > /sys/class/net/bond0/bonding/arp_ip_target 90:e2:ba:07:4a:5c (oui Unknown) > Broadcast, ethertype 802.1Q-QinQ (0x88a8),length 50: vlan 20, p 0,ethertype 802.1Q, vlan 200, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 11.11.200.37 tell 11.11.200.36, length 28 90:e2:ba:06:f9:86 (oui Unknown) > 90:e2:ba:07:4a:5c (oui Unknown), ethertype 802.1Q-QinQ (0x88a8), length 50: vlan 20, p 0, ethertype 802.1Q, vlan 200, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 11.11.200.37 is-at 90:e2:ba:06:f9:86 (oui Unknown), length 28 v1->v2: remove the comment "TODO: QinQ?". Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9152e26d |
|
25-Mar-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: ratelimit pr_err() for bond xmit broadcast It may spam if the system is out of the memory, add ratelimit for it. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
054bb880 |
|
25-Mar-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: slight optimization for bond xmit path Add unlikely() micro to the unlikely conditions in the bond xmit path for slight optimization. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2bb77ab4 |
|
11-Mar-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
bonding: Call dev_kfree_skby_any instead of kfree_skb. Replace kfree_skb with dev_kfree_skb_any in functions that can be called in hard irq and other contexts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3253339 |
|
04-Mar-2014 |
stephen hemminger <stephen@networkplumber.org> |
bonding: options handling cleanup Make local functions static (ie. only used in bond_options.c) Make bond options parsing tables constant. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fca28094 |
|
04-Mar-2014 |
stephen hemminger <stephen@networkplumber.org> |
bonding: remove dead code These functions are defined but no longer used. Compile tested only. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28572760 |
|
27-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: send arp requests even if there's no route to them Currently we're only sending arp requests if we have a route to the target (and, thus, can find out the source ip address). There are some use cases, however, where we don't want/need to set an ip address (or set up a specific route) for bonding to use arp monitoring *for traffic generation*. We can easily send arp probes (arp requests with src ip == 0) to generate arp broadcast responses from the target ip and use them for determining if the target is up. This, obviously, won't work with arp validation - because we don't have the ip address set and, thus, will filter out the responses. So in that case - print a warning. CC: François CACHEREUL <f.cachereul@alphalink.fr> CC: Zhenjie Chen <zhchen@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09a89c21 |
|
26-Feb-2014 |
Jiri Bohac <jbohac@suse.cz> |
bonding: disallow enslaving a bond to itself Enslaving a bond to itself leads to an endless loop and hangs the kernel. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Tested-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee6154e1 |
|
26-Feb-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: fix a div error caused by the slave release path There's a bug in the slave release function which leads the transmit functions which use the bond->slave_cnt to a div by 0 because we might just have released our last slave and made slave_cnt == 0 but at the same time we may have a transmitter after the check for an empty list which will fetch it and use it in the slave id calculation. Fix it by moving the slave_cnt after synchronize_rcu so if this was our last slave any new transmitters will see an empty slave list which is checked after rcu lock but before calling the mode transmit functions which rely on bond->slave_cnt. Fixes: 278b208375 ("bonding: initial RCU conversion") CC: Veaceslav Falico <vfalico@redhat.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0929915 |
|
25-Feb-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor Veaceslav has reported and fix this problem by commit f2ebd477f141bc0 (bonding: restructure locking of bond_ab_arp_probe()). According Jay's opinion, the current solution is not very well, because the notification is to indicate that the interface has actually changed state in a meaningful way, but these calls in the ab ARP monitor are internal settings of the flags to allow the ARP monitor to search for a slave to become active when there are no active slaves. The flag setting to active or backup is to permit the ARP monitor's response logic to do the right thing when deciding if the test slave (current_arp_slave) is up or not. So the best way to fix the problem is that we should not send a notification when the slave is in testing state, and check the state at the end of the monitor, if the slave's state recover, avoid to send pointless notification twice. And RTNL is really a big lock, hold it regardless the slave's state changed or not when the current_active_slave is null will loss performance (every 100ms), so we should hold it only when the slave's state changed and need to notify. I revert the old commit and add new modifications. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e5b0665 |
|
25-Feb-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for 802.3ad mode The problem was introduced by the commit 1d3ee88ae0d (bonding: add netlink attributes to slave link dev). The bond_set_active_slave() and bond_set_backup_slave() will use rtmsg_ifinfo to send slave's states, so these two functions should be called in RTNL. In 802.3ad mode, acquiring RTNL for the __enable_port and __disable_port cases is difficult, as those calls generally already hold the state machine lock, and cannot unconditionally call rtnl_lock because either they already hold RTNL (for calls via bond_3ad_unbind_slave) or due to the potential for deadlock with bond_3ad_adapter_speed_changed, bond_3ad_adapter_duplex_changed, bond_3ad_link_change, or bond_3ad_update_lacp_rate. All four of those are called with RTNL held, and acquire the state machine lock second. The calling contexts for __enable_port and __disable_port already hold the state machine lock, and may or may not need RTNL. According to the Jay's opinion, I don't think it is a problem that the slave don't send notify message synchronously when the status changed, normally the state machine is running every 100 ms, send the notify message at the end of the state machine if the slave's state changed should be better. I fix the problem through these steps: 1). add a new function bond_set_slave_state() which could change the slave's state and call rtmsg_ifinfo() according to the input parameters called notify. 2). Add a new slave parameter which called should_notify, if the slave's state changed and don't notify yet, the parameter will be set to 1, and then if the slave's state changed again, the param will be set to 0, it indicate that the slave's state has been restored, no need to notify any one. 3). the __enable_port and __disable_port should not call rtmsg_ifinfo in the state machine lock, any change in the state of slave could set a flag in the slave, it will indicated that an rtmsg_ifinfo should be called at the end of the state machine. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a4ddcd9 |
|
21-Feb-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove no longer needed lock for bond_xxx_info_query() The bond_xxx_info_query() was already in RTNL, so no need to use bond lock to protect the bond slave list, so remove it. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82741808 |
|
21-Feb-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: netpoll: remove unwanted slave_dev_support_netpoll() The __netpoll_setup() will check the slave's flag and ndo_poll_controller just like the slave_dev_support_netpoll() does, and slave_dev_support_netpoll() was not used by any place, so remove it. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
010d3c39 |
|
19-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: fix bond_arp_rcv() race of curr_active_slave bond->curr_active_slave can be changed between its deferences, even to NULL, and thus we might panic. We're always holding the rcu (rx_handler->bond_handle_frame()->bond_arp_rcv()) so fix this by rcu_dereferencing() it and using the saved. Reported-by: Ding Tianhong <dingtianhong@huawei.com> Fixes: aeea64a ("bonding: don't trust arp requests unless active slave really works") CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2a7c183b |
|
18-Feb-2014 |
Joe Perches <joe@perches.com> |
bonding: More use of ether_addr_copy It's smaller and faster for some architectures. Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49f17de7 |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: rename last_arp_rx to last_rx To reflect the new meaning. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e603460 |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: trivial: rename slave->jiffies to ->last_link_up slave->jiffies is updated every time the slave becomes active, which, for bonding, means that its link is 'up'. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f8ff080d |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove useless updating of slave->dev->last_rx Now that all the logic is handled via last_arp_rx, we don't need to use last_rx. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ff71529d |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: use last_arp_rx in bond_loadbalance_arp_mon() Now that last_arp_rx correctly show the last time we've received an ARP, we can use it safely instead of slave->dev->last_rx. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2cb691a |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: use the new options to correctly set last_arp_rx Now that the options are in place - arp_validate can be set to receive all the traffic or only arp packets to verify if the slave is up, when the slave isn't validated. CC: Rob Landley <rob@landley.net> CC: "David S. Miller" <davem@davemloft.net> CC: Nikolay Aleksandrov <nikolay@redhat.com> CC: Ding Tianhong <dingtianhong@huawei.com> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3fe68df9 |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: always set recv_probe to bond_arp_rcv in arp monitor Currently we only set bond_arp_rcv() if we're using arp_validate, however this makes us skip updating last_arp_rx if we're not validating incoming ARPs - thus, if arp_validate is off, last_arp_rx will never be updated. Fix this by always setting up recv_probe = bond_arp_rcv, even if we're not using arp_validate. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6db4a545 |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: always update last_arp_rx on packet recieve Currently we're updating the last_arp_rx only when we've validate the packet, however afterwards we use it as 'ANY last packet received', but not only validated ARPs. Fix this by updating it in case of any packet received. It won't break the arp_validation=0 because we, anyway, return the correct slave->dev->last_rx in slave_last_rx(). CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13ac34a8 |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: permit using arp_validate with non-ab modes Currently it's disabled because it's sometimes hard, in typical configs, to make it work - because of the nature how the loadbalance modes work - as it's hard to deliver valid arp replies to correct slaves by the switch. However we still can use arp_validation in loadbalance with several other configs, per example with arp_validate == 2 for backup with one broadcast domain, without the switch(es) doing any balancing - this way we'd be (a bit more) sure that the slave is up. So, enable it to let users decide which one works/suits them best. Also correct the mode limitation from BOND_OPT_ARP_VALIDATE. CC: Nikolay Aleksandrov <nikolay@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b7d636b |
|
17-Feb-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove bond->lock from bond_arp_rcv We're always called with rcu_read_lock() held (bond_arp_rcv() is only called from bond_handle_frame(), which is rx_handler and always called under rcu from __netif_receive_skb_core() ). The slave active/passive and/or bonding params can change in-flight, however we don't really care about that - we only modify the last time packet was received, which is harmless. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99932d4f |
|
16-Feb-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
netdevice: add queue selection fallback handler for ndo_select_queue Add a new argument for ndo_select_queue() callback that passes a fallback handler. This gets invoked through netdev_pick_tx(); fallback handler is currently __netdev_pick_tx() as most drivers invoke this function within their customized implementation in case for skbs that don't need any special handling. This fallback handler can then be replaced on other call-sites with different queue selection methods (e.g. in packet sockets, pktgen etc). This also has the nice side-effect that __netdev_pick_tx() is then only invoked from netdev_pick_tx() and export of that function to modules can be undone. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ada0f863 |
|
15-Feb-2014 |
Joe Perches <joe@perches.com> |
bonding: Convert memcpy(foo, bar, ETH_ALEN) to ether_addr_copy(foo, bar) ether_addr_copy is smaller and faster for some architectures. This relies on a stack frame being at least __aligned(2) for one use of an Ethernet address on the stack. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
90194264 |
|
15-Feb-2014 |
Joe Perches <joe@perches.com> |
bonding: Neaten pr_<level> Add missing terminating newlines. Convert uses of pr_info to pr_cont in bond_check_params. Standardize upper/lower case styles. Typo fixes, remove unnecessary parentheses and periods. Alignment neatening. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91565ebb |
|
15-Feb-2014 |
Joe Perches <joe@perches.com> |
bonding: Convert pr_warning to pr_warn, neatening Use more current logging style. Coalesce formats, realign arguments, drop unnecessary periods. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
805d157e |
|
11-Feb-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove the redundant judgements for bond_set_mac_address() The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address, so no need to check it in bond_set_mac_address(). Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f80889a5 |
|
11-Feb-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: Fix deadlock in bonding driver when using netpoll The bonding driver take write locks and spin locks that are shared by the tx path in enslave processing and notification processing, If the netconsole is in use, the bonding can call printk which puts us in the netpoll tx path, if the netconsole is attached to the bonding driver, result in deadlock. So add protection for these place, by checking the netpoll_block_tx state, we can defer the sending of the netconsole frames until a later time using the retransmit feature of netpoll_send_skb that is triggered on the return code NETDEV_TX_BUSY. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b8790b5 |
|
10-Feb-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove unwanted bond lock for enslave processing The bond enslave processing don't hold bond->lock anymore, so release an unlocked rw lock will cause warning message, remove the unwanted read_unlock(&bond->lock). Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc689aaa |
|
24-Jan-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: fail_over_mac should only affect AB mode in bond_set_mac_address() The fail_over_mac could be set to active or follow in any time for all modes, so if the fail_over_mac is not none and the current mode is not active-backup, the bond_set_mac_address() could not change the master and slave's MAC address. In bond_set_mac_address(), the fail_over_mac should only affect AB mode, so modify to check the mode in addition to fail_over_mac when setting bond's MAC address. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00503b6f |
|
24-Jan-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: fail_over_mac should only affect AB mode at enslave and removal processing According to bonding.txt, the fail_over_ma should only affect active-backup mode, but I found that the fail_over_mac could be set to active or follow in all modes, this will cause new slave could not be set to bond's MAC address at enslave processing and restore its own MAC address at removal processing. The correct way to fix the problem is that we should not add restrictions when setting options, just need to modify the bond enslave and removal processing to check the mode in addition to fail_over_mac when setting a slave's MAC during enslavement. The change active slave processing already only calls the fail_over_mac function when in active-backup mode. Thanks for Jay's suggestion. The patch also modify the pr_warning() to pr_warn(). Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fde8f03 |
|
27-Jan-2014 |
Ding Tianhong <dingtianhong@huawei.com> |
bonding: fix locking in bond_loadbalance_arp_mon() The commit 1d3ee88ae0d605629bf369 (bonding: add netlink attributes to slave link dev) has add rtmsg_ifinfo() in bond_set_active_slave() and bond_set_backup_slave(), so the two function need to called in RTNL lock, but bond_loadbalance_arp_mon() only calling these functions in RCU, warning message will occurs. fix this by add a new function bond_slave_state_change(), which will reset the slave's state after slave link check, so remove the bond_set_xxx_slave() from the cycle and only record the slave_state_changed, this will call the new function to set all slaves to new state in RTNL later. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2ebd477 |
|
27-Jan-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: restructure locking of bond_ab_arp_probe() Currently we're calling it from under RCU context, however we're using some functions that require rtnl to be held. Fix this by restructuring the locking - don't call it under any locks, aquire rcu_read_lock() if we're sending _only_ (i.e. we have the active slave present), and use rtnl locking otherwise - if we need to modify (in)active flags of a slave. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
98b90f26 |
|
27-Jan-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: RCUify bond_ab_arp_probe Currently bond_ab_arp_probe() is always called under rcu_read_lock(), however to work with curr_active_slave we're still holding the curr_slave_lock. To remove that curr_slave_lock - rcu_dereference the bond's curr_active_slave and use it further - so that we're sure the slave won't go away, and we don't care if it will change in the meanwhile. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9399814 |
|
22-Jan-2014 |
Weilong Chen <chenweilong@huawei.com> |
bonding: Don't allow bond devices to change network namespaces. Like bridge, bonding as netdevice doesn't cross netns boundaries. Bonding ports and bonding itself live in same netns. Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3bad540e |
|
22-Jan-2014 |
Jiri Pirko <jiri@resnulli.us> |
bonding: convert netlink to use slave data info api Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1fbd3ed |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert active_slave to use the new option API This patch adds the necessary changes so active_slave would use the new bonding option API. Also some trivial/style fixes. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
388d3a6d |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert primary_reselect to use the new option API This patch adds the necessary changes so primary_reselect would use the new bonding option API. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b98d9c66 |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert miimon to use the new option API This patch adds the necessary changes so miimon would use the new bonding option API. The "default" definition has been removed as it was 0. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e5f5eeb |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert ad_select to use the new option API This patch adds the necessary changes so ad_select would use the new bonding option API. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3131de7 |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert lacp_rate to use the new option API This patch adds the necessary changes so lacp_rate would use the new bonding option API. Also some trivial/style error fixes. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7bdb04ed |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert arp_interval to use the new option API This patch adds the necessary changes so arp_interval would use the new bonding option API. The "default" definition has been removed as it was 0. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1df6b6aa |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert fail_over_mac to use the new option API This patch adds the necessary changes so fail_over_mac would use the new bonding option API. Also fixes a trivial copy/paste error in bond_check_params where the wrong variable was used for the error msg. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
edf36b24 |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert arp_all_targets to use the new option API This patch adds the necessary changes so arp_all_targets would use the new bonding option API. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16228881 |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert arp_validate to use the new option API This patch adds the necessary changes so arp_validate would use the new bonding option API. Also fix some trivial/style errors. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a4b32ce7 |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert xmit_hash_policy to use the new option API This patch adds the necessary changes so xmit_hash_policy would use the new bonding option API. Also fix some trivial/style errors. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aa59d851 |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert packets_per_slave to use the new option API This patch adds the necessary changes so packets_per_slave would use the new bonding option API. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b3798d5 |
|
22-Jan-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: convert mode setting to use the new option API This patch makes the bond's mode setting use the new option API and adds support for dependency printing which relies on having an entry for the mode option in the bond_opts[] array. Also add the ability to print the mode name when mode dependency fails and fix some trivial/style errors. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
809fa972 |
|
21-Jan-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
reciprocal_divide: update/correction of the algorithm Jakub Zawadzki noticed that some divisions by reciprocal_divide() were not correct [1][2], which he could also show with BPF code after divisions are transformed into reciprocal_value() for runtime invariance which can be passed to reciprocal_divide() later on; reverse in BPF dump ended up with a different, off-by-one K in some situations. This has been fixed by Eric Dumazet in commit aee636c4809fa5 ("bpf: do not use reciprocal divide"). This follow-up patch improves reciprocal_value() and reciprocal_divide() to work in all cases by using Granlund and Montgomery method, so that also future use is safe and without any non-obvious side-effects. Known problems with the old implementation were that division by 1 always returned 0 and some off-by-ones when the dividend and divisor where very large. This seemed to not be problematic with its current users, as far as we can tell. Eric Dumazet checked for the slab usage, we cannot surely say so in the case of flex_array. Still, in order to fix that, we propose an extension from the original implementation from commit 6a2d7a955d8d resp. [3][4], by using the algorithm proposed in "Division by Invariant Integers Using Multiplication" [5], Torbjörn Granlund and Peter L. Montgomery, that is, pseudocode for q = n/d where q, n, d is in u32 universe: 1) Initialization: int l = ceil(log_2 d) uword m' = floor((1<<32)*((1<<l)-d)/d)+1 int sh_1 = min(l,1) int sh_2 = max(l-1,0) 2) For q = n/d, all uword: uword t = (n*m')>>32 q = (t+((n-t)>>sh_1))>>sh_2 The assembler implementation from Agner Fog [6] also helped a lot while implementing. We have tested the implementation on x86_64, ppc64, i686, s390x; on x86_64/haswell we're still half the latency compared to normal divide. Joint work with Daniel Borkmann. [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c [3] https://gmplib.org/~tege/division-paper.pdf [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556 [6] http://www.agner.org/optimize/asmlib.zip Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: Jesse Gross <jesse@nicira.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d3ee88a |
|
16-Jan-2014 |
sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com> |
bonding: add netlink attributes to slave link dev If link is IFF_SLAVE, extend link dev netlink attributes to include slave attributes with new IFLA_SLAVE nest. Add netlink notification (RTM_NEWLINK) when slave status changes from backup to active, or visa-versa. Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE attributes. Currently only used by bonding driver, but could be used by other aggregating devices with slaves. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07699f9a |
|
16-Jan-2014 |
sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com> |
bonding: add sysfs /slave dir for bond slave devices. Add sub-directory under /sys/class/net/<interface>/slave with read-only attributes for slave. Directory only appears when <interface> is a slave. $ tree /sys/class/net/eth2/slave/ /sys/class/net/eth2/slave/ ├── ad_aggregator_id ├── link_failure_count ├── mii_status ├── perm_hwaddr ├── queue_id └── state $ cat /sys/class/net/eth2/slave/* 2 0 up 40:02:10:ef:06:01 0 active Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ec775b9 |
|
15-Jan-2014 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: handle slave's name change with primary_slave logic Currently, if a slave's name change, we just pass it by. However, if the slave is a current primary_slave, then we end up with using a slave, whose name != params.primary, for primary_slave. And vice-versa, if we don't have a primary_slave but have params.primary set - we will not detected a new primary_slave. Fix this by catching the NETDEV_CHANGENAME event and setting primary_slave accordingly. Also, if the primary_slave was changed, issue a reselection of the active slave, cause the priorities have changed. Reported-by: Ding Tianhong <dingtianhong@huawei.com> CC: Ding Tianhong <dingtianhong@huawei.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0917b933 |
|
14-Jan-2014 |
Ying Xue <ying.xue@windriver.com> |
bonding: use __dev_get_by_name instead of dev_get_by_name to find interface The following call chain indicates that bond_do_ioctl() is protected under rtnl_lock. If we use __dev_get_by_name() instead of dev_get_by_name() to find interface handler in it, this would help us avoid to change reference counter of interface once. dev_ioctl() rtnl_lock() dev_ifsioc() bond_do_ioctl() rtnl_unlock() Additionally we also change the coding style in bond_do_ioctl(), letting it more readable for us. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f663dd9a |
|
10-Jan-2014 |
Jason Wang <jasowang@redhat.com> |
net: core: explicitly select a txq before doing l2 forwarding Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec029fac |
|
03-Jan-2014 |
sfeldma@cumulusnetworks.com <sfeldma@cumulusnetworks.com> |
bonding: add ad_select attribute netlink support Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter ad_select via netlink. Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6da67d26 |
|
30-Dec-2013 |
stephen hemminger <stephen@networkplumber.org> |
bonding: make more functions static More functions in bonding that can be declared static because they are only used in one file. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
844223ab |
|
01-Jan-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: use ether_addr_equal_64bits to instead of ether_addr_equal The net_device.dev_addr have more than 2 bytes of additional data after the mac addr, so it is safe to use the ether_addr_equal_64bits(). Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d316dedd |
|
01-Jan-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove unwanted return value for bond_dev_queue_xmit() The return value for bond_dev_queue_xmit() will not be used anymore, so remove the return value. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3900f290 |
|
01-Jan-2014 |
dingtianhong <dingtianhong@huawei.com> |
bonding: slight optimizztion for bond_slave_override() When the skb is xmit by the function bond_slave_override(), it will have duplicate judgement for slave state, and I think it will consumes a little performance, maybe it is negligible, so I simplify the function and remove the unwanted judgement. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
834db4bc |
|
20-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: ust micro BOND_NO_USE_ARP to simplify the mode check The bond 3ad and TLB/ALB has the same check path, so combine them. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3a7129e5 |
|
20-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: add option lp_interval for loading module The bond driver could set the lp_interval when loading module. Suggested-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da131ddb |
|
29-Dec-2013 |
stephen hemminger <stephen@networkplumber.org> |
bonding: make local function static bond_xmit_slave_id is only used in main. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2369109 |
|
12-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: rebuild the bond_resend_igmp_join_requests_delayed() The bond_resend_igmp_join_requests_delayed() and bond_resend_igmp_join_requests() should be integrated, because the bond_resend_igmp_join_requests_delayed() did nothing except bond_resend_igmp_join_requests(). The bond igmp_retrans could only be changed in bond_change_active_slave and here, bond_change_active_slave will be called in RTNL and curr_slave_lock, the bond_resend_igmp_join_requests already hold RTNL, so no need to free RTNL and hold curr_slave_lock again, it may be a small optimization, so move the igmp_retrans in RTNL and remove the curr_slave_lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be79bd04 |
|
12-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: add RCU for bond_3ad_state_machine_handler() The bond_3ad_state_machine_handler() use the bond lock to protect the bond slave list and slave port together, but it is not enough, the bond slave list was link and unlink in RTNL, not bond lock, so I add RCU to protect the slave list from leaving. The bond lock is still used here, because when the slave has been removed from the list by the time the state machine runs, it appears to be possible for both function to manupulate the same aggregator->lag_ports by finding the aggregator via two different ports that are both members of that aggregator (i.e., port A of the agg is being unbound, and port B of the agg is runing its state machine). If I remove the bond lock, there are nothing to mutex changes to aggregator->lag_ports between bond_3ad_state_machine_handler and bond_3ad_unbind_slave, So the bond lock is the simplest way to protect aggregator->lag_ports. There was a lot of function need RCU protect, I have two choice to make the function in RCU-safe, (1) create new similar functions and make the bond slave list in RCU. (2) modify the existed functions and make them in read-side critical section, because the RCU read-side critical sections may be nested. I choose (2) because it is no need to create more similar functions. The nots in the function is still too old, clean up the nots. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8517035 |
|
12-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove unwanted lock for bond enslave and release The bond_change_active_slave() and bond_select_active_slave() do't need bond lock anymore, so remove the unwanted bond lock for these two functions. The bond_select_active_slave() will release and acquire curr_slave_lock, so the curr_slave_lock need to protect the function. In bond enslave and bond release, the bond slave list is also protected by RTNL, so bond lock is no need to exist, remove the lock and clean the functions. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb9fa4b0 |
|
12-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: rebuild the lock use for bond_activebackup_arp_mon() The bond_activebackup_arp_mon() use the bond lock for read to protect the slave list, it is no effect, and the RTNL is only called for bond_ab_arp_commit() and peer notify, for the performance better, use RCU to replace with the bond lock, to the bond slave list need to called in RCU, add a new bond_first_slave_rcu() to get the first slave in RCU protection. In bond_ab_arp_probe(), the bond->current_arp_slave may changd if bond release slave, just like: bond_ab_arp_probe() bond_release() cpu 0 cpu 1 ... if (bond->current_arp_slave...) ... ... bond->current_arp_slave = NULl bond->current_arp_slave->dev->name ... So the current_arp_slave need to dereference in the section. Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e52f4fe |
|
12-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: rebuild the lock use for bond_loadbalance_arp_mon() The bond_loadbalance_arp_mon() use the bond lock to protect the bond slave list, it is no effect, so I could use RTNL or RCU to replace it, considering the performance impact, the RCU is more better here, so the bond lock replace with the RCU. The bond_select_active_slave() need RTNL and curr_slave_lock together, but there is no RTNL lock here, so add a rtnl_rtylock. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4cb4f97b |
|
12-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: rebuild the lock use for bond_mii_monitor() The bond_mii_monitor() still use bond lock to protect bond slave list, it is no effect, I have 2 way to fix the problem, move the RTNL to the top of the function, or add RCU to protect the bond slave list, according to the Jay Vosburgh's opinion, 10 times one second is a truely big performance loss if use RTNL to protect the whole monitor, so I would take the advice and use RCU to protect the bond slave list. The bond_has_slave() will not protect by anything, there will no things happen if the slave list is be changed, unless the bond was free, but it will not happened before the monitor, the bond will closed before be freed. The peers notify for the bond will calling curr_active_slave, so derefence the slave to make sure we will accessing the same slave if the curr_active_slave changed, as the rcu dereference need in read-side critical sector and bond_change_active_slave() will call it with no RCU hold, so add peer notify in rcu_read_lock which will be nested in monitor. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2e7aceb |
|
12-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove the no effect lock for bond_select_active_slave() The bond slave list was no longer protected by bond lock and only protected by RTNL or RCU, so anywhere that use bond lock to protect slave list is meaningless. remove the release and acquire bond lock for bond_select_active_slave(). The curr_active_slave could only be changed in 3 place: 1. enslave slave. 2. release slave. 3. change_active_slave. all above place were holding bond lock, RTNL and curr_slave_lock together, it is tedious and meaningless, obviously bond lock is no need here, but RTNL or curr_slave_lock is needed, so if you want to access active slave, you have to choose one lock, RTNL or curr_slave_lock, if RTNL is exist, no need to add curr_slave_lock, otherwise curr_slave_lock is better, because of the performance. there are several place calling bond_select_active_slave() and bond_change_active_slave(), the next step I will clean these place and remove the no effect lock. there are some document changed together when update the function. Suggested-by: Jay Vosburgh <fubar@us.ibm.com> Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
36708b89 |
|
09-Dec-2013 |
Paul E. McKenney <paulmck@kernel.org> |
bonding: Use RCU_INIT_POINTER() for better overhead and for sparse Although rcu_assign_pointer() can be used to assign a constant NULL pointer, doing so gets you an unnecessary memory barrier and in some circumstances, sparse warnings. This commit therefore changes the rcu_assign_pointer() of NULL in __bond_release_one() to RCU_INIT_POINTER(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
#
89015c18 |
|
04-Dec-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: add arp_ip_target checks when install the module When I install the bonding with the wrong arp_ip_target, just like arp_ip_target=500.500.500.500, the arp_ip_target was transfored to 245.245.245.244 and stored in the ip target success, it is uncorrect, so I add checks to avoid adding wrong address. The in4_pton() will set wrong ip address to 0.0.0.0 and return 0, also use the micro IS_IP_TARGET_UNUSABLE_ADDRESS to simplify the code. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fe9d04af |
|
22-Nov-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: disable arp and enable mii monitoring when bond change to no uses arp mode Because the ARP monitoring is not support for 802.3ad, but I still could change the mode to 802.3ad from ab mode while ARP monitoring is running, it is incorrect. So add a check for 802.3ad in bonding_store_mode to fix the problem, and make a new macro BOND_NO_USES_ARP() to simplify the code. v2: according to the Dan Williams's suggestion, bond mode is the most important bond option, it should override any of the other sub-options. So when the mode is changed, the conficting values should be cleared or reset, otherwise the user has to duplicate more operations to modify the logic. I disable the arp and enable mii monitoring when the bond mode is changed to AB, TB and 8023AD if the arp interval is true. v3: according to the Nik's suggestion, the default value of miimon should need a name, there is several place to use it, and the bond_store_arp_interval() could use micro BOND_NO_USES_ARP to make the code more simpify. Suggested-by: Dan Williams <dcbw@redhat.com> Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
73958329 |
|
05-Nov-2013 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: extend round-robin mode with packets_per_slave This patch aims to extend round-robin mode with a new option called packets_per_slave which can have the following values and effects: 0 - choose a random slave 1 (default) - standard round-robin, 1 packet per slave >1 - round-robin when >1 packets have been transmitted per slave The allowed values are between 0 and 65535. This patch also fixes the comment style in bond_xmit_roundrobin(). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f2cd845 |
|
27-Oct-2013 |
David S. Miller <davem@davemloft.net> |
Revert "Merge branch 'bonding_monitor_locking'" This reverts commit 4d961a101e032b4bf223b279b4b35bc77576f5a8, reversing changes made to a00f6fcc7d0c62a91768d9c4ccba4c7d64fbbce3. Revert bond locking changes, they cause regressions and Veaceslav Falico doesn't like how the commit messages were done at all. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
80b9d236 |
|
23-Oct-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove bond read lock for bond_activebackup_arp_mon() The bond slave list may change when the monitor is running, the slave list is no longer protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it: 1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe to call call_netdevice_notifiers() in write lock. 2.remove unused bond->lock for monitor function, only use the existing rtnl lock(). 3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored. so I remove the bond->lock and move the rtnl lock to protect the whole monitor function. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f1bb571 |
|
23-Oct-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove bond read lock for bond_loadbalance_arp_mon() The bond slave list may change when the monitor is running, the slave list is no longer protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it: 1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe to call call_netdevice_notifiers() in write lock. 2.remove unused bond->lock for monitor function, only use the existing rtnl lock(). 3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored. so I remove the bond->lock and add the rtnl lock to protect the whole monitor function. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b6c5261 |
|
23-Oct-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: remove bond read lock for bond_mii_monitor() The bond slave list may change when the monitor is running, the slave list is no longer protected by bond->lock, only protected by rtnl lock(), so we have 3 ways to modify it: 1.add bond_master_upper_dev_link() and bond_upper_dev_unlink() in bond->lock, but it is unsafe to call call_netdevice_notifiers() in write lock. 2.remove unused bond->lock for monitor function, only use the existing rtnl lock(). 3.use rcu_read_lock() to protect it, of course, it will transform bond_for_each_slave to bond_for_each_slave_rcu() and performance is better, but in slow path, it is ignored. so I remove the bond->lock and move the rtnl lock to protect the whole monitor function. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f294054 |
|
23-Oct-2013 |
Alexei Starovoitov <ast@kernel.org> |
net: fix rtnl notification in atomic context commit 991fb3f74c "dev: always advertise rx_flags changes via netlink" introduced rtnl notification from __dev_set_promiscuity(), which can be called in atomic context. Steps to reproduce: ip tuntap add dev tap1 mode tap ifconfig tap1 up tcpdump -nei tap1 & ip tuntap del dev tap1 mode tap [ 271.627994] device tap1 left promiscuous mode [ 271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940 [ 271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip [ 271.677525] INFO: lockdep is turned off. [ 271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G W 3.12.0-rc3+ #73 [ 271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 271.731254] ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428 [ 271.760261] ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8 [ 271.790683] 0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8 [ 271.822332] Call Trace: [ 271.838234] [<ffffffff817544e5>] dump_stack+0x55/0x76 [ 271.854446] [<ffffffff8108bad1>] __might_sleep+0x181/0x240 [ 271.870836] [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0 [ 271.887076] [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0 [ 271.903368] [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0 [ 271.919716] [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0 [ 271.936088] [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0 [ 271.952504] [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0 [ 271.968902] [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100 [ 271.985302] [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0 [ 272.001642] [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0 [ 272.017917] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380 [ 272.033961] [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50 [ 272.049855] [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0 [ 272.065494] [<ffffffff81732052>] packet_notifier+0x1b2/0x380 [ 272.080915] [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380 [ 272.096009] [<ffffffff81761c66>] notifier_call_chain+0x66/0x150 [ 272.110803] [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10 [ 272.125468] [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20 [ 272.139984] [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70 [ 272.154523] [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20 [ 272.168552] [<ffffffff816224c5>] rollback_registered_many+0x145/0x240 [ 272.182263] [<ffffffff81622641>] rollback_registered+0x31/0x40 [ 272.195369] [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90 [ 272.208230] [<ffffffff81547ca0>] __tun_detach+0x140/0x340 [ 272.220686] [<ffffffff81547ed6>] tun_chr_close+0x36/0x60 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5378c2e6 |
|
21-Oct-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: move bond-specific init after enslave happens As Jiri noted, currently we first do all bonding-specific initialization (specifically - bond_select_active_slave(bond)) before we actually attach the slave (so that it becomes visible through bond_for_each_slave() and friends). This might result in bond_select_active_slave() not seeing the first/new slave and, thus, not actually selecting an active slave. Fix this by moving all the bond-related init part after we've actually completely initialized and linked (via bond_master_upper_dev_link()) the new slave. Also, remove the bond_(de/a)ttach_slave(), it's useless to have functions to ++/-- one int. After this we have all the initialization of the new slave *before* linking, and all the stuff that needs to be done on bonding *after* it. It has also a bonus effect - we can remove the locking on the new slave init completely, and only use it for bond_select_active_slave(). Reported-by: Jiri Pirko <jiri@resnulli.us> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Ding Tianhong@huawei.com Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
080a06e1 |
|
18-Oct-2013 |
Jiri Pirko <jiri@resnulli.us> |
bonding: remove bond_ioctl_change_active() no longer needed since bond_option_active_slave_set() can be used instead. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a2a78c4 |
|
18-Oct-2013 |
Jiri Pirko <jiri@resnulli.us> |
bonding: push Netlink bits into separate file Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32819dc1 |
|
02-Oct-2013 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: modify the old and add new xmit hash policies This patch adds two new hash policy modes which use skb_flow_dissect: 3 - Encapsulated layer 2+3 4 - Encapsulated layer 3+4 There should be a good improvement for tunnel users in those modes. It also changes the old hash functions to: hash ^= (__force u32)flow.dst ^ (__force u32)flow.src; hash ^= (hash >> 16); hash ^= (hash >> 8); Where hash will be initialized either to L2 hash, that is SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted from the upper layer. Flow's dst and src are also extracted based on the xmit policy either directly from the buffer or by using skb_flow_dissect, but in both cases if the protocol is IPv6 then dst and src are obtained by ipv6_addr_hash() on the real addresses. In case of a non-dissectable packet, the algorithms fall back to L2 hashing. The bond_set_mode_ops() function is now obsolete and thus deleted because it was used only to set the proper hash policy. Also we trim a pointer from struct bonding because we no longer need to keep the hash function, now there's only a single hash function - bond_xmit_hash that works based on bond->params.xmit_policy. The hash function and skb_flow_dissect were suggested by Eric Dumazet. The layer names were suggested by Andy Gospodarek, because I suck at semantics. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3241870 |
|
28-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: RCUify bond_set_rx_mode() Currently we rely on rtnl locking in bond_set_rx_mode(), however it's not always the case: RTNL: assertion failed at drivers/net/bonding/bond_main.c (3391) ... [<ffffffff81651ca5>] dump_stack+0x54/0x74 [<ffffffffa029e717>] bond_set_rx_mode+0xc7/0xd0 [bonding] [<ffffffff81553af7>] __dev_set_rx_mode+0x57/0xa0 [<ffffffff81557ff8>] __dev_mc_add+0x58/0x70 [<ffffffff81558020>] dev_mc_add+0x10/0x20 [<ffffffff8161e26e>] igmp6_group_added+0x18e/0x1d0 [<ffffffff81186f76>] ? kmem_cache_alloc_trace+0x236/0x260 [<ffffffff8161f80f>] ipv6_dev_mc_inc+0x29f/0x320 [<ffffffff8161f9e7>] ipv6_sock_mc_join+0x157/0x260 ... Fix this by using RCU primitives. Reported-by: Joe Lawrence <joe.lawrence@stratus.com> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0068de |
|
26-Sep-2013 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: Fix broken promiscuity reference counting issue Recently grabbed this report: https://bugzilla.redhat.com/show_bug.cgi?id=1005567 Of an issue in which the bonding driver, with an attached vlan encountered the following errors when bond0 was taken down and back up: dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken. The error occurs because, during __bond_release_one, if we release our last slave, we take on a random mac address and issue a NETDEV_CHANGEADDR notification. With an attached vlan, the vlan may see that the vlan and bond mac address were in sync, but no longer are. This triggers a call to dev_uc_add and dev_set_rx_mode, which enables IFF_PROMISC on the bond device. Then, when we complete __bond_release_one, we use the current state of the bond flags to determine if we should decrement the promiscuity of the releasing slave. But since the bond changed promiscuity state during the release operation, we incorrectly decrement the slave promisc count when it wasn't in promiscuous mode to begin with, causing the above error Fix is pretty simple, just cache the bonding flags at the start of the function and use those when determining the need to set promiscuity. This is also needed for the ALLMULTI flag CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Mark Wu <wudxw@linux.vnet.ibm.com> CC: "David S. Miller" <davem@davemloft.net> Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23c147e0 |
|
27-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: correctly verify for the first slave in bond_enslave After commit 1f718f0f4f97145f4072d2d72dcf85069ca7226d ("bonding: populate neighbour's private on enslave"), we've moved the actual 'linking' in the end of the function - so that, once linked, the slave is ready to be used, and is not still in the process of enslaving. However, 802.3ad verified if it's the first slave by looking at the if (bond_first_slave(bond) == new_slave) which, because we've moved the linking to the end, became broken - on the first slave bond_first_slave(bond) returns NULL. Fix this by verifying if the prev_slave, that equals bond_last_slave(), is actually populated - if it is - then it's not the first slave, and vice versa. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5831d66e |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
net: create sysfs symlinks for neighbour devices Also, remove the same functionality from bonding - it will be already done for any device that links to its lower/upper neighbour. The links will be created for dev's kobject, and will look like lower_eth0 for lower device eth0 and upper_bridge0 for upper device bridge0. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4fee991a |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove slave lists And all the initialization. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8c23903 |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove bond_prev_slave() We don't really need it, and it's really hard to RCUify the list->prev. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0965a1f3 |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: add bond_has_slaves() and use it Currently we verify if we have slaves by checking if bond->slave_list is empty. Create a define bond_has_slaves() and use it, a bit more readable and easier to change in the future. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4087df87 |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: rework bond_ab_arp_probe() to use bond_for_each_slave() Currently it uses the hard-to-rcuify bond_for_each_slave_from(), and also it doesn't check every slave for disrepencies between the actual IS_UP(slave) and the slave->link == BOND_LINK_UP, but only till we find the next suitable slave. Fix this by using bond_for_each_slave() and storing the first good slave in *before till we find the current_arp_slave, after that we store the first good slave in new_slave. If new_slave is empty - use the slave stored in before, and if it's also empty - then we didn't find any suitable slave. Also, in the meanwhile, check for each slave status. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77140d29 |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: rework bond_find_best_slave() to use bond_for_each_slave() bond_find_best_slave() does not have to be balanced - i.e. return the slave that is *after* some other slave, but rather return the best slave that suits, except of bond->primary_slave - in which case we just return it if it's suitable. After that we just look through all the slaves and return either first up slave or the slave whose link came back earliest. We also don't care about curr_active_slave lock cause we use it in bond_should_change_active() only and there we take it right away - i.e. it won't go away. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
544a028e |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: use bond_for_each_slave() in bond_uninit() We're safe agains removal there, cause we use neighbours primitives. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9caff1e7 |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: make bond_for_each_slave() use lower neighbour's private It needs a list_head *iter, so add it wherever needed. Use both non-rcu and rcu variants. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81f23b13 |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove bond_for_each_slave_continue_reverse() We only use it in rollback scenarios and can easily use the standart bond_for_each_dev() instead. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f718f0f |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: populate neighbour's private on enslave Use the new provided function when attaching the lower slave to populate its ->private with struct slave *new_slave. Also, move it to the end to be able to 'find' it only after it was completely initialized, and deinitialize in the first place on release. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f268f12 |
|
25-Sep-2013 |
Veaceslav Falico <vfalico@redhat.com> |
net: add adj_list to save only neighbours Currently, we distinguish neighbours (first-level linked devices) from non-neighbours by the neighbour bool in the netdev_adjacent. This could be quite time-consuming in case we would like to traverse *only* through neighbours - cause we'd have to traverse through all devices and check for this flag, and in a (quite common) scenario where we have lots of vlans on top of bridge, which is on top of a bond - the bonding would have to go through all those vlans to get its upper neighbour linked devices. This situation is really unpleasant, cause there are already a lot of cases when a device with slaves needs to go through them in hot path. To fix this, introduce a new upper/lower device lists structure - adj_list, which contains only the neighbours. It works always in pair with the all_adj_list structure (renamed from upper/lower_dev_list), i.e. both of them contain the same links, only that all_adj_list contains also non-neighbour device links. It's really a small change visible, currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't change the main linked logic at all. Also, add some comments a fix a name collision in netdev_for_each_upper_dev_rcu() and rework the naming by the following rules: netdev_(all_)(upper|lower)_* If "all_" is present, then we work with the whole list of upper/lower devices, otherwise - only with direct neighbours. Uninline functions - to get better stack traces. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7eacd038 |
|
13-Sep-2013 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: Make alb learning packet interval configurable running bonding in ALB mode requires that learning packets be sent periodically, so that the switch knows where to send responding traffic. However, depending on switch configuration, there may not be any need to send traffic at the default rate of 3 packets per second, which represents little more than wasted data. Allow the ALB learning packet interval to be made configurable via sysfs Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Acked-by: Veaceslav Falico <vfalico@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5bb9e0b5 |
|
06-Sep-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: fix bond_arp_rcv setting and arp validate desync state We make bond_arp_rcv global so it can be used in bond_sysfs if the bond interface is up and arp_interval is being changed to a positive value and cleared otherwise as per Jay's suggestion. This also fixes a problem where bond_arp_rcv was set even though arp_validate was disabled while the bond was up by unsetting recv_probe in bond_store_arp_validate and respectively setting it if enabled. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4826861 |
|
02-Sep-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: drop read_lock in bond_compute_features bond_compute_features is always called with RTNL held, so we can safely drop the read bond->lock. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b7b165a |
|
02-Sep-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: drop read_lock in bond_fix_features We're protected by RTNL so nothing can happen and we can safely drop the read bond->lock. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee8487c0 |
|
02-Sep-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: trivial: remove outdated comment and braces We don't have to release all slaves when closing the bond dev, so remove the outdated comment and the braces around the left single statement. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c8d23f7 |
|
02-Sep-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: simplify and fix peer notification This patch aims to remove a use of the bond->lock for mutual exclusion which will later allow easier migration to RCU of the users of this functionality. We use RTNL as a synchronizing mechanism since it's always held when send_peer_notif is set, and when it is decremented from the notifier function. We can also drop some locking, and fix the leakage of the send_peer_notif counter. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e32582f |
|
28-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: pr_debug instead of pr_warn in bond_arp_send_all They're simply annoying and will spam dmesg constantly if we hit them, so convert to pr_debug so that we still can access them in case of debugging. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e868b0c9 |
|
28-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove vlan_list/current_alb_vlan Currently there are no real users of vlan_list/current_alb_vlan, only the helpers which maintain them, so remove them. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a59d3d21 |
|
28-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: use vlan_uses_dev() in __bond_release_one() We always hold the rtnl_lock() in __bond_release_one(), so use vlan_uses_dev() instead of bond_vlan_used(). CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50223ce4 |
|
28-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: convert bond_has_this_ip() to use upper devices Currently, bond_has_this_ip() is aware only of vlan upper devices, and thus will return false if the address is associated with the upper bridge or any other device, and thus will break the arp logic. Fix this by using the upper device list. For every upper device we verify if the address associated with it is our address, and if yes - return true. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27bc11e6 |
|
28-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: make bond_arp_send_all use upper device list Currently, bond_arp_send_all() is aware only of vlans, which breaks configurations like bond <- bridge (or any other 'upper' device) with IP (which is quite a common scenario for virt setups). To fix this we convert the bond_arp_send_all() to first verify if the rt device is the bond itself, and if not - to go through its list of upper vlans and their respectiv upper devices (if the vlan's upper device matches - tag the packet), if still not found - go through all of our upper list devices to see if any of them match the route device for the target. If the match is a vlan device - we also save its vlan_id and tag it in bond_arp_send(). Also, clean the function a bit to be more readable. CC: Vlad Yasevich <vyasevic@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b8e2fde4 |
|
22-Aug-2013 |
Wei Yongjun <yongjun_wei@trendmicro.com.cn> |
bonding: fix error return code in bond_enslave() Fix to return a negative error code in the add bond vlan ids error handling case instead of 0, as done elsewhere in this function. Introduced by commit 1ff412ad7714f6952f76ffd77f0a7f2f563288a1. (bonding: change the bond's vlan syncing functions with the standard ones) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b20903f2 |
|
05-Aug-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: unwind on bond_add_vlan failure In case of bond_add_vlan() failure currently we'll have the vlan's refcnt bumped up in all slaves, but it will never go down because it failed to get added to the bond, so properly unwind the added vlan if bond_add_vlan fails. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ff412ad |
|
05-Aug-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: change the bond's vlan syncing functions with the standard ones Now we have vlan_vids_add/del_by_dev() which serve the same purpose as bond's bond_add/del_vlans_on_slave() with the good side effect of reverting the changes if one of the additions fails. There's only 1 change in the behaviour of enslave: if adding of the vlans to the slave fails, we'll fail the enslaving because otherwise we might delete some vlan that wasn't added by the bonding. The only way this may happen is with ENOMEM currently, so we're in trouble anyway. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b380877 |
|
02-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: modify only neigh_parms owned by us Otherwise, on neighbour creation, bond_neigh_init() will be called with a foreign netdev. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7864a1ad |
|
05-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove locking from bond_set_rx_mode() We're already protected by RTNL lock, so nothing can happen to bond/its slaves, and thus the locking is useless here (both bond->lock and bond->curr_active_slave). Also, add ASSERT_RTNL() both to bond_set_rx_mode() and bond_hw_addr_swap() to catch possible uses of it without RTNL locking. This patch also saves us from a lockdep false-positive in bond_set_rx_mode() vs bond_hw_addr_swap(). CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7f63f1d |
|
02-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: add bond_time_in_interval() and use it for time comparison Currently we use a lot of time comparison math for arp_interval comparisons, which are sometimes quite hard to read and understand. All the time comparisons have one pattern: (time - arp_interval_jiffies) <= jiffies <= (time + mod * arp_interval_jiffies + arp_interval_jiffies/2) Introduce a new helper - bond_time_in_interval(), which will do the math in one place and, thus, will clean up the logical code. This helper introduces a bit of overhead (by always calculating the jiffies from arp_interval), however it's really not visible, considering that functions using it usually run once in arp_interval milliseconds. There are several lines slightly over 80 chars, however breaking them would result in more hard-to-read code than several character after the 80 mark. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
def4460c |
|
02-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: call slave_last_rx() only once per slave Simple cleanup to not call slave_last_rx() on every time function. It won't give any measurable boost - but looks cleaner and easier to understand. There are no time-consuming functions in between these calls, so it's safe to call it in the beginning only once. CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9918d5bf |
|
02-Aug-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: modify only neigh_parms owned by us Otherwise, on neighbour creation, bond_neigh_init() will be called with a foreign netdev. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
278b2083 |
|
01-Aug-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: initial RCU conversion This patch does the initial bonding conversion to RCU. After it the following modes are protected by RCU alone: roundrobin, active-backup, broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for reading, and will be dealt with later. curr_active_slave needs to be dereferenced via rcu in the converted modes because the only thing protecting the slave after this patch is rcu_read_lock, so we need the proper barrier for weakly ordered archs and to make sure we don't have stale pointer. It's not tagged with __rcu yet because there's still work to be done to remove the curr_slave_lock, so sparse will complain when rcu_assign_pointer and rcu_dereference are used, but the alternative to use rcu_dereference_protected would've created much bigger code churn which is more difficult to test and review. That will be converted in time. 1. Active-backup mode 1.1 Perf recording while doing iperf -P 4 - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU in bonding - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU in bonding 1.2. Bandwidth measurements - old bonding: 16.1 gbps consistently - new bonding: 17.5 gbps consistently 2. Round-robin mode 2.1 Perf recording while doing iperf -P 4 - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU in bonding - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU in bonding 2.2 Bandwidth measurements - old bonding: 8 gbps (variable due to packet reorderings) - new bonding: 10 gbps (variable due to packet reorderings) Of course the latency has improved in all converted modes, and moreover while doing enslave/release (since it doesn't affect tx anymore). Also I've stress tested all modes doing enslave/release in a loop while transmitting traffic. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
15077228 |
|
01-Aug-2013 |
Nikolay Aleksandrov <razor@BlackWall.org> |
bonding: factor out slave id tx code and simplify xmit paths I factored out the tx xmit code which relies on slave id in bond_xmit_slave_id. It is global because later it can be used also in 3ad mode xmit. Unnecessary obvious comments are removed. Active-backup mode is simplified because bond_dev_queue_xmit always consumes the skb. bond_xmit_xor becomes one line because of bond_xmit_slave_id. bond_for_each_slave_from is not used in bond_xmit_slave_id because later when RCU is used we can avoid important race condition by using standard rculist routines. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78a646ce |
|
01-Aug-2013 |
Nikolay Aleksandrov <razor@BlackWall.org> |
bonding: simplify broadcast_xmit function We don't need to start from the curr_active_slave as the frame will be sent to all eligible slaves anyway, so we remove the unnecessary local variables, checks and comments, and make it use the standard list API. This has the nice side-effect that later when it's converted to RCU a race condition will be avoided which could lead to double packet tx. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
71bc3b2d |
|
01-Aug-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: remove unnecessary read_locks of curr_slave_lock In all the cases we already hold bond->lock for reading, so the slave can't get away and the check != NULL is sufficient. curr_active_slave can still change after the read_lock is unlocked prior to use of the dereferenced value, so there's no need for it. It either contains a valid slave which we use (and can't get away), or it is NULL which is checked. In some places the read_lock of curr_slave_lock was left because we need it not to change while performing some action (e.g. syncing current active slave's addresses, sending ARP requests through the active slave) such cases will be dealt with individually while converting to RCU. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dec1e90e |
|
01-Aug-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: convert to list API and replace bond's custom list This patch aims to remove struct bonding's first_slave and struct slave's next and prev pointers, and replace them with the standard Linux list API. The old macros are converted to list API as well and some new primitives are available now. The checks if there're slaves that used slave_cnt have been replaced by the list_empty macro. Also a few small style fixes, changing longest -> shortest line in local variable declarations, leaving an empty line before return and removing unnecessary brackets. This is the first step to gradual RCU conversion. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4beac029 |
|
01-Aug-2013 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: fix system hang due to fast igmp timer rescheduling After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event") we try to acquire rtnl in bond_resend_igmp_join_requests but it can be scheduled with rtnl already held (e.g. when bond_change_active_slave is called with rtnl) causing a loop of immediate reschedules + calls because rtnl_trylock fails each time since it's being already held. For me this issue leads to system hangs very easy: modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod bonding; The fix is to introduce a small (1 jiffy) delay which is enough for the sections holding rtnl to finish without putting any strain on the system. Also adjust the timer in bond_change_active_slave to be 1 jiffy, since most of the time it's called with rtnl already held. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dcfe8048 |
|
27-Jul-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: remove bond_resend_igmp_join_requests read_unlock leftover After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event") we have 1 read_unlock in bond_resend_igmp_join_requests which isn't paired with a read_lock because it's removed by that commit. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10eccb46 |
|
24-Jul-2013 |
stephen hemminger <stephen@networkplumber.org> |
bond: cleanup netpoll code This started out with fixing a sparse warning, then I realized that the wrapper function bond_netpoll_info could just be removed by rolling it into the enable code. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5280948 |
|
24-Jul-2013 |
Wang Sheng-Hui <shhuiw@gmail.com> |
bonding: use pre-defined macro in bond_mode_name instead of magic number 0 We have BOND_MODE_ROUNDROBIN pre-defined as 0, and it's the lowest mode number. Use it to check the arg lower bound instead of magic number 0 in bond_mode_name. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b07ea07b |
|
23-Jul-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: Fixed up a error "do not initialise statics to 0 or NULL" in bond_main.c The error is found by the checkpatch.pl tools. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4cdef9b |
|
23-Jul-2013 |
dingtianhong <dingtianhong@huawei.com> |
bonding: don't call slave_xxx_netpoll under spinlocks The slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep, it should't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by rtnl lock, it is no need to take the read lock, as the slave list couldn't be changed outside rtnl lock. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4aa5dee4 |
|
19-Jul-2013 |
Jiri Pirko <jiri@resnulli.us> |
net: convert resend IGMP to notifier event Until now, bond_resend_igmp_join_requests() looks for vlans attached to bonding device, bridge where bonding act as port manually. It does not care of other scenarios, like stacked bonds or team device above. Make this more generic and use netdev notifier to propagate the event to upper devices and to actually call ip_mc_rejoin_groups(). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
008aebde |
|
29-Jun-2013 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: combine pr_debugs in bond_set_dev_addr into one Combine the multiple pr_debugs in bond_set_dev_addr into one pr_debug. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae0d6750 |
|
26-Jun-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: when cloning a MAC use NET_ADDR_STOLEN A simple semantic change, when a slave's MAC is cloned by the bond master then set addr_assign_type to NET_ADDR_STOLEN instead of NET_ADDR_SET. Also use bond_set_dev_addr() in BOND_FOM_ACTIVE mode to change the bond's MAC address because the assign_type has to be set properly. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
97a1e639 |
|
26-Jun-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: remove unnecessary dev_addr_from_first member In struct bonding there's a member called dev_addr_from_first which is used to denote when the bond dev should clone the first slave's MAC address but since we have netdev's addr_assign_type variable that is not necessary. We clone the first slave's MAC each time we have a random MAC set to the bond device. This has the nice side-effect of also fixing an inconsistency - when the MAC address of the bond dev is set after its creation, but prior to having slaves, it's not kept and the first slave's MAC is cloned. The only way to keep the MAC was to create the bond device with the MAC address set (e.g. through ip link). In all cases if the bond device is left without any slaves - its MAC gets reset to a random one as before. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d2ada77 |
|
26-Jun-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: remove unnecessary setup_by_slave member We have a member called setup_by_slave in struct bonding to denote if the bond dev has different type than ARPHRD_ETHER, but that is already denoted in bond's netdev type variable if it was setup by the slave, so use that instead of the member. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8599b52e |
|
24-Jun-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: add an option to fail when any of arp_ip_target is inaccessible Currently, we fail only when all of the ips in arp_ip_target are gone. However, in some situations we might need to fail if even one host from arp_ip_target becomes unavailable. All situations, obviously, rely on the idea that we need *completely* functional network, with all interfaces/addresses working correctly. One real world example might be: vlans on top on bond (hybrid port). If bond and vlans have ips assigned and we have their peers monitored via arp_ip_target - in case of switch misconfiguration (trunk/access port), slave driver malfunction or tagged/untagged traffic dropped on the way - we will be able to switch to another slave. Though any other configuration needs that if we need to have access to all arp_ip_targets. This patch adds this possibility by adding a new parameter - arp_all_targets (both as a module parameter and as a sysfs knob). It can be set to: 0 or any (the default) - which works exactly as it's working now - the slave is up if any of the arp_ip_targets are up. 1 or all - the slave is up if all of the arp_ip_targets are up. This parameter can be changed on the fly (via sysfs), and requires the mode to be active-backup and arp_validate to be enabled (it obeys the arp_validate config on which slaves to validate). Internally it's done through: 1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's an array of jiffies, meaning that slave->target_last_arp_rx[i] is the last time we've received arp from bond->params.arp_targets[i] on this slave. 2) If we successfully validate an arp from bond->params.arp_targets[i] in bond_validate_arp() - update the slave->target_last_arp_rx[i] with the current jiffies value. 3) When getting slave's last_rx via slave_last_rx(), we return the oldest time when we've received an arp from any address in bond->params.arp_targets[]. If the value of arp_all_targets == 0 - we still work the same way as before. Also, update the documentation to reflect the new parameter. v3->v4: Kill the forgotten rtnl_unlock(), rephrase the documentation part to be more clear, don't fail setting arp_all_targets if arp_validate is not set - it has no effect anyway but can be easier to set up. Also, print a warning if the last arp_ip_target is removed while the arp_interval is on, but not the arp_validate. v2->v3: Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(), use the same initialization value for target_last_arp_rx[] as is used for the default last_arp_rx, to avoid useless interface flaps. Also, instead of failing to remove the last arp_ip_target just print a warning - otherwise it might break existing scripts. v1->v2: Correctly handle adding/removing hosts in arp_ip_target - we need to shift/initialize all slave's target_last_arp_rx. Also, don't fail module loading on arp_all_targets misconfiguration, just disable it, and some minor style fixes. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aeea64ac |
|
24-Jun-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: don't trust arp requests unless active slave really works Currently, if we receive any arp packet on a backup slave in active-backup mode and arp_validate enabled, we suppose that it's an arp request, swap source/target ip and try to validate it. This optimization gives us virtually no downtime in the most common situation (active and backup slaves are in the same broadcast domain and the active slave failed). However, if we can't reach the arp_ip_target(s), we end up in an endless loop of reselecting slaves, because we receive our arp requests, sent by the active slave, and think that backup slaves are up, thus selecting them as active and, again, sending arp requests, which fool our backup slaves. Fix this by not validating the swapped arp packets if the current active slave didn't receive any arp reply after it was selected as active. This way we will only accept arp requests if we know that the current active slave can actually reach arp_ip_target. v3->v4: Obey 80 lines and make checkpatch.pl happy, per Sergei's suggestion. v1->v3: No change. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c146102 |
|
24-Jun-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: don't validate arp if we don't have to Currently, we validate all the incoming arps if arp_validate not 0. However, we don't have to validate backup slaves if arp_validate == active and vice versa, so return early in bond_arp_rcv() in these cases. It works correctly now because we verify arp_validate in slave_last_rx(), however we're just doing useless work in bond_arp_rcv(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0afee4e8 |
|
24-Jun-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: don't add duplicate targets to arp_ip_target Print a warning and skip them. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87a7b84b |
|
24-Jun-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: add helper function bond_get_targets_ip(targets, ip) Add function bond_get_targets_ip(targets, ip) which searches through targets array of ips (arp_targets) and returns the position of first match. If ip == 0, returns the first free slot. On failure to find the ip or free slot, return -1. Use it to verify if the arp we've received is valid and in sysfs. v1->v2: Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)", per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might update 'null' arp_ip_targets' last_rx. Also, address style. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db4e9b2b |
|
20-Jun-2013 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: fix slave speed reporting in bond_miimon_commit When we have BOND_LINK_UP the speed is reported unconditionally with %u format although it can be SPEED_UNKNOWN (-1). After this patch it returns 0 in that case in an attempt to keep the existing scripts happy. One line is intenionally left 81 chars because it gets ugly if broken. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f5474e7 |
|
11-Jun-2013 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: fix igmp_retrans type and two related races First the type of igmp_retrans (which is the actual counter of igmp_resend parameter) is changed to u8 to be able to store values up to 255 (as per documentation). There are two races that were hidden there and which are easy to trigger after the previous fix, the first is between bond_resend_igmp_join_requests and bond_change_active_slave where igmp_retrans is set and can be altered by the periodic. The second race condition is between multiple running instances of the periodic (upon execution it can be scheduled again for immediate execution which can cause the counter to go < 0 which in the unsigned case leads to unnecessary igmp retransmissions). Since in bond_change_active_slave bond->lock is held for reading and curr_slave_lock for writing, we use curr_slave_lock for mutual exclusion. We can't drop them as there're cases where RTNL is not held when bond_change_active_slave is called. RCU is unlocked in bond_resend_igmp_join_requests before getting curr_slave_lock since we don't need it there and it's pointless to delay. The decrement is moved inside the "if" block because if we decrement unconditionally there's still a possibility for a race condition although it is much more difficult to hit (many changes have to happen in a very short period in order to trigger) which in the case of 3 parallel running instances of this function and igmp_retrans == 1 (with check bond->igmp_retrans-- > 1) is: f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0 f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255 f3 does the unnecessary retransmissions. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b8fad459 |
|
11-Jun-2013 |
Nikolay Aleksandrov <nikolay@redhat.com> |
bonding: reset master mac on first enslave failure If the bond device is supposed to get the first slave's MAC address and the first enslavement fails then we need to reset the master's MAC otherwise it will stay the same as the failed slave device. We do it after err_undo_flags since that is the first place where the MAC can be changed and we check if it should've been the first slave and if the bond's MAC was set to it because that err place is used by multiple locations prior to changing the master's MAC address. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b5acd29 |
|
31-May-2013 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: disallow change of MAC if fail_over_mac enabled Currently, if fail_over_mac is set to active, then attempts to change the MAC of the bond itself silently fail. However, if fail_over_mac is set to follow, changes are permitted. Permitting the bond's MAC to change with fail_over_mac=follow will disrupt the follow functionality, which normally controls the assignment of MAC address to the bond and its slaves, and can cause multiple ports to be assigned the same MAC address. which will interfere with the functioning of the device (where the device here is a virtualization-aware card for s390, qeth). Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
303d1cbf |
|
31-May-2013 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Convert hw addr handling to sync/unsync, support ucast addresses This patch converts bonding to use the dev_uc/mc_sync and dev_uc/mc_sync_multiple functions for updating the hardware addresses of bonding slaves. The existing functions to add or remove addresses are removed, and their functionality is replaced with calls to dev_mc_sync or dev_mc_sync_multiple, depending upon the bonding mode. Calls to dev_uc_sync and dev_uc_sync_multiple are also added, so that unicast addresses added to a bond will be properly synced with its slaves. Various functions are renamed to better reflect the new situation, and relevant comments are updated. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
351638e7 |
|
27-May-2013 |
Jiri Pirko <jiri@resnulli.us> |
net: pass info struct via netdevice notifier So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a5c5fd4 |
|
17-May-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: arp_ip_count and arp_targets can be wrong When getting arp_ip_targets if we encounter a bad IP, arp_ip_count still gets increased and all the targets after the wrong one will not be probed if arp_interval is enabled after that (unless a new IP target is added through sysfs) because of the zero entry, in this case reading arp_ip_target through sysfs will show valid targets even if there's a zero entry. Example: 1.2.3.4,4.5.6.7,blah,5.6.7.8 When retrieving the list from arp_ip_target the output would be: 1.2.3.4,4.5.6.7,5.6.7.8 but there will be a 0 entry between 4.5.6.7 and 5.6.7.8. If arp_interval is enabled after that 5.6.7.8 will never be checked because of that. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acca2674 |
|
17-May-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: replace %x with %pI4 for IPv4 addresses There're few pr_debug() places that can provide the IPv4 address in dotted decimal format instead which is more helpful. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0ce3508 |
|
16-May-2013 |
Eric Dumazet <edumazet@google.com> |
bonding: allow TSO being set on bonding master In some situations, we need to disable TSO on bonding slaves. bonding device automatically unset TSO in bond_fix_features(), and performance is not good because : 1) We consume more cpu cycles. 2) GSO segmentation has some bugs leading to out of order TCP packets if this segmentation is done before virtual device. This particular problem will be addressed in a separate patch. This patch allows TSO being set/unset on the bonding master, so that GSO segmentation is done after bonding layer. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Michał Mirosław <mirqus@gmail.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Maciej Żenczykowski <maze@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6cdcf6d |
|
22-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: fix locking in enslave failure path In commit 3c5913b53fefc9d9e15a2d0f93042766658d9f3f ("bonding: primary_slave & curr_active_slave are not cleaned on enslave failure") I didn't account for the use of curr_active_slave without curr_slave_lock and since there are such users, we should hold bond->lock for writing while setting it to NULL (in the NULL case we don't need the curr_slave_lock). Keeping the bond lock as to avoid the extra release/acquire cycle. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d632ce98 |
|
18-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: in bond_mc_swap() bond's mc addr list is walked without lock Use netif_addr_lock_bh() to acquire the appropriate lock before walking. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc7a72ac |
|
18-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: disable netpoll on enslave failure slave_disable_netpoll() is not called upon enslave failure which would lead to a memory leak. Call slave_disable_netpoll() after err_detach as that's the first error path after enabling netpoll on that slave. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c5913b53 |
|
18-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: primary_slave & curr_active_slave are not cleaned on enslave failure On enslave failure primary_slave can point to new_slave which is to be freed, and the same applies to curr_active_slave. So check if this is the case and clean up properly after err_detach because that's the first error code path after they're set. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a506e7b4 |
|
18-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: vlans don't get deleted on enslave failure The main problem is with vid refcount which only gets bumped up. Delete the vlans after err_detach as that's the first error path after the vlans are added. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25e40305 |
|
18-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: mc addresses don't get deleted on enslave failure Add bond_mc_list_flush() after err_detach as that's the first error path after the addresses are added. The main issue is the mc addresses' refcount which only gets bumped up. v2: update log message and don't move code unnecessarily Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bb5b052f |
|
16-Apr-2013 |
Andy Gospodarek <andy@greyhouse.net> |
bond: add support to read speed and duplex via ethtool This patch adds support for the get_settings ethtool op to the bonding driver. This was motivated by users who wanted to get the speed of the bond and compare that against throughput to understand utilization. The behavior before this patch was added was problematic when computing line utilization after trying to get link-speed and throughput via SNMP. Output from ethtool looks like this for a round-robin bond: Settings for bond0: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Speed: 11000Mb/s Duplex: Full Port: Other PHYAD: 0 Transceiver: internal Auto-negotiation: off MDI-X: Unknown Link detected: yes I tested this and verified it works as expected. A test was also done on a version backported to an older kernel and it worked well there. v2: Switch to using ethtool_cmd_speed_set to set speed, added check to SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations as they were not needed, and set port type to 'Other.' v3: Fix useless assignment and checkpatch warning. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1fd9b1fc |
|
18-Apr-2013 |
Patrick McHardy <kaber@trash.net> |
net: vlan: prepare for 802.1ad support Make the encapsulation protocol value a property of VLAN devices and change the device lookup functions to take the protocol value into account. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
80d5c368 |
|
18-Apr-2013 |
Patrick McHardy <kaber@trash.net> |
net: vlan: prepare for 802.1ad VLAN filtering offload Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f646968f |
|
18-Apr-2013 |
Patrick McHardy <kaber@trash.net> |
net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_* Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4394542c |
|
15-Apr-2013 |
Eric Dumazet <edumazet@google.com> |
bonding: fix l23 and l34 load balancing in forwarding path Since commit 6b923cb7188d46 (bonding: support for IPv6 transmit hashing) bonding doesn't properly hash traffic in forwarding setups. Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in this case. More generally, the transport header might not be in the skb head. Use pskb_may_pull() & skb_header_pointer() to get it right, and use proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for more protocols than TCP and UDP. Reported-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: John Eaglesham <linux@8192.net> Tested-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6a5a7b9 |
|
11-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: IFF_BONDING is not stripped on enslave failure While enslaving a new device and after IFF_BONDING flag is set, in case of failure it is not stripped from the device's priv_flags while cleaning up, which could lead to other problems. Cleaning at err_close because the flag is set after dev_open(). v2: no change Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6101391d |
|
11-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: fix netdev event NULL pointer dereference In commit 471cb5a33dcbd7c529684a2ac7ba4451414ee4a7 ("bonding: remove usage of dev->master") a bug was introduced which causes a NULL pointer dereference. If a bond device is in mode 6 (ALB) and a slave is added it will dereference a NULL pointer in bond_slave_netdev_event(). This is because in bond_enslave we have bond_alb_init_slave() which changes the MAC address of the slave and causes a NETDEV_CHANGEADDR. Then we have in bond_slave_netdev_event(): struct slave *slave = bond_slave_get_rtnl(slave_dev); struct bonding *bond = slave->bond; bond_slave_get_rtnl() dereferences slave_dev->rx_handler_data which at that time is NULL since netdev_rx_handler_register() is called later. This is fixed by checking if slave is NULL before dereferencing it. v2: Comment style changed. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69b0216a |
|
05-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: fix bonding_masters race condition in bond unloading While the bonding module is unloading, it is considered that after rtnl_link_unregister all bond devices are destroyed but since no synchronization mechanism exists, a new bond device can be created via bonding_masters before unregister_pernet_subsys which would lead to multiple problems (e.g. NULL pointer dereference, wrong RIP, list corruption). This patch fixes the issue by removing any bond devices left in the netns after bonding_masters is removed from sysfs. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ffcdedb6 |
|
05-Apr-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
Revert "bonding: remove sysfs before removing devices" This reverts commit 4de79c737b200492195ebc54a887075327e1ec1d. This patch introduces a new bug which causes access to freed memory. In bond_uninit: list_del(&bond->bond_list); bond_list is linked in bond_net's dev_list which is freed by unregister_pernet_subsys. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4de79c73 |
|
02-Apr-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: remove sysfs before removing devices We have a race condition if we try to rmmod bonding and simultaneously add a bond master through sysfs. In bonding_exit() we first remove the devices (through rtnl_link_unregister() ) and only after that we remove the sysfs. If we manage to add a device through sysfs after that the devices were removed - we'll end up with that device/sysfs structure and with the module unloaded. Fix this by first removing the sysfs and only after that calling rtnl_link_unregister(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fcd99434 |
|
01-Apr-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: get netdev_rx_handler_unregister out of locks Now that netdev_rx_handler_unregister contains synchronize_net(), we need to call it outside of bond->lock, cause it might sleep. Also, remove the already unneded synchronize_net(). Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad999eee |
|
25-Mar-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: cleanup unneeded rcu_read_lock() bond_resend_igmp_join_requests_delayed() calls _resend_igmp_join_requests() under rcu_read_lock(), while it gets its own rcu_read_lock() for the whole function. Remove the lock from the _delayed function. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
876254ae |
|
12-Mar-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: don't call update_speed_duplex() under spinlocks bond_update_speed_duplex() might sleep while calling underlying slave's routines. Move it out of atomic context in bond_enslave() and remove it from bond_miimon_commit() - it was introduced by commit 546add79, however when the slave interfaces go up/change state it's their responsibility to fire NETDEV_UP/NETDEV_CHANGE events so that bonding can properly update their speed. I've tested it on all combinations of ifup/ifdown, autoneg/speed/duplex changes, remote-controlled and local, on (not) MII-based cards. All changes are visible. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
80028ea1 |
|
06-Mar-2013 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: fire NETDEV_RELEASE event only on 0 slaves Currently, if we set up netconsole over bonding and release a slave, netconsole will stop logging on the whole bonding device. Change the behavior to stop the netconsole only when the last slave is released. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c8c4e4c |
|
25-Feb-2013 |
Jiri Pirko <jiri@resnulli.us> |
bond: check if slave count is 0 in case when deciding to take slave's mac in bond_enslave(), check slave_cnt before actually using slave address. introduced by: commit 409cc1f8a41 (bond: have random dev address by default instead of zeroes) Reported-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3f92b63 |
|
18-Feb-2013 |
Doug Goldstein <cardoe@cardoe.com> |
bonding: set sysfs device_type to 'bond' Sets the sysfs device_type to 'bond' for udev. This allows udev rules to be created for bond devices. This is similar to how other network devices set their device_type. Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0896341a |
|
18-Feb-2013 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: fix bond_release_all inconsistencies This patch fixes the following inconsistencies in bond_release_all: - IFF_BONDING flag is not stripped from slaves - MTU is not restored - no netdev notifiers are sent Instead of trying to keep bond_release and bond_release_all in sync I think we can re-use bond_release as the environment for calling it is correct (RTNL is held). I have been running tests for the past week and they came out successful. The only way for bond_release to fail is for the slave to be attached in a different bond or to not be a slave but that cannot happen as RTNL is held and no slave manipulations can be achieved. V2: As suggested bond_release is renamed to __bond_release_one with a new parameter "all" introduced so to avoid calling unnecessary code while destroying a bond, and a wrapper for it called bond_release is created because of ndo_del_link. bond_release_all() is removed. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2cde6acd |
|
11-Feb-2013 |
Neil Horman <nhorman@tuxdriver.com> |
netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock __netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is already held. The mechanism is used to asynchronously call __netpoll_cleanup outside of the holding of the rtnl_lock, so as to avoid deadlock. Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the rtnl_lock must be held while calling it. Further, it cannot be held, because rcu callbacks may be issued in softirq contexts, which cannot sleep. Fix this by converting the rcu callback to a work queue that is guaranteed to get scheduled in process context, so that we can hold the rtnl properly while calling __netpoll_cleanup Tested successfully by myself. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Cong Wang <amwang@redhat.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
387ff911 |
|
31-Jan-2013 |
Gao feng <gaofeng@cn.fujitsu.com> |
netns: bond: allow unprivileged users to control bond device reduce the permission check of bond device's ioctl. allow the userns root to control the bond device. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
409cc1f8 |
|
30-Jan-2013 |
Jiri Pirko <jiri@resnulli.us> |
bond: have random dev address by default instead of zeroes Makes more sense to have randomly generated address by default than to have all zeroes. It also allows user to for example put the bond into bridge without need to have any slaves in it. Also note that this changes only behaviour of bonds with no slaves. Once the first slave device is enslaved, its address will be used (no change here). Also, fix dev_assign_type values on the way. Reported-by: Pavel Šimerda <psimerda@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7826d43f |
|
05-Jan-2013 |
Jiri Pirko <jiri@resnulli.us> |
ethtool: fix drvinfo strings set in drivers Use strlcpy where possible to ensure the string is \0 terminated. Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN and custom defines. Use snprintf instead of sprint. Remove unnecessary inits of ->fw_version Remove unnecessary inits of drvinfo struct. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
471cb5a3 |
|
03-Jan-2013 |
Jiri Pirko <jiri@resnulli.us> |
bonding: remove usage of dev->master Benefit from new upper dev list and free bonding from dev->master usage. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfb6f99d |
|
13-Dec-2012 |
Konstantin Khlebnikov <khlebnikov@openvz.org> |
bonding: do not cancel works in bond_uninit() Bonding initializes these works in bond_open() and cancels in bond_close(), thus in bond_uninit() they are already canceled but may be unitialized yet. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Nikolay Aleksandrov <nikolay@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c772dde3 |
|
06-Dec-2012 |
Ben Hutchings <bhutchings@solarflare.com> |
bonding: Fix check for ethtool get_link operation support Since commit 2c60db037034 ('net: provide a default dev->ethtool_ops') all devices have a non-null ethtool_ops. Test only dev->ethtool_ops->get_link in both places where we care. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
90fb6250 |
|
28-Nov-2012 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: make arp_ip_target parameter checks consistent with sysfs The module can be loaded with arp_ip_target="255.255.255.255" which makes it impossible to remove as the function in sysfs checks for that value, so we make the parameter checks consistent with sysfs. v2: Fix formatting v3: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbb0c41b |
|
28-Nov-2012 |
nikolay@redhat.com <nikolay@redhat.com> |
bonding: fix miimon and arp_interval delayed work race conditions First I would give three observations which will be used later. Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq) This usage is wrong because the pending bit is cleared just before the work's fn is executed and if the function re-arms itself we might end up with the work still running. It's safe to call cancel_delayed_work_sync() even if the work is not queued at all. Observation 2: Use of INIT_DELAYED_WORK() Work needs to be initialized only once prior to (de/en)queueing. Observation 3: IFF_UP is set only after ndo_open is called Related race conditions: 1. Race between bonding_store_miimon() and bonding_store_arp_interval() Because of Obs.1 we can end up having both works enqueued. 2. Multiple races with INIT_DELAYED_WORK() Since the works are not protected by anything between INIT_DELAYED_WORK() and calls to (en/de)queue it is possible for races between the following functions: (races are also possible between the calls to INIT_DELAYED_WORK() and workqueue code) bonding_store_miimon() - bonding_store_arp_interval(), bond_close(), bond_open(), enqueued functions bonding_store_arp_interval() - bonding_store_miimon(), bond_close(), bond_open(), enqueued functions 3. By Obs.1 we need to change bond_cancel_all() Bugs 1 and 2 are fixed by moving all work initializations in bond_open which by Obs. 2 and Obs. 3 and the fact that we make sure that all works are cancelled in bond_close(), is guaranteed not to have any work enqueued. Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so they can't race with bond_close and bond_open. The opposing work is cancelled only if the IFF_UP flag is set and it is cancelled unconditionally. The opposing work is already cancelled if the interface is down so no need to cancel it again. This way we don't need new synchronizations for the bonding workqueue. These bugs (and fixes) are tied together and belong in the same patch. Note: I have left 1 line intentionally over 80 characters (84) because I didn't like how it looks broken down. If you'd prefer it otherwise, then simply break it. v2: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e591b93 |
|
21-Nov-2012 |
Michal Kubeček <mkubecek@suse.cz> |
bonding: in balance-rr mode, set curr_active_slave only if it is up If all slaves of a balance-rr bond with ARP monitor are enslaved with down link state, bond keeps down state even after slaves go up. This is caused by bond_enslave() setting curr_active_slave to first slave not taking into account its link state. As bond_loadbalance_arp_mon() uses curr_active_slave to identify whether slave's down->up transition should update bond's link state, bond stays down even if slaves are up (until first slave goes from up to down at least once). Before commit f31c7937 "bonding: start slaves with link down for ARP monitor", this was masked by slaves always starting in UP state with ARP monitor (and MII monitor not relying on curr_active_slave being NULL if there is no slave up). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e376bd0 |
|
20-Nov-2012 |
Sarveshwar Bandi <sarveshwar.bandi@emulex.com> |
bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices. Patch sets the lowest gso_max_size and gso_max_segs values of the slave devices during enslave and detach. Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55462cf3 |
|
13-Oct-2012 |
Jiri Pirko <jiri@resnulli.us> |
vlan: fix bond/team enslave of vlan challenged slave/port In vlan_uses_dev() check for number of vlan devs rather than existence of vlan_info. The reason is that vlan id 0 is there without appropriate vlan dev on it by default which prevented from enslaving vlan challenged dev. Reported-by: Jon Stanley <jstanley@rmrf.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49ee4920 |
|
03-Oct-2012 |
Eric Dumazet <edumazet@google.com> |
bonding: set qdisc_tx_busylock to avoid LOCKDEP splat If a qdisc is installed on a bonding device, its possible to get following lockdep splat under stress : ============================================= [ INFO: possible recursive locking detected ] 3.6.0+ #211 Not tainted --------------------------------------------- ping/4876 is trying to acquire lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 but task is already holding lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by ping/4876: #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30 #1: (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 #3: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 #4: (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding] #5: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 stack backtrace: Pid: 4876, comm: ping Not tainted 3.6.0+ #211 Call Trace: [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80 [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100 [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90 [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570 [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding] [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0 [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0 [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830 [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570 [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870 [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870 [<ffffffff815bb964>] ip_output+0x54/0xf0 [<ffffffff815bad48>] ip_local_out+0x28/0x90 [<ffffffff815bc444>] ip_send_skb+0x14/0x50 [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40 [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30 [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6855>] inet_sendmsg+0x125/0x240 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390 [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0 [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0 [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0 [<ffffffff8116f98a>] ? fget_light+0x3da/0x510 [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80 [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b Avoid this problem using a distinct lock_class_key for bonding devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da210f55 |
|
29-Aug-2012 |
Jiri Bohac <jbohac@suse.cz> |
bonding: add some slack to arp monitoring time limits Currently, all the time limits in the bonding ARP monitor are in multiples of arp_interval -- the time interval at which the ARP monitor is periodically scheduled. With a fast network round-trip and a little scheduling latency of the ARP monitor work, a limit of n*delta_in_ticks may effectively mean (n-1)*delta_in_ticks. This is fatal in case of n==1 (the link will stay down forever) and makes the behaviour non-deterministic in all the other cases. Add a delta_in_ticks/2 time slack to all the time limits. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b923cb7 |
|
21-Aug-2012 |
John Eaglesham <linux@8192.net> |
bonding: support for IPv6 transmit hashing Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e15c3c22 |
|
09-Aug-2012 |
Amerigo Wang <amwang@redhat.com> |
netpoll: check netpoll tx status on the right device Although this doesn't matter actually, because netpoll_tx_running() doesn't use the parameter, the code will be more readable. For team_dev_queue_xmit() we have to move it down to avoid compile errors. Cc: David Miller <davem@davemloft.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38e6bc18 |
|
09-Aug-2012 |
Amerigo Wang <amwang@redhat.com> |
netpoll: make __netpoll_cleanup non-block Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup() may be called with read_lock() held too, so we should make them non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47be03a2 |
|
09-Aug-2012 |
Amerigo Wang <amwang@redhat.com> |
netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7bc2a5b |
|
09-Aug-2012 |
Amerigo Wang <amwang@redhat.com> |
net: remove netdev_bonding_change() I don't see any benifits to use netdev_bonding_change() than using call_netdevice_notifiers() directly. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df4ab5b3 |
|
19-Jul-2012 |
Jiri Pirko <jiri@resnulli.us> |
net: rename bond_queue_mapping to slave_dev_queue_mapping As this is going to be used not only by bonding. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d40156aa |
|
19-Jul-2012 |
Jiri Pirko <jiri@resnulli.us> |
rtnl: allow to specify different num for rx and tx queue count Also cut out unused function parameters and possible err in return value. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6fe83e9 |
|
16-Jul-2012 |
Eric Dumazet <edumazet@google.com> |
bonding: refine IFF_XMIT_DST_RELEASE capability Some workloads greatly benefit of IFF_XMIT_DST_RELEASE capability on output net device, avoiding dirtying dst refcount. bonding currently disables IFF_XMIT_DST_RELEASE unconditionally. If all slaves have the IFF_XMIT_DST_RELEASE bit set, then bonding master can also have it in its priv_flags Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30fdd8a0 |
|
16-Jul-2012 |
Jiri Pirko <jiri@resnulli.us> |
netpoll: move np->dev and np->dev_name init into __netpoll_setup() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a64d49c3 |
|
09-Jul-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
bonding: Manage /proc/net/bonding/ entries from the netdev events It was recently reported that moving a bonding device between network namespaces causes warnings from /proc. It turns out after the move we were trying to add and to remove the /proc/net/bonding entries from the wrong network namespace. Move the bonding /proc registration code into the NETDEV_REGISTER and NETDEV_UNREGISTER events where the proc registration and unregistration will always happen at the right time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04502430 |
|
12-Jun-2012 |
Eric Dumazet <edumazet@google.com> |
bonding: drop_monitor aware When packets are dropped in TX path, its better to use kfree_skb() instead of dev_kfree_skb() to give proper drop_monitor events. Also move the kfree_skb() call after read_unlock() in bond_alb_xmit() and bond_xmit_activebackup() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
de063b70 |
|
11-Jun-2012 |
Eric Dumazet <edumazet@google.com> |
bonding: remove packet cloning in recv_probe() Cloning all packets in input path have a significant cost. Use skb_header_pointer()/skb_copy_bits() instead of pskb_may_pull() so that recv_probe handlers (bond_3ad_lacpdu_recv / bond_arp_rcv / rlb_arp_recv ) dont touch input skb. bond_handle_frame() can avoid the skb_clone()/dev_kfree_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5ee31c68 |
|
12-Jun-2012 |
Eric Dumazet <edumazet@google.com> |
bonding: Fix corrupted queue_mapping In the transmit path of the bonding driver, skb->cb is used to stash the skb->queue_mapping so that the bonding device can set its own queue mapping. This value becomes corrupted since the skb->cb is also used in __dev_xmit_skb. When transmitting through bonding driver, bond_select_queue is called from dev_queue_xmit. In bond_select_queue the original skb->queue_mapping is copied into skb->cb (via bond_queue_mapping) and skb->queue_mapping is overwritten with the bond driver queue. Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes the packet length into skb->cb, thereby overwriting the stashed queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit), the queue mapping for the skb is set to the stashed value which is now the skb length and hence is an invalid queue for the slave device. If we want to save skb->queue_mapping into skb->cb[], best place is to add a field in struct qdisc_skb_cb, to make sure it wont conflict with other layers (eg : Qdiscc, Infiniband...) This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8 bytes : netem qdisc for example assumes it can store an u64 in it, without misalignment penalty. Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[]. The largest user is CHOKe and it fills it. Based on a previous patch from Tom Herbert. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Tom Herbert <therbert@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Roland Dreier <roland@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e42e474 |
|
09-May-2012 |
Joe Perches <joe@perches.com> |
drivers/net: Convert compare_ether_addr to ether_addr_equal Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script: $ cat compare_ether_addr.cocci @@ expression a,b; @@ - !compare_ether_addr(a, b) + ether_addr_equal(a, b) @@ expression a,b; @@ - compare_ether_addr(a, b) + !ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) == 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !ether_addr_equal(a, b) != 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) == 0 + !ether_addr_equal(a, b) @@ expression a,b; @@ - ether_addr_equal(a, b) != 0 + ether_addr_equal(a, b) @@ expression a,b; @@ - !!ether_addr_equal(a, b) + ether_addr_equal(a, b) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13a8e0c8 |
|
08-May-2012 |
Jiri Bohac <jbohac@suse.cz> |
bonding: don't increase rx_dropped after processing LACPDUs Since commit 3aba891d, bonding processes LACP frames (802.3ad mode) with bond_handle_frame(). Currently a copy of the skb is made and the original is left to be processed by other rx_handlers and the rest of the network stack by returning RX_HANDLER_ANOTHER. As there is no protocol handler for PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped increased. Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED if bonding has processed the LACP frame. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13b95fb7 |
|
26-Apr-2012 |
Rick Jones <rick.jones2@hp.com> |
bonding: bond_update_speed_duplex() can return void since no callers check its return As none of the callers of bond_update_speed_duplex (need to) check its return value, there is little point in it returning anything. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f31c7937 |
|
16-Apr-2012 |
Michal Kubeček <mkubecek@suse.cz> |
bonding: start slaves with link down for ARP monitor Initialize slave device link state as down if ARP monitor is active and net_carrier_ok() returns zero. Also shift initial value of its last_arp_tx so that it doesn't immediately cause fake detection of "up" state. When ARP monitoring is used, initializing the slave device with up link state can cause ARP monitor to detect link failure before the device is really up (with igb driver, this can take more than two seconds). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64d683c5 |
|
13-Apr-2012 |
David S. Miller <davem@davemloft.net> |
bonding: Fixup get_tx_queue() op second arg type. I missed this when fixing up the warning in the previous commit. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
efacb309 |
|
10-Apr-2012 |
stephen hemminger <shemminger@vyatta.com> |
rtnetlink & bonding: change args got get_tx_queues Change get_tx_queues, drop unsused arg/return value real_tx_queues, and use return by value (with error) rather than call by reference. Probably bonding should just change to LLTX and the whole get_tx_queues API could disappear! Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a430974 |
|
04-Apr-2012 |
Veaceslav Falico <vfalico@redhat.com> |
bonding: properly unset current_arp_slave on slave link up When a slave comes up, we're unsetting the current_arp_slave without removing active flags from it, which can lead to situations where we have more than one slave with active flags in active-backup mode. To avoid this situation we must remove the active flags from a slave before removing it as a current_arp_slave. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
234bcf8a |
|
03-Apr-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
net/bonding: correctly proxy slave neigh param setup ndo function The current implemenation was buggy for slaves who use ndo_neigh_setup, since the networking stack invokes the bonding device ndo entry (from neigh_params_alloc) before any devices are enslaved, and the bonding driver can't further delegate the call at that point in time. As a result when bonding IPoIB devices, the neigh_cleanup hasn't been called. Fix that by deferring the actual call into the slave ndo_neigh_setup from the time the bonding neigh_setup is called. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2af73d4b |
|
03-Apr-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
net/bonding: emit address change event also in bond_release commit 7d26bb103c4 "bonding: emit event when bonding changes MAC" didn't take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding actually changes the mac address (to all zeroes). As a result the neighbours aren't deleted by the core networking code (which does so upon getting that event). Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d26bb10 |
|
27-Mar-2012 |
Weiping Pan <wpan@redhat.com> |
bonding: emit event when bonding changes MAC When a bonding device is configured with fail_over_mac=active, we expect to see the MAC address of the new active slave as the source MAC address after failover. But we see that the source MAC address is the MAC address of previous active slave. Emit NETDEV_CHANGEADDR event when bonding changes its MAC address, in order to let arp_netdev_event flush neighbour cache and route cache. How to reproduce this bug ? -----------hostB---------------- hostA ----- switch ---|-- eth0--bond0(192.168.100.2/24)| (192.168.100.1/24 \--|-- eth1-/ | -------------------------------- 1 on hostB, modprobe bonding mode=1 miimon=500 fail_over_mac=active downdelay=1000 num_grat_arp=1 ifconfig bond0 192.168.100.2/24 up ifenslave bond0 eth0 ifenslave bond0 eth1 then eth0 is the active slave, and MAC of bond0 is MAC of eth0. 2 on hostA, ping 192.168.100.2 3 on hostB, tcpdump -i bond0 -p icmp -XXX you will see bond0 uses MAC of eth0 as source MAC in icmp reply. 4 on hostB, ifconfig eth0 down tcpdump -i bond0 -p icmp -XXX (just keep it running in step 3) you will see first bond0 uses MAC of eth1 as source MAC in icmp reply, then it will use MAC of eth0 as source MAC. Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ffc93f2 |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
|
#
eaddcd76 |
|
22-Mar-2012 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: remove entries for master_ip and vlan_ip and query devices instead The following patch aimed to resolve an issue where secondary, tertiary, etc. addresses added to bond interfaces could overwrite the bond->master_ip and vlan_ip values. commit 917fbdb32f37e9a93b00bb12ee83532982982df3 Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com> Date: Wed Nov 23 23:37:15 2011 +0000 bonding: only use primary address for ARP That patch was good because it prevented bonds using ARP monitoring from sending frames with an invalid source IP address. Unfortunately, it didn't always work as expected. When using an ioctl (like ifconfig does) to set the IP address and netmask, 2 separate ioctls are actually called to set the IP and netmask if the mask chosen doesn't match the standard mask for that class of address. The first ioctl did not have a mask that matched the one in the primary address and would still cause the device address to be overwritten. The second ioctl that was called to set the mask would then detect as secondary and ignored, but the damage was already done. This was not an issue when using an application that used netlink sockets as the setting of IP and netmask came down at once. The inconsistent behavior between those two interfaces was something that needed to be resolved. While I was thinking about how I wanted to resolve this, Ralf Zeidler came with a patch that resolved this on a RHEL kernel by keeping a full shadow of the entries in dev->ifa_list for the bonding device and vlan devices in the bonding driver. I didn't like the duplication of the list as I want to see the 'bonding' struct and code shrink rather than grow, but liked the general idea. As the Subject indicates this patch drops the master_ip and vlan_ip elements from the 'bonding' and 'vlan_entry' structs, respectively. This can be done because a device's address-list is now traversed to determine the optimal source IP address for ARP requests and for checks to see if the bonding device has a particular IP address. This code could have all be contained inside the bonding driver, but it made more sense to me to EXPORT and call inet_confirm_addr since it did exactly what was needed. I tested this and a backported patch and everything works as expected. Ralf also helped with verification of the backported patch. Thanks to Ralf for all his help on this. v2: Whitespace and organizational changes based on suggestions from Jay Vosburgh and Dave Miller. v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's suggestion. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Ralf Zeidler <ralf.zeidler@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c3ac428 |
|
17-Mar-2012 |
Peter Pan(潘卫平) <panweiping3@gmail.com> |
bonding: send igmp report for its master Liang Zheng(lzheng@redhat.com) found that in the following topo, bonding does not send igmp report when we trigger a fail-over of bonding. eth0-- |-- bond0 -- br0 eth1-- modprobe bonding mode=1 miimon=100 resend_igmp=10 ifconfig bond0 up ifenslave bond0 eth0 eth1 brctl addbr br0 ifconfig br0 192.168.100.2/24 up brctl addif br0 bond0 Add 192.168.100.2(br0) into a multicast group, like 224.10.10.10, then trigger a fali-over in bonding. You can see that parameter "resend_igmp" does not work. The reason is that when we add br0 into a multicast group, it does not propagate multicast knowledge down to its ports. If we choose to propagate multicast knowledge down to all ports for bridge, then we have to track every change that is done to bridge, and keep a backup for all ports. It is hard to track, I think. Instead I choose to modify bonding to send igmp report for its master. Changelog: V2: correct comments V3: move this check into bond_resend_igmp_join_requests() V4: only send igmp reports if bond is enslaved to a bridge Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f7d9821a |
|
31-Dec-2011 |
stephen hemminger <shemminger@vyatta.com> |
bonding: fix error handling if slave is busy (v2) If slave device already has a receive handler registered, then the error unwind of bonding device enslave function is broken. The following will leave a pointer to freed memory in the slave device list, causing a later kernel panic. # modprobe dummy # ip li add dummy0-1 link dummy0 type macvlan # modprobe bonding # echo +dummy0 >/sys/class/net/bond0/bonding/slaves The fix is to detach the slave (which removes it from the list) in the unwind path. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
87002b03 |
|
07-Dec-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls This patch adds wrapper for ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid functions. Check for NETIF_F_HW_VLAN_FILTER feature is done in this wrapper. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e586137 |
|
08-Dec-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: make vlan ndo_vlan_rx_[add/kill]_vid return error value Let caller know the result of adding/removing vlan id to/from vlan filter. In some drivers I make those functions to just return 0. But in those where there is able to see if hw setup went correctly, return value is set appropriately. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
917fbdb3 |
|
23-Nov-2011 |
Henrik Saavedra Persson <henrik.e.persson@ericsson.com> |
bonding: only use primary address for ARP Only use the primary address of the bond device for master_ip. This will prevent changing the ARP source address in Active-Backup mode whenever a secondry address is added to the bond device. Signed-off-by: Henrik Saavedra Persson <henrik.e.persson@ericsson.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@drr.davemloft.net>
|
#
34324dc2 |
|
15-Nov-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: remove NETIF_F_NO_CSUM feature bit Only distinct use is checking if NETIF_F_NOCACHE_COPY should be enabled by default. The check heuristics is altered a bit here, so it hits other people than before. The default shouldn't be trusted for performance-critical cases anyway. For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c8f44aff |
|
15-Nov-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
589665f5 |
|
04-Nov-2011 |
Dan Carpenter <dan.carpenter@oracle.com> |
bonding: comparing a u8 with -1 is always false slave->duplex is a u8 type so the in bond_info_show_slave() when we check "if (slave->duplex == -1)", it's always false. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
98f41f69 |
|
31-Oct-2011 |
Weiping Pan <wpan@redhat.com> |
bonding:update speed/duplex for NETDEV_CHANGE Zheng Liang(lzheng@redhat.com) found a bug that if we config bonding with arp monitor, sometimes bonding driver cannot get the speed and duplex from its slaves, it will assume them to be 100Mb/sec and Full, please see /proc/net/bonding/bond0. But there is no such problem when uses miimon. (Take igb for example) I find that the reason is that after dev_open() in bond_enslave(), bond_update_speed_duplex() will call igb_get_settings() , but in that function, it runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1; because igb get an error value of status. So even dev_open() is called, but the device is not really ready to get its settings. Maybe it is safe for us to call igb_get_settings() only after this message shows up, that is "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX". So I prefer to update the speed and duplex for a slave when reseices NETDEV_CHANGE/NETDEV_UP event. Changelog V2: 1 remove the "fake 100/Full" logic in bond_update_speed_duplex(), set speed and duplex to -1 when it gets error value of speed and duplex. 2 delete the warning in bond_enslave() if bond_update_speed_duplex() returns error. 3 make bond_info_show_slave() handle bad values of speed and duplex. Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6d265e8 |
|
28-Oct-2011 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: eliminate bond_close race conditions This patch resolves two sets of race conditions. Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the first, as follows: The bond_close() calls cancel_delayed_work() to cancel delayed works. It, however, cannot cancel works that were already queued in workqueue. The bond_open() initializes work->data, and proccess_one_work() refers get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when work->data has been initialized. Thus, a panic occurs. He included a patch that converted the cancel_delayed_work calls in bond_close to flush_delayed_work_sync, which eliminated the above problem. His patch is incorporated, at least in principle, into this patch. In this patch, we use cancel_delayed_work_sync in place of flush_delayed_work_sync, and also convert bond_uninit in addition to bond_close. This conversion to _sync, however, opens new races between bond_close and three periodically executing workqueue functions: bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon. The race occurs because bond_close and bond_uninit are always called with RTNL held, and these workqueue functions may acquire RTNL to perform failover-related activities. If bond_close or bond_uninit is waiting in cancel_delayed_work_sync, deadlock occurs. These deadlocks are resolved by having the workqueue functions acquire RTNL conditionally. If the rtnl_trylock() fails, the functions reschedule and return immediately. For the cases that are attempting to perform link failover, a delay of 1 is used; for the other cases, the normal interval is used (as those activities are not as time critical). Additionally, the bond_mii_monitor function now stores the delay in a variable (mimicing the structure of activebackup_arp_mon). Lastly, all of the above renders the kill_timers sentinel moot, and therefore it has been removed. Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59fdaca9 |
|
24-Oct-2011 |
Maciej Żenczykowski <maze@google.com> |
net: make bonding slaves honour master's skb->priority Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c22400a |
|
12-Oct-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
bonding: Use a per netns implementation of /sys/class/net/bonding_masters. This fixes a network namespace misfeature that bonding_masters looked at current instead of the remembering the context where in which /sys/class/net/bonding_masters was opened in to see which network namespace to act upon. This removes the need for sysfs to handle tagged directories with untagged members allowing for a conceptually simpler sysfs implementation. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d97480b |
|
12-Oct-2011 |
Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> |
bonding: use local function pointer of bond->recv_probe in bond_handle_frame The bond->recv_probe is called in bond_handle_frame() when a packet is received, but bond_close() sets it to NULL. So, a panic occurs when both functions work in parallel. Why this happen: After null pointer check of bond->recv_probe, an sk_buff is duplicated and bond->recv_probe is called in bond_handle_frame. So, a panic occurs when bond_close() is called between the check and call of bond->recv_probe. Patch: This patch uses a local function pointer of bond->recv_probe in bond_handle_frame(). So, it can avoid the null pointer dereference. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0db2dad |
|
23-Sep-2011 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: properly stop queuing work when requested During a test where a pair of bonding interfaces using ARP monitoring were both brought up and torn down (with an rmmod) repeatedly, a panic in the timer code was noticed. I tracked this down and determined that any of the bonding functions that ran as workqueue handlers and requeued more work might not properly exit when the module was removed. There was a flag protected by the bond lock called kill_timers that is set when the interface goes down or the module is removed, but many of the functions that monitor link status now unlock the bond lock to take rtnl first. There is a chance that another CPU running the rmmod could get the lock and set kill_timers after the first check has passed. This patch does not allow any function to queue work that will make itself run unless kill_timers is not set. I also noticed while doing this work that bond_resend_igmp_join_requests did not have a check for kill_timers, so I added the needed call there as well. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Reported-by: Liang Zheng <lzheng@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4bc71cb9 |
|
02-Sep-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: consolidate and fix ethtool_ops->get_settings calling This patch does several things: - introduces __ethtool_get_settings which is called from ethtool code and from drivers as well. Put ASSERT_RTNL there. - dev_ethtool_get_settings() is replaced by __ethtool_get_settings() - changes calling in drivers so rtnl locking is respected. In iboe_get_rate was previously ->get_settings() called unlocked. This fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same problem. Also fixed by calling __dev_get_by_index() instead of dev_get_by_index() and holding rtnl_lock for both calls. - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create() so bnx2fc_if_create() and fcoe_if_create() are called locked as they are from other places. - use __ethtool_get_settings() in bonding code Signed-off-by: Jiri Pirko <jpirko@redhat.com> v2->v3: -removed dev_ethtool_get_settings() -added ASSERT_RTNL into __ethtool_get_settings() -prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock around it and __ethtool_get_settings() call v1->v2: add missing export_symbol Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits] Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afc4b13d |
|
16-Aug-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: remove use of ndo_set_multicast_list in drivers replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d03462b9 |
|
15-Aug-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: use ndo_change_rx_flags callback Benefit from use of ndo_change_rx_flags in handling change of promisc and allmulti. No need to store previous state locally. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba3211cc |
|
15-Aug-2011 |
Peter Pan(潘卫平) <panweiping3@gmail.com> |
bonding:reset backup and inactive flag of slave Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change bonding mode from active backup to round robin, some slaves are still keeping "backup", and won't transmit packets. As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around that by removing the bond_is_active_slave() check, because the "backup" flag is only meaningful for active backup mode. But if we just simply ignore the bond_is_active_slave() check, the transmission will work fine, but we can't maintain the correct value of "backup" flag for each slaves, though it is meaningless for other mode than active backup. I'd like to reset "backup" and "inactive" flag in bond_open, thus we can keep the correct value of them. As for bond_is_active_slave(), I'd like to prepare another patch to handle it. V2: Use C style comment. Move read_lock(&bond->curr_slave_lock). Replace restore with reset, for active backup mode, it means "restore", but for other modes, it means "reset". Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5da4510 |
|
10-Aug-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: implement get_tx_queues rtnk_link_op If bonding device is created via rtnl, it is created with default number of rx/tx queues. This patch implements callback in bonding so the correct value (previously specified by bonding module param) is used. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2730f4f |
|
27-Jul-2011 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: reduce noise during init On Tue, Jul 26, 2011 at 05:40:27PM -0700, Joe Perches wrote: > On Tue, 2011-07-26 at 17:37 -0700, Jay Vosburgh wrote: > > Joe Perches <joe@perches.com> wrote: > > >I'd prefer you don't separate the format string > > >into multiple pieces. > > Why not? To me, it looks easier to read split into sections > > that don't wrap lines. > > Harder to grep for a dmesg and the > defect rate of these split formats is > typically higher than single strings > because of bad spacing between string > segments. > I noticed that you took some time back in late 2009 to 'consolidate' the split format-strings present in the bonding driver at the time and I've decided I'm fine to leave them the way they are. The main point of my patch was to change the output and I would like to get that included. Here is my updated patch... Subject: [PATCH net-next-2.6 v2] bonding: reduce noise during init Many are using sysfs to configure bonding rather than module options, so there is no need for bonding to throw this warning in normal cases. Keep the message around when debugging is enabled as it might be useful for someone desperate enough to enable debugging, but eliminate it otherwise. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
550fd08c |
|
26-Jul-2011 |
Neil Horman <nhorman@tuxdriver.com> |
net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared After the last patch, We are left in a state in which only drivers calling ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real hardware call ether_setup for their net_devices and don't hold any state in their skbs. There are a handful of drivers that violate this assumption of course, and need to be fixed up. This patch identifies those drivers, and marks them as not being able to support the safe transmission of skbs by clearning the IFF_TX_SKB_SHARING flag in priv_flags Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Karsten Keil <isdn@linux-pingi.de> CC: "David S. Miller" <davem@davemloft.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: Patrick McHardy <kaber@trash.net> CC: Krzysztof Halasa <khc@pm.waw.pl> CC: "John W. Linville" <linville@tuxdriver.com> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Marcel Holtmann <marcel@holtmann.org> CC: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc0e4070 |
|
19-Jul-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: do vlan cleanup Now when all devices are cleaned up, bond can be cleaned up as well - remove bond->vlgrp - remove bond_vlan_rx_register - substitute necessary occurences of vlan_group_get_device Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62f2a3a4 |
|
13-Jul-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: remove NETIF_F_ALL_TX_OFFLOADS There is no software fallback implemented for SCTP or FCoE checksumming, and so it should not be passed on by software devices like bridge or bonding. For VLAN devices, this is different. First, the driver for underlying device should be prepared to get offloaded packets even when the feature is disabled (especially if it advertises it in vlan_features). Second, devices under VLANs do not get replaced without tearing down the VLAN first. This fixes a mess I accidentally introduced while converting bonding to ndo_fix_features. NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they are unused as of commit 712ae51afd. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
655f8919 |
|
22-Jun-2011 |
stephen hemminger <shemminger@vyatta.com> |
bonding: add min links parameter to 802.3ad This adds support for a configuring the minimum number of links that must be active before asserting carrier. It is similar to the Cisco EtherChannel min-links feature. This allows setting the minimum number of member ports that must be up (link-up state) before marking the bond device as up (carrier on). This is useful for situations where higher level services such as clustering want to ensure a minimum number of low bandwidth links are active before switchover. See: http://bugzilla.vyatta.com/show_bug.cgi?id=7196 Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56f8a75c |
|
21-Jun-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
ip: introduce ip_is_fragment helper inline function There are enough instances of this: iph->frag_off & htons(IP_MF | IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cefa9993 |
|
19-Jun-2011 |
WANG Cong <amwang@redhat.com> |
netpoll: copy dev name of slaves to struct netpoll Otherwise we will not see the name of the slave dev in error message: [ 388.469446] (null): doesn't support polling, aborting. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
830a9c75 |
|
10-Jun-2011 |
Jiri Bohac <jbohac@suse.cz> |
bonding: clean up bond_del_vlan() 1) the setting of NETIF_F_VLAN_CHALLENGED in bond_del_vlan() is useless since commit b2a103e6 because bond_fix_features() now sets NETIF_F_VLAN_CHALLENGED whenever the last slave is being removed. 2) the code never triggers anyway as vlan_list is never empty since ad1afb00. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56d00c67 |
|
08-Jun-2011 |
Peter Pan(潘卫平) <panweiping3@gmail.com> |
bonding:delete lacp_fast from ad_bond_info These is also a bug, that if you modify lacp_rate via sysfs, and add new slaves in bonding, new slaves won't use the latest lacp_rate, since ad_bond_info->lacp_fast is initialized only once, in bond_3ad_initialize(). Since both struct bond_params and ad_bond_info have lacp_fast, they are duplicate and need extra synchronization. bond_3ad_bind_slave() can use bond_params->lacp_fast to initialize port. So we can just remove lacp_fast from struct ad_bond_info. Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
374eeb5a |
|
03-Jun-2011 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: reset queue mapping prior to transmission to physical device (v5) The bonding driver is multiqueue enabled, in which each queue represents a slave to enable optional steering of output frames to given slaves against the default output policy. However, it needs to reset the skb->queue_mapping prior to queuing to the physical device or the physical slave (if it is multiqueue) could wind up transmitting on an unintended tx queue Change Notes: v2) Based on first pass review, updated the patch to restore the origional queue mapping that was found in bond_select_queue, rather than simply resetting to zero. This preserves the value of queue_mapping when it was set on receive in the forwarding case which is desireable. v3) Fixed spelling an casting error in skb->cb v4) fixed to store raw queue_mapping to avoid double decrement v5) Eric D requested that ->cb access be wrapped in a macro. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f92c66f |
|
01-Jun-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: allow all slave speeds No need to check for 10, 100, 1000, 10000 explicitly. Just make this generic and check for invalid values only (similar check is in ethtool userspace app). This enables correct speed handling for slave devices with "nonstandard" speeds. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
90e62474 |
|
24-May-2011 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: cleanup module option descriptions Weiping Pan noticed that the module option description for xmit_hash_policy was incorrect and was nice enough to post a patch to fix it. The text was correct, but created a line over 80 characters and I would rather not add those. I realized I could take a few minutes and clean up all the descriptions and things would look much better. This is the result. Based on patch from Weiping Pan <panweiping3@gmail.com>. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94265cf5 |
|
25-May-2011 |
Flavio Leitner <fbl@redhat.com> |
bonding: documentation and code cleanup for resend_igmp Improves the documentation about how IGMP resend parameter works, fix two missing checks and coding style issues. Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9fe0617d |
|
25-May-2011 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: prevent deadlock on slave store with alb mode (v3) This soft lockup was recently reported: [root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters [root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves bonding: bond5: doing slave updates when interface is down. bonding bond5: master_dev is not up in bond_enslave [root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves bonding: bond5: doing slave updates when interface is down. BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444] CPU 12: Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2d Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1 RIP: 0010:[<ffffffff80064bf0>] [<ffffffff80064bf0>] .text.lock.spinlock+0x26/00 RSP: 0018:ffff810113167da8 EFLAGS: 00000286 RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025 RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8 RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000 R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282 FS: 00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0 Call Trace: [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14 [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1 [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0 [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450 [<ffffffff8006457b>] __down_write_nested+0x12/0x92 [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7 [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8 [<ffffffff80016b87>] vfs_write+0xce/0x174 [<ffffffff80017450>] sys_write+0x45/0x6e [<ffffffff8005d28d>] tracesys+0xd5/0xe0 It occurs because we are able to change the slave configuarion of a bond while the bond interface is down. The bonding driver initializes some data structures only after its ndo_open routine is called. Among them is the initalization of the alb tx and rx hash locks. So if we add or remove a slave without first opening the bond master device, we run the risk of trying to lock/unlock a spinlock that has garbage for data in it, which results in our above softlock. Note that sometimes this works, because in many cases an unlocked spinlock has the raw_lock parameter initialized to zero (meaning that the kzalloc of the net_device private data is equivalent to calling spin_lock_init), but thats not true in all cases, and we aren't guaranteed that condition, so we need to pass the relevant spinlocks through the spin_lock_init function. Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to the ndo_init path, so they are ready for use by the bond_store_slaves path. Change notes: v2) Based on conversation with Jay and Nicolas it seems that the ability to enslave devices while the bond master is down should be safe to do. As such this is an outlier bug, and so instead we'll just initalize the errant spinlocks in the init path rather than the open path, solving the problem. We'll also remove the warnings about the bond being down during enslave operations, since it should be safe v3) Fix spelling error Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: jtluka@redhat.com CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: nicolas.2p.debian@gmail.com CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
daf9209b |
|
19-May-2011 |
Amerigo Wang <amwang@redhat.com> |
net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASE s/NETDEV_BONDING_DESLAVE/NETDEV_RELEASE/ as Andy suggested. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d8fc29d |
|
19-May-2011 |
Amerigo Wang <amwang@redhat.com> |
netpoll: disable netpoll when enslave a device V3: rename NETDEV_ENSLAVE to NETDEV_JOIN Currently we do nothing when we enslave a net device which is running netconsole. Neil pointed out that we may get weird results in such case, so let's disable netpoll on the device being enslaved. I think it is too harsh to prevent the device being ensalved if it is running netconsole. By the way, this patch also removes the NETDEV_GOING_DOWN from netconsole netdev notifier, because netpoll will check if the device is running or not and we don't handle NETDEV_PRE_UP neither. This patch is based on net-next-2.6. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2a103e6 |
|
06-May-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
bonding: convert to ndo_fix_features This should also fix updating of vlan_features and propagating changes to VLAN devices on the bond. Side effect: it allows user to force-disable some offloads on the bond interface. Note: NETIF_F_VLAN_CHALLENGED is managed by bond_fix_features() now. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0693e88e |
|
06-May-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: bonding: factor out rlock(bond->lock) in xmit path Pull read_lock(&bond->lock) and BOND_IS_OK() to bond_start_xmit() from mode-dependent xmit functions. netif_running() is always true in hard_start_xmit. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c5cae81 |
|
29-Apr-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: call dev_alloc_name from register_netdevice Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). That allows to remove multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name in advance remains. This also fixes veth creation regresion caused by 84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ad246c99 |
|
26-Apr-2011 |
Ben Hutchings <bhutchings@solarflare.com> |
ipv4, ipv6, bonding: Restore control over number of peer notifications For backward compatibility, we should retain the module parameters and sysfs attributes to control the number of peer notifications (gratuitous ARPs and unsolicited NAs) sent after bonding failover. Also, it is possible for failover to take place even though the new active slave does not have link up, and in that case the peer notification should be deferred until it does. Change ipv4 and ipv6 so they do not automatically send peer notifications on bonding failover. Change the bonding driver to send separate NETDEV_NOTIFY_PEERS notifications when the link is up, as many times as requested. Since it does not directly control which protocols send notifications, make num_grat_arp and num_unsol_na aliases for a single parameter. Bump the bonding version number and update its documentation. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Acked-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3aba891d |
|
18-Apr-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: move processing of recv handlers into handle_frame() Since now when bonding uses rx_handler, all traffic going into bond device goes thru bond_handle_frame. So there's no need to go back into bonding code later via ptype handlers. This patch converts original ptype handlers into "bonding receive probes". These functions are called from bond_handle_frame and they are registered per-mode. Note that vlan packets are also handled because they are always untagged thanks to vlan_untag() Note that this also allows arpmon for eth-bond-bridge-vlan topology. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7c899432 |
|
15-Apr-2011 |
Ben Hutchings <bhutchings@solarflare.com> |
bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS It is undesirable for the bonding driver to be poking into higher level protocols, and notifiers provide a way to avoid that. This does mean removing the ability to configure reptitition of gratuitous ARPs and unsolicited NAs. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d038eb6 |
|
17-Apr-2011 |
David S. Miller <davem@davemloft.net> |
bonding: Fix set-but-unused variable. The variable 'vlan_dev' is set but unused in bond_send_gratuitous_arp(). Just kill it off. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d30530e |
|
13-Apr-2011 |
David Decotigny <decot@google.com> |
net-bonding: Adding support for throughputs larger than 65536 Mbps This updates the bonding driver to support v2.6.27-rc3 enhancements (b11f8d8c aka. "ethtool: Expand ethtool_cmd.speed to 32 bits") which allow to encode the Mbps link speed on 32-bits (Max 4 Pbps) instead of 16 (Max 65536 Mbps). This patch also attempts to compact struct slave by reordering its fields. Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65cce19c |
|
13-Apr-2011 |
David Decotigny <decot@google.com> |
net-bonding: Fix minor/cosmetic type inconsistencies The __get_link_speed() function returns a u16 value which was stored in a u32 local variable. This patch uses the return value directly, thus fixing that minor type consistency. The 'duplex' field in struct slave being encoded on 8 bits, to be more consistent we use a u8 integer (instead of u16) whenever we copy it to local variables. Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d30ee670 |
|
13-Apr-2011 |
David Decotigny <decot@google.com> |
net-bonding: Fix minor sparse complaints This gets rid of minor sparse complaints: drivers/net/bonding/bond_main.c:4361:4: warning: do-while statement is not a compound statement drivers/net/bonding/bond_main.c:243:12: warning: symbol 'bond_mode_name' was not declared. Should it be static? Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6e1a0d1 |
|
04-Apr-2011 |
Tom Herbert <therbert@google.com> |
net: Allow no-cache copy from user on transmit This patch uses __copy_from_user_nocache on transmit to bypass data cache for a performance improvement. skb_add_data_nocache and skb_copy_to_page_nocache can be called by sendmsg functions to use this feature, initial support is in tcp_sendmsg. This functionality is configurable per device using ethtool. Presumably, this feature would only be useful when the driver does not touch the data. The feature is turned on by default if a device indicates that it does some form of checksum offload; it is off by default for devices that do no checksum offload or indicate no checksum is necessary. For the former case copy-checksum is probably done anyway, in the latter case the device is likely loopback in which case the no cache copy is probably not beneficial. This patch was tested using 200 instances of netperf TCP_RR with 1400 byte request and one byte reply. Platform is 16 core AMD x86. No-cache copy disabled: 672703 tps, 97.13% utilization 50/90/99% latency:244.31 484.205 1028.41 No-cache copy enabled: 702113 tps, 96.16% utilization, 50/90/99% latency 238.56 467.56 956.955 Using 14000 byte request and response sizes demonstrate the effects more dramatically: No-cache copy disabled: 79571 tps, 34.34 %utlization 50/90/95% latency 1584.46 2319.59 5001.76 No-cache copy enabled: 83856 tps, 34.81% utilization 50/90/95% latency 2508.42 2622.62 2735.88 Note especially the effect on latency tail (95th percentile). This seems to provide a nice performance improvement and is consistent in the tests I ran. Presumably, this would provide the greatest benfits in the presence of an application workload stressing the cache and a lot of transmit data happening. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35d48903 |
|
21-Mar-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: fix rx_handler locking This prevents possible race between bond_enslave and bond_handle_frame as reported by Nicolas by moving rx_handler register/unregister. slave->bond is added to hold pointer to master bonding sructure. That way dev->master is no longer used in bond_handler_frame. Also, this removes "BUG: scheduling while atomic" message Reported-by: Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Tested-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dadaa10b |
|
19-Mar-2011 |
Nicolas de Pesloüan <nicolas.2p.debian@free.fr> |
bonding: fix a typo in a comment Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ceda86a1 |
|
13-Mar-2011 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: enable netpoll without checking link status Only slaves that are up should transmit netpoll frames, so there is no need to check to see if a slave is up before enabling netpoll on it. This resolves a reported failure on active-backup bonds where a slave interface is down when netpoll was enabled. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Tested-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a4eb573 |
|
11-Mar-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: introduce rx_handler results and logic around that This patch allows rx_handlers to better signalize what to do next to it's caller. That makes skb->deliver_no_wcard no longer needed. kernel-doc for rx_handler_result is taken from Nicolas' patch. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2d7011ca |
|
16-Mar-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag Since bond-related code was moved from net/core/dev.c into bonding, IFF_SLAVE_INACTIVE is no longer needed. Replace is with flag "inactive" stored in slave structure Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e30bc066 |
|
11-Mar-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: wrap slave state work transfers slave->state into slave->backup (that it's going to transfer into bitfield. Introduce wrapper inlines to do the work with it. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0bd80dad |
|
16-Mar-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: get rid of multiple bond-related netdevice->priv_flags Now when bond-related code is moved from net/core/dev.c into bonding code, multiple priv_flags are not needed anymore. So let them rot. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1c1775a |
|
11-Mar-2011 |
Jiri Pirko <jpirko@redhat.com> |
bonding: register slave pointer for rx_handler Register slave pointer as rx_handler data. That would eventually prevent need to loop over slave devices to find the right slave. Use synchronize_net to ensure that bond_handle_frame does not get slave structure freed when working with that. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e826eafa |
|
14-Mar-2011 |
Phil Oester <kernel@linuxace.com> |
bonding: Call netif_carrier_off after register_netdevice Bringing up a bond interface with all network cables disconnected does not properly set the interface as DOWN because the call to netif_carrier_off occurs too early in bond_init. The call needs to occur after register_netdevice has set dev->reg_state to NETREG_REGISTERED, so that netif_carrier_off will trigger the call to linkwatch_fire_event. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd0e435b |
|
14-Mar-2011 |
Phil Oester <kernel@linuxace.com> |
bonding: Incorrect TX queue offset When packets come in from a device with >= 16 receive queues headed out a bonding interface, syslog gets filled with this: kernel: bond0 selects TX queue 16, but real number of TX queues is 16 because queue_mapping is offset by 1. Adjust return value to account for the offset. This is a revision of my earlier patch (which did not use the skb_rx_queue_* helpers - thanks to Ben for the suggestion). Andy submitted a similar patch which emits a pr_warning on invalid queue selection, but I believe the log spew is not useful. We can revisit that question in the future, but in the interim I believe fixing the core problem is worthwhile. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78fbfd8a |
|
11-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Create and use route lookup helpers. The idea here is this minimizes the number of places one has to edit in order to make changes to how flows are defined and used. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd33acc3 |
|
06-Mar-2011 |
Amerigo Wang <amwang@redhat.com> |
bonding: move procfs code into bond_procfs.c V2: Move #ifdef CONFIG_PROC_FS into bonding.h, as suggested by David. bond_main.c is bloating, separate the procfs code out, move them to bond_procfs.c Signed-off-by: WANG Cong <amwang@redhat.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
541ac7c9 |
|
02-Mar-2011 |
Changli Gao <xiaosuo@gmail.com> |
bonding: COW before overwriting the destination MAC address When there is a ptype handler holding a clone of this skb, whose destination MAC addresse is overwritten, the owner of this handler may get a corrupted packet. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cca134fe |
|
02-Mar-2011 |
Changli Gao <xiaosuo@gmail.com> |
bonding: remove the unused dummy functions when net poll controller isn't enabled These two functions are only used when net poll controller is enabled. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b23dd4fe |
|
02-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Make output route lookup return rtable directly. Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b2c4dd2 |
|
23-Feb-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: convert bonding to use rx_handler This patch converts bonding to use rx_handler. Results in cleaner __netif_receive_skb() with much less exceptions needed. Also bond-specific work is moved into bond code. Did performance test using pktgen and counting incoming packets by iptables. No regression noted. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
080e4130 |
|
17-Feb-2011 |
Amerigo Wang <amwang@redhat.com> |
netpoll: remove IFF_IN_NETPOLL flag V4: rebase to net-next-2.6 This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since we have netpoll_tx_running() now. Signed-off-by: WANG Cong <amwang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a8efa22 |
|
17-Feb-2011 |
Amerigo Wang <amwang@redhat.com> |
bonding: sync netpoll code with bridge V4: rebase to net-next-2.6 V3: remove an useless #ifdef. This patch unifies the netpoll code in bonding with netpoll code in bridge, thanks to Herbert that code is much cleaner now. Signed-off-by: WANG Cong <amwang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9232ecca |
|
13-Feb-2011 |
Jiri Pirko <jpirko@redhat.com> |
bond: implement [add/del]_slave ops allow enslaving/releasing using netlink interface Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1765a575 |
|
11-Feb-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: make dev->master general dev->master is now tightly connected to bonding driver. This patch makes this pointer more general and ready to be used by others. - netdev_set_master() - bond specifics moved to new function netdev_set_bond_master() - introduced netif_is_bond_slave() to check if device is a bonding slave Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acd1130e |
|
24-Jan-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: reduce and unify printk level in netdev_fix_features() Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will be used by ethtool and might spam dmesg unnecessarily. This converts the function to use netdev_info() instead of plain printk(). As a side effect, bonding and bridge devices will now log dropped features on every slave device change. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04ed3e74 |
|
24-Jan-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: change netdev->features to u32 Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3053251 |
|
20-Jan-2011 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: Ensure that we unshare skbs prior to calling pskb_may_pull Recently reported oops: kernel BUG at net/core/skbuff.c:813! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/virtual/net/bond0/broadcast CPU 8 Modules linked in: sit tunnel4 cpufreq_ondemand acpi_cpufreq freq_table bonding ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma i7core_edac edac_core bnx2 ixgbe dca mdio sg ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: microcode] Modules linked in: sit tunnel4 cpufreq_ondemand acpi_cpufreq freq_table bonding ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma i7core_edac edac_core bnx2 ixgbe dca mdio sg ext4 mbcache jbd2 sd_mod crc_t10dif mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: microcode] Pid: 0, comm: swapper Not tainted 2.6.32-71.el6.x86_64 #1 BladeCenter HS22 -[7870AC1]- RIP: 0010:[<ffffffff81405b16>] [<ffffffff81405b16>] pskb_expand_head+0x36/0x1e0 RSP: 0018:ffff880028303b70 EFLAGS: 00010202 RAX: 0000000000000002 RBX: ffff880c6458ec80 RCX: 0000000000000020 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880c6458ec80 RBP: ffff880028303bc0 R08: ffffffff818a6180 R09: ffff880c6458ed64 R10: ffff880c622b36c0 R11: 0000000000000400 R12: 0000000000000000 R13: 0000000000000180 R14: ffff880c622b3000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000038653452a4 CR3: 0000000001001000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff8806649c2000, task ffff880c64f16ab0) Stack: ffff880028303bc0 ffffffff8104fff9 000000000000001c 0000000100000000 <0> ffff880000047d80 ffff880c6458ec80 000000000000001c ffff880c6223da00 <0> ffff880c622b3000 0000000000000000 ffff880028303c10 ffffffff81407f7a Call Trace: <IRQ> [<ffffffff8104fff9>] ? __wake_up_common+0x59/0x90 [<ffffffff81407f7a>] __pskb_pull_tail+0x2aa/0x360 [<ffffffffa0244530>] bond_arp_rcv+0x2c0/0x2e0 [bonding] [<ffffffff814a0857>] ? packet_rcv+0x377/0x440 [<ffffffff8140f21b>] netif_receive_skb+0x2db/0x670 [<ffffffff8140f788>] napi_skb_finish+0x58/0x70 [<ffffffff8140fc89>] napi_gro_receive+0x39/0x50 [<ffffffffa01286eb>] ixgbe_clean_rx_irq+0x35b/0x900 [ixgbe] [<ffffffffa01290f6>] ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe] [<ffffffff8140fe53>] net_rx_action+0x103/0x210 [<ffffffff81073bd7>] __do_softirq+0xb7/0x1e0 [<ffffffff810d8740>] ? handle_IRQ_event+0x60/0x170 [<ffffffff810142cc>] call_softirq+0x1c/0x30 [<ffffffff81015f35>] do_softirq+0x65/0xa0 [<ffffffff810739d5>] irq_exit+0x85/0x90 [<ffffffff814cf915>] do_IRQ+0x75/0xf0 [<ffffffff81013ad3>] ret_from_intr+0x0/0x11 <EOI> [<ffffffff8101bc01>] ? mwait_idle+0x71/0xd0 [<ffffffff814cd80a>] ? atomic_notifier_call_chain+0x1a/0x20 [<ffffffff81011e96>] cpu_idle+0xb6/0x110 [<ffffffff814c17c8>] start_secondary+0x1fc/0x23f Resulted from bonding driver registering packet handlers via dev_add_pack and then trying to call pskb_may_pull. If another packet handler (like for AF_PACKET sockets) gets called first, the delivered skb will have a user count > 1, which causes pskb_may_pull to BUG halt when it does its skb_shared check. Fix this by calling skb_share_check prior to the may_pull call sites in the bonding driver to clone the skb when needed. Tested by myself and the reported successfully. Signed-off-by: Neil Horman CC: Andy Gospodarek <andy@greyhouse.net> CC: Jay Vosburgh <fubar@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ffa95ed5 |
|
13-Dec-2010 |
Ben Hutchings <bhutchings@solarflare.com> |
bonding: Change active slave quietly when bond is down bond_change_active_slave() may be called when a slave is added, even if the bond has not been brought up yet. It may then attempt to send packets, and further it may use mcast_work which is uninitialised before the bond is brought up. Add the necessary checks for netif_running(bond->dev). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8387451e |
|
13-Dec-2010 |
Ben Hutchings <bhutchings@solarflare.com> |
bonding/vlan: Remove redundant VLAN tag insertion logic A bond may have a mixture of slave devices with and without hardware VLAN tag insertion capability. Therefore it always claims this capability and performs software VLAN tag insertion if the slave does not. Since commit 7b9c60903714bf0a19d746b228864bad3497284e, this has also been done by dev_hard_start_xmit(). The result is that VLAN- tagged skbs are now double-tagged when transmitted through slave devices without hardware VLAN tag insertion! Remove the now-redundant logic from bond_dev_queue_xmit(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Reviewed-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f073c7ca |
|
09-Dec-2010 |
Taku Izumi <izumi.taku@jp.fujitsu.com> |
bonding: add the debugfs facility to the bonding driver This patch provides the debugfs facility to the bonding driver. The "bonding" directory is created in the debugfs root and directories of each bonding interface (like bond0, bond1...) are created in that. # mount -t debugfs none /sys/kernel/debug # ls /sys/kernel/debug/bonding bond0 bond1 Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb4fa76a |
|
06-Dec-2010 |
Neil Horman <nhorman@tuxdriver.com> |
net: Convert netpoll blocking api in bonding driver to be a counter A while back I made some changes to enable netpoll in the bonding driver. Among them was a per-cpu flag that indicated we were in a path that held locks which could cause the netpoll path to block in during tx, and as such the tx path should queue the frame for later use. This appears to have given rise to a regression. If one of those paths on which we hold the per-cpu flag yields the cpu, its possible for us to come back on a different cpu, leading to us clearing a different flag than we set. This results in odd netpoll drops, and BUG backtraces appearing in the log, as we check to make sure that we only clear set bits, and only set clear bits. I had though briefly about changing the offending paths so that they wouldn't sleep, but looking at my origional work more closely, it doesn't appear that a per-cpu flag is warranted. We alrady gate the checking of this flag on IFF_IN_NETPOLL, so we don't hit this in the normal tx case anyway. And practically speaking, the normal use case for netpoll is to only have one client anyway, so we're not going to erroneously queue netpoll frames when its actually safe to do so. As such, lets just convert that per-cpu flag to an atomic counter. It fixes the rescheduling bugs, is equivalent from a performance perspective and actually eliminates some code in the process. Tested by the reporter and myself, successfully Reported-by: Liang Zheng <lzheng@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d13a2cb6 |
|
01-Dec-2010 |
David Strand <dpstrand@gmail.com> |
bonding: check for assigned mac before adopting the slaves mac address Restore the check for an unassigned mac address before adopting the first slaves as it's own. The change in behavior was introduced by: commit c20811a79e671a6a1fe86a8c1afe04aca8a7f085 Author: Jiri Pirko <jpirko@redhat.com> bonding: move dev_addr cpy to bond_enslave Signed-off-by: David Strand <dpstrand@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
866f3b25 |
|
18-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bonding: IGMP handling cleanup Instead of iterating in_dev->mc_list from bonding driver, its better to call a helper function provided by igmp.c Details of implementation (locking) are private to igmp code. ip_mc_rejoin_group(struct ip_mc_list *im) becomes ip_mc_rejoin_groups(struct in_device *in_dev); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3006bc38 |
|
18-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bonding: fix a race in IGMP handling RCU conversion in IGMP code done in net-next-2.6 raised a race in __bond_resend_igmp_join_requests(). It iterates in_dev->mc_list without appropriate protection (RTNL, or read_lock on in_dev->mc_list_lock). Another cpu might delete an entry while we use it and trigger a fault. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e4a7b93b |
|
28-Oct-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bonding: remove dev_base_lock use bond_info_seq_start() uses a read_lock(&dev_base_lock) to make sure device doesn’t disappear. Same goal can be achieved using RCU. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a71fb881 |
|
27-Oct-2010 |
Jarek Poplawski <jarkao2@gmail.com> |
bonding: Fix lockdep warning after bond_vlan_rx_register() Fix lockdep warning: [ 52.991402] ====================================================== [ 52.991511] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] [ 52.991569] 2.6.36-04573-g4b60626-dirty #65 [ 52.991622] ------------------------------------------------------ [ 52.991696] ip/4842 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire: [ 52.991758] (&bond->lock){++++..}, at: [<efe4d300>] bond_set_multicast_list+0x60/0x2c0 [bonding] [ 52.991966] [ 52.991967] and this task is already holding: [ 52.992008] (&bonding_netdev_addr_lock_key){+.....}, at: [<c04e5530>] dev_mc_sync+0x50/0xa0 [ 52.992008] which would create a new lock dependency: [ 52.992008] (&bonding_netdev_addr_lock_key){+.....} -> (&bond->lock){++++..} [ 52.992008] [ 52.992008] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 52.992008] (&(&mc->mca_lock)->rlock){+.-...} [ 52.992008] ... which became SOFTIRQ-irq-safe at: [ 52.992008] [<c0272beb>] __lock_acquire+0x96b/0x1960 [ 52.992008] [<c027415e>] lock_acquire+0x7e/0xf0 [ 52.992008] [<c05f356d>] _raw_spin_lock_bh+0x3d/0x50 [ 52.992008] [<c0584e40>] mld_ifc_timer_expire+0xf0/0x280 [ 52.992008] [<c024cee6>] run_timer_softirq+0x146/0x310 [ 52.992008] [<c024591d>] __do_softirq+0xad/0x1c0 [ 52.992008] [ 52.992008] to a SOFTIRQ-irq-unsafe lock: [ 52.992008] (&bond->lock){++++..} [ 52.992008] ... which became SOFTIRQ-irq-unsafe at: [ 52.992008] ... [<c0272c3b>] __lock_acquire+0x9bb/0x1960 [ 52.992008] [<c027415e>] lock_acquire+0x7e/0xf0 [ 52.992008] [<c05f36b8>] _raw_write_lock+0x38/0x50 [ 52.992008] [<efe4cbe4>] bond_vlan_rx_register+0x24/0x70 [bonding] [ 52.992008] [<c0598010>] register_vlan_dev+0xc0/0x280 [ 52.992008] [<c0599f3a>] vlan_newlink+0xaa/0xd0 [ 52.992008] [<c04ed4b4>] rtnl_newlink+0x404/0x490 [ 52.992008] [<c04ece35>] rtnetlink_rcv_msg+0x1e5/0x220 [ 52.992008] [<c050424e>] netlink_rcv_skb+0x8e/0xb0 [ 52.992008] [<c04ecbac>] rtnetlink_rcv+0x1c/0x30 [ 52.992008] [<c0503bfb>] netlink_unicast+0x24b/0x290 [ 52.992008] [<c0503e37>] netlink_sendmsg+0x1f7/0x310 [ 52.992008] [<c04cd41c>] sock_sendmsg+0xac/0xe0 [ 52.992008] [<c04ceb80>] sys_sendmsg+0x130/0x230 [ 52.992008] [<c04cf04e>] sys_socketcall+0xde/0x280 [ 52.992008] [<c0202d10>] sysenter_do_call+0x12/0x36 [ 52.992008] [ 52.992008] other info that might help us debug this: ... [ Full info at netdev: Wed, 27 Oct 2010 12:24:30 +0200 Subject: [BUG net-2.6 vlan/bonding] lockdep splats ] Use BH variant of write_lock(&bond->lock) (as elsewhere in bond_main) to prevent this dependency. Fixes commit f35188faa0fbabefac476536994f4b6f3677380f [v2.6.36] Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jay Vosburgh <fubar@us.ibm.com>
|
#
26d8ee75 |
|
14-Oct-2010 |
stephen hemminger <shemminger@vyatta.com> |
bonding: make release_and_destroy static Only used in main file. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
379b7383 |
|
15-Oct-2010 |
stephen hemminger <shemminger@vyatta.com> |
bonding: make bond_resend_igmp_join_requests_delayed static Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ff76c95 |
|
19-Oct-2010 |
Neil Horman <nhorman@tuxdriver.com> |
netpoll: Remove netpoll blocking from uninit path Some recent testing in netpoll with bonding showed this backtrace ------------[ cut here ]------------ kernel BUG at drivers/net/bonding/bonding.h:134! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1d.2/usb7/devnum CPU 0 Pid: 1876, comm: rmmod Not tainted 2.6.36-rc3+ #10 D26928/ RIP: 0010:[<ffffffffa0514ba4>] [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0 RSP: 0018:ffff88003b1b5d58 EFLAGS: 00010296 RAX: ffff88003b9b6200 RBX: ffff8800373e8e00 RCX: 00000000000f4240 RDX: 00000000ffffffff RSI: 0000000000000286 RDI: 0000000000000286 RBP: ffff88003b1b5dc8 R08: 0000000000000000 R09: 00000001af7de920 R10: 0000000000000000 R11: ffff880002495e98 R12: ffff880037922700 R13: ffff880038c31000 R14: ffff880037922730 R15: 0000000000000286 FS: 00007f90e6d72700(0000) GS:ffff880002400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000000346f0d9ad0 CR3: 000000003b263000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rmmod (pid: 1876, threadinfo ffff88003b1b4000, task ffff88003b36aa80) Stack: 00000000ffffffff ffff88003b1b5d7a ffff8800379221e8 ffff880037922000 <0> ffff88003b1b5dc8 ffffffff813eb5fb ffff88003b1b5da8 0000000031b177a3 <0> ffff88003b1b5da8 ffff880037922000 ffff88003b1b5e48 ffff88003b1b5e48 Call Trace: [<ffffffff813eb5fb>] ? rtmsg_ifinfo+0xcb/0xf0 [<ffffffff813daad8>] rollback_registered_many+0x168/0x280 [<ffffffff813dac09>] unregister_netdevice_many+0x19/0x80 [<ffffffff813e97b3>] __rtnl_kill_links+0x63/0x90 [<ffffffff813e980b>] __rtnl_link_unregister+0x2b/0x60 [<ffffffff813e9bde>] rtnl_link_unregister+0x1e/0x30 [<ffffffffa052124b>] bonding_exit+0x37/0x51 [bonding] [<ffffffff81098b2e>] sys_delete_module+0x19e/0x270 [<ffffffff810bb2b2>] ? audit_syscall_entry+0x252/0x280 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b RIP [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0 [bonding] RSP <ffff88003b1b5d58> ---[ end trace 1395ad691cea24d1 ]--- It occurs because of my recent netpoll blocking patches, which I added to avoid recursive deadlock in the bonding driver. It relies on some per cpu bits, but the shutdown path forces some rescheduling as we cancel workqueues for the driver and wait for some device refcounts. If after the forced reschedule, we wind up on a different cpu we trigger the bughalt in unblock_netpoll_tx. The fix is to remove the netpoll block/unblock calls from bond_release_all. This is safe to do because bond_uninit, which is called via ndo_uninit in rollback_registered_many, doesn't occur until we send a NETDEV_UNREGISTER event, which triggers netconsole to remove us as a netpoll client, so we are guaranteed not to recurse into our own tx path here. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45b0cb8a |
|
13-Oct-2010 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: Re-enable netpoll over bonding With the inclusion of previous fixup patches, netpoll over bonding apears to work reliably with failover conditions. This reverts Gospos previous commit c22d7ac844f1cb9c6a5fd20f89ebadc2feef891b, and allows access again to the netpoll functionality in the bonding driver. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e843fa50 |
|
13-Oct-2010 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: Fix deadlock in bonding driver resulting from internal locking when using netpoll The monitoring paths in the bonding driver take write locks that are shared by the tx path. If netconsole is in use, these paths can call printk which puts us in the netpoll tx path, which, if netconsole is attached to the bonding driver, result in deadlock (the xmit_lock guards are useless in netpoll_send_skb, as the monitor paths in the bonding driver don't claim the xmit_lock, nor should they). The solution is to use a per cpu flag internal to the driver to indicate when a cpu is holding the lock in a path that might recusrse into the tx path for the driver via netconsole. By checking this flag on transmit, we can defer the sending of the netconsole frames until a later time using the retransmit feature of netpoll_send_skb that is triggered on the return code NETDEV_TX_BUSY. I've tested this and am able to transmit via netconsole while causing failover conditions on the bond slave links. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2355e1a |
|
13-Oct-2010 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: Fix bonding drivers improper modification of netpoll structure The bonding driver currently modifies the netpoll structure in its xmit path while sending frames from netpoll. This is racy, as other cpus can access the netpoll structure in parallel. Since the bonding driver points np->dev to a slave device, other cpus can inadvertently attempt to send data directly to slave devices, leading to improper locking with the bonding master, lost frames, and deadlocks. This patch fixes that up. This patch also removes the real_dev pointer from the netpoll structure as that data is really only used by bonding in the poll_controller, and we can emulate its behavior by check each slave for IS_UP. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd53df26 |
|
30-Sep-2010 |
Krzysztof Oledzki <ole@ans.pl> |
bonding: add Speed/Duplex information to /proc/net/bonding/bond Effect: Slave Interface: eth5 MII Status: up Speed: 10000 Mbps Duplex: full Link Failure Count: 0 Permanent HW addr: XX:XX:XX:XX:XX:XX Slave queue ID: 0 Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
546add79 |
|
06-Oct-2010 |
Krzysztof Piotr Oledzki <ole@ans.pl> |
bonding: reread information about speed and duplex when interface goes up When an interface was enslaved when it was down, bonding thinks it has speed -1 even after it goes up. This leads into selecting a wrong active interface in active/backup mode on mixed 10G/1G or 1G/100M environment. before: bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex. bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex. after: bonding: bond0: link status definitely up for interface eth5, 10000 Mbps full duplex. bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
700c2a77 |
|
06-Oct-2010 |
Krzysztof Piotr Oledzki <ole@ans.pl> |
bonding: print information about speed and duplex seen by the driver before: bonding: bond0: link status definitely up for interface eth5 bonding: bond0: link status definitely up for interface eth0 after: bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex. bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2952c31 |
|
05-Oct-2010 |
Flavio Leitner <fleitner@redhat.com> |
bonding: add retransmit membership reports tunable Allow sysadmins to configure the number of multicast membership report sent on a link failure event. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a37e8ca |
|
05-Oct-2010 |
Flavio Leitner <fleitner@redhat.com> |
bonding: rejoin multicast groups on VLANs During a failover, the IGMP membership is sent to update the switch restoring the traffic, but it misses groups added to VLAN devices running on top of bonding devices. This patch changes it to iterate over all VLAN devices on top of it sending IGMP memberships too. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27e6f065 |
|
04-Oct-2010 |
Neil Horman <nhorman@tuxdriver.com> |
bonding: fix WARN_ON when writing to bond_master sysfs file Fix a WARN_ON failure in bond_masters sysfs file Got a report of this warning recently bonding: bond0 is being created... ------------[ cut here ]------------ WARNING: at fs/proc/generic.c:590 proc_register+0x14d/0x185() Hardware name: ProLiant BL465c G1 proc_dir_entry 'bonding/bond0' already registered Modules linked in: bonding ipv6 tg3 bnx2 shpchp amd64_edac_mod edac_core ipmi_si ipmi_msghandler serio_raw i2c_piix4 k8temp edac_mce_amd hpwdt microcode hpsa cc iss radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wai t_scan] Pid: 935, comm: ifup-eth Not tainted 2.6.33.5-124.fc13.x86_64 #1 Call Trace: [<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f [<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e [<ffffffff8114bf0b>] proc_register+0x14d/0x185 [<ffffffff8114c20c>] proc_create_data+0x87/0xa1 [<ffffffffa0211e9b>] bond_create_proc_entry+0x55/0x95 [bonding] [<ffffffffa0215e5d>] bond_init+0x95/0xd0 [bonding] [<ffffffff8138cd97>] register_netdevice+0xdd/0x29e [<ffffffffa021240b>] bond_create+0x8e/0xb8 [bonding] [<ffffffffa021c4be>] bonding_store_bonds+0xb3/0x1c1 [bonding] [<ffffffff812aec85>] class_attr_store+0x27/0x29 [<ffffffff8115423d>] sysfs_write_file+0x10f/0x14b [<ffffffff81101acf>] vfs_write+0xa9/0x106 [<ffffffff81101be2>] sys_write+0x45/0x69 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b ---[ end trace a677c3f7f8b16b1e ]--- bonding: Bond creation failed. It happens because a user space writer to bond_master can try to register an already existing bond interface name. Fix it by teaching bond_create to check for the existance of devices with that name first in cases where a non-NULL name parameter has been passed in Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e6599c2e |
|
17-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bonding: enable gro by default gro can be enabled by default on bonding devices. Actual support depends on the lower devices. One can still use ethtool to switch off GRO if needed. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb32f2a0 |
|
01-Sep-2010 |
Jiri Bohac <jbohac@suse.cz> |
bonding: Fix jiffies overflow problems (again) The time_before_eq()/time_after_eq() functions operate on unsigned long and only work if the difference between the two compared values is smaller than half the range of unsigned long (31 bits on i386). Some of the variables (slave->jiffies, dev->trans_start, dev->last_rx) used by bonding store a copy of jiffies and may not be updated for a long time. With HZ=1000, time_before_eq()/time_after_eq() will start giving bad results after ~25 days. jiffies will never be before slave->jiffies, dev->trans_start, dev->last_rx by more than possibly a couple ticks caused by preemption of this code. This allows us to detect/prevent these overflows by replacing time_before_eq()/time_after_eq() with time_in_range(). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03dc2f4c |
|
20-Jul-2010 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: don't lock when copying/clearing VLAN list on slave When copying VLAN information to or removing from a slave during slave addition or removal, the bonding code currently holds the bond->lock for write to prevent concurrent modification of the vlan_list / vlgrp. This is unnecessary, as all of these operations occur under RTNL. Holding the bond->lock also caused might_sleep issues for some drivers' ndo_vlan_* functions. This patch removes the extra locking. Problem reported by Michael Chan <mchan@broadcom.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f35188fa |
|
20-Jul-2010 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: change test for presence of VLANs After commit ad1afb00393915a51c21b1ae8704562bf036855f ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") it is now regular practice for a VLAN "add vid" for VLAN 0 to arrive prior to any VLAN registration or creation of a vlan_group. This patch updates the bonding code that tests for the presence of VLANs configured above bonding. The new logic tests for bond->vlgrp to determine if a registration has occured, instead of testing that bonding's internal vlan_list is empty. The old code would panic when vlan_list was not empty, but vlgrp was still NULL (because only an "add vid" for VLAN 0 had occured). Bonding still adds VLAN 0 to its internal list so that 802.1p frames are handled correctly on transmit when non-VLAN accelerated slaves are members of the bond. The test against bond->vlan_list remains in bond_dev_queue_xmit for this reason. Modification to the bond->vlgrp now occurs under lock (in addition to RTNL), because not all inspections of it occur under RTNL. Additionally, because 8021q will never issue a "kill vid" for VLAN 0, there is now logic in bond_uninit to release any remaining entries from vlan_list. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Cc: Pedro Garcia <pedro.netdev@dondevamos.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
90e1795b |
|
19-Jul-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bonding: avoid a warning drivers/net/bonding/bond_main.c:179:12: warning: ‘disable_netpoll’ defined but not used Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28172739 |
|
07-Jul-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: fix 64 bit counters on 32 bit arches There is a small possibility that a reader gets incorrect values on 32 bit arches. SNMP applications could catch incorrect counters when a 32bit high part is changed by another stats consumer/provider. One way to solve this is to add a rtnl_link_stats64 param to all ndo_get_stats64() methods, and also add such a parameter to dev_get_stats(). Rule is that we are not allowed to use dev->stats64 as a temporary storage for 64bit stats, but a caller provided area (usually on stack) Old drivers (only providing get_stats() method) need no changes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c22d7ac8 |
|
25-Jun-2010 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: prevent netpoll over bonded interfaces Support for netpoll over bonded interfaces was added here: commit f6dc31a85cd46a959bdd987adad14c3b645e03c1 Author: WANG Cong <amwang@redhat.com> Date: Thu May 6 00:48:51 2010 -0700 bonding: make bonding support netpoll but it is bad enough that we should probably just disable netpoll over bonding until some of the locking logic in the bonding driver is changed or converted completely to RCU. Simple actions like changing the active slave in active-backup mode will hang the box if a high enough printk debugging level is enabled. Keeping the old code around will be good for anyone that wants to work on it (and for after the RCU conversion), so I propose this small patch rather than ripping it all out. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be1f3c2c |
|
08-Jun-2010 |
Ben Hutchings <bhutchings@solarflare.com> |
net: Enable 64-bit net device statistics on 32-bit architectures Use struct rtnl_link_stats64 as the statistics structure. On 32-bit architectures, insert 32 bits of padding after/before each field of struct net_device_stats to make its layout compatible with struct rtnl_link_stats64. Add an anonymous union in net_device; move stats into the union and add struct rtnl_link_stats64 stats64. Add net_device_ops::ndo_get_stats64, implementations of which will return a pointer to struct rtnl_link_stats64. Drivers that implement this operation must not update the structure asynchronously. Change dev_get_stats() to call ndo_get_stats64 if available, and to return a pointer to struct rtnl_link_stats64. Change callers of dev_get_stats() accordingly. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d8d1f30b |
|
11-Jun-2010 |
Changli Gao <xiaosuo@gmail.com> |
net-next: remove useless union keyword remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bb1d9123 |
|
02-Jun-2010 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: allow user-controlled output slave selection v2: changed bonding module version, modified to apply on top of changes from previous patch in series, and updated documentation to elaborate on multiqueue awareness that now exists in bonding driver. This patch give the user the ability to control the output slave for round-robin and active-backup bonding. Similar functionality was discussed in the past, but Jay Vosburgh indicated he would rather see a feature like this added to existing modes rather than creating a completely new mode. Jay's thoughts as well as Neil's input surrounding some of the issues with the first implementation pushed us toward a design that relied on the queue_mapping rather than skb marks. Round-robin and active-backup modes were chosen as the first users of this slave selection as they seemed like the most logical choices when considering a multi-switch environment. Round-robin mode works without any modification, but active-backup does require inclusion of the first patch in this series and setting the 'all_slaves_active' flag. This will allow reception of unicast traffic on any of the backup interfaces. This was tested with IPv4-based filters as well as VLAN-based filters with good results. More information as well as a configuration example is available in the patch to Documentation/networking/bonding.txt. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ebd8e497 |
|
02-Jun-2010 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: add all_slaves_active parameter v2: changed parameter name from 'keep_all' to 'all_slaves_active' and skipped setting slaves to inactive rather than creating a new flag at Jay's suggestion. In an effort to suppress duplicate frames on certain bonding modes (specifically the modes that do not require additional configuration on the switch or switches connected to the host), code was added in the generic receive patch in 2.6.16. The current behavior works quite well for most users, but there are some times it would be nice to restore old functionality and allow all frames to make their way up the stack. This patch adds support for a new module option and sysfs file called 'all_slaves_active' that will restore pre-2.6.16 functionality if the user desires. The default value is '0' and retains existing behavior, but the user can set it to '1' and allow all frames up if desired. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5206e24c |
|
18-May-2010 |
Jiri Pirko <jpirko@redhat.com> |
bonding: remove unused original_flags struct slave member This is stored but never restored. So remove this as it is useless. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c20811a7 |
|
18-May-2010 |
Jiri Pirko <jpirko@redhat.com> |
bonding: move dev_addr cpy to bond_enslave Move the code that copies slave's mac address in case that's the first slave into bond_enslave. Ifenslave app does this also but that's not a problem. This is something that should be done in bond_enslave, and it shound not matter from where is it called. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b15ba0fb |
|
17-May-2010 |
Jiri Pirko <jpirko@redhat.com> |
bonding: move slave MTU handling from sysfs V2 V1->V2: corrected res/ret use For some reason, MTU handling (storing, and restoring) is taking place in bond_sysfs. The correct place for this code is in bond_enslave, bond_release. So move it there. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f6dc31a8 |
|
06-May-2010 |
WANG Cong <amwang@redhat.com> |
bonding: make bonding support netpoll Based on Andy's work, but I modified a lot. Similar to the patch for bridge, this patch does: 1) implement the 2 methods to support netpoll for bonding; 2) modify netpoll during forwarding packets via bonding; 3) disable netpoll support of bonding when a netpoll-unabled device is added to bonding; 4) enable netpoll support when all underlying devices support netpoll. Cc: Andy Gospodarek <gospo@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22bedad3 |
|
01-Apr-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: convert multicast list to list_head Converts the list and the core manipulating with it to be the same as uc_list. +uses two functions for adding/removing mc address (normal and "global" variant) instead of a function parameter. +removes dev_mcast.c completely. +exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for manipulation with lists on a sandbox (used in bonding and 80211 drivers) Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a748ee24 |
|
01-Apr-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: move address list functions to a separate file +little renaming of unicast functions to be smooth with multicast ones Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e2e61fb |
|
31-Mar-2010 |
Amerigo Wang <amwang@redhat.com> |
bonding: fix potential deadlock in bond_uninit() bond_uninit() is invoked with rtnl_lock held, when it does destroy_workqueue() which will potentially flush all works in this workqueue, if we hold rtnl_lock again in the work function, it will deadlock. So move destroy_workqueue() to destructor where rtnl_lock is not held any more, suggested by Eric. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Jiri Pirko <jpirko@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00ae7028 |
|
30-Mar-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
bonding: bond_xmit_roundrobin() fix Commit a2fd940f (bonding: fix broken multicast with round-robin mode) added a problem on litle endian machines. drivers/net/bonding/bond_main.c:4159: warning: comparison is always false due to limited range of data type Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a2fd940f |
|
25-Mar-2010 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: fix broken multicast with round-robin mode Round-robin (mode 0) does nothing to ensure that any multicast traffic originally destined for the host will continue to arrive at the host when the link that sent the IGMP join or membership report goes down. One of the benefits of absolute round-robin transmit. Keeping track of subscribed multicast groups for each slave did not seem like a good use of resources, so I decided to simply send on the curr_active slave of the bond (typically the first enslaved device that is up). This makes failover management simple as IGMP membership reports only need to be sent when the curr_active_slave changes. I tested this patch and it appears to work as expected. Originally reported by Lon Hohberger <lhh@redhat.com>. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Lon Hohberger <lhh@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2381a55c |
|
24-Mar-2010 |
Frans Pop <elendil@planet.nl> |
net/various: remove trailing space in messages Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
32a806c1 |
|
18-Mar-2010 |
Jiri Pirko <jpirko@redhat.com> |
bonding: flush unicast and multicast lists when changing type After the type change, addresses in unicast and multicast lists wouldn't make sense, not to mention possible different lenghts. So flush both lists here. Note "dev_addr_discard" will be very soon replaced by "dev_mc_flush" (once mc_list conversion will be done). Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ca5b404 |
|
10-Mar-2010 |
Jiri Pirko <jpirko@redhat.com> |
bonding: check return value of nofitier when changing type This patch adds the possibility to refuse the bonding type change for other subsystems (such as for example bridge, vlan, etc.) Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93d9b7d7 |
|
10-Mar-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: rename notifier defines for netdev type change Since generally there could be more netdevices changing type other than bonding, making this event type name "bonding-unrelated" Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d6184e4 |
|
27-Feb-2010 |
Patrick McHardy <kaber@trash.net> |
bonding: fix device leak on error in bond_create() When the register_netdevice() call fails, the newly allocated device is not freed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
35cfabdc |
|
01-Feb-2010 |
Ajit Khaparde <ajitkhaparde@gmail.com> |
bonding: Remove net_device_stats from bonding struct There is no need to maintain stats in the bonding structure. Use the instance of net_device_stats in netdevice. Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b473946a |
|
25-Jan-2010 |
stephen hemminger <shemminger@vyatta.com> |
bonding: bond_open error return value The convention for API functions in kernel is to return errno value; bond_open would return -1 if alb setup failed. The only reason that could happen is if kmalloc() failed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c8c1e72 |
|
16-Jan-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f3c8804 |
|
14-Dec-2009 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: allow arp_ip_targets on separate vlans to use arp validation This allows a bond device to specify an arp_ip_target as a host that is not on the same vlan as the base bond device and still use arp validation. A configuration like this, now works: BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=10.0.100.1 arp_validate=3" 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000 link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff 3: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 qlen 1000 link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff 8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff inet6 fe80::213:21ff:febe:33e9/64 scope link valid_lft forever preferred_lft forever 9: bond0.100@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue link/ether 00:13:21:be:33:e9 brd ff:ff:ff:ff:ff:ff inet 10.0.100.2/24 brd 10.0.100.255 scope global bond0.100 inet6 fe80::213:21ff:febe:33e9/64 scope link valid_lft forever preferred_lft forever Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009) Bonding Mode: fault-tolerance (active-backup) Primary Slave: None Currently Active Slave: eth1 MII Status: up MII Polling Interval (ms): 0 Up Delay (ms): 0 Down Delay (ms): 0 ARP Polling Interval (ms): 1000 ARP IP target/s (n.n.n.n form): 10.0.100.1 Slave Interface: eth1 MII Status: up Link Failure Count: 1 Permanent HW addr: 00:40:05:30:ff:30 Slave Interface: eth0 MII Status: up Link Failure Count: 0 Permanent HW addr: 00:13:21:be:33:e9 Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a4aee5c8 |
|
13-Dec-2009 |
Joe Perches <joe@perches.com> |
drivers/net/bonding/: : use pr_fmt Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt Remove DRV_NAME from pr_<level>s Consolidate long format strings Remove some extra tab indents Remove some unnecessary ()s from pr_<level>s arguments Align pr_<level> arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e95a202 |
|
03-Dec-2009 |
Joe Perches <joe@perches.com> |
drivers/net: Move && and || to end of previous line Only files where David Miller is the primary git-signer. wireless, wimax, ixgbe, etc are not modified. Compile tested x86 allyesconfig only Not all files compiled (not x86 compatible) Added a few > 80 column lines, which I ignored. Existing checkpatch complaints ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
15449745 |
|
29-Nov-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Simplify the bond drivers pernet operations. Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f99189b1 |
|
17-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
netns: net_identifiers should be read_mostly Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6639104b |
|
29-Oct-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
bond: Get the rtnl_link_ops support correct - Don't call rtnl_link_unregister if rtnl_link_register fails - Set .priv_size so we aren't stomping on uninitialized memory when we use netdev_priv, on bond devices created with ip link add type bond. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec87fd3b |
|
29-Oct-2009 |
Eric W. Biederman <ebiederm@aristanetworks.com> |
bond: Add support for multiple network namespaces Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88ead977 |
|
29-Oct-2009 |
Eric W. Biederman <ebiederm@aristanetworks.com> |
bond: Implement a basic set of rtnl link ops This implements a basic set of rtnl link ops and takes advantage of the fact that rtnl_link_unregister kills all of the surviving devices to all us to kill bond_free_all. A module alias is added so ip link add can pull in the bonding module. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c67dfb29 |
|
29-Oct-2009 |
Eric W. Biederman <ebiederm@aristanetworks.com> |
bond: Simplify bond device destruction Manually inline the code from bond_deinit to bond_uninit. bond_uninit is the only caller and it is short. Move the call of bond_release_all from the netdev notifier into bond_uninit. The call site is effectively the same and performing the call explicitly allows all the paths for destroying a bonding device to behave the same way. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30c15ba9 |
|
29-Oct-2009 |
Eric W. Biederman <ebiederm@aristanetworks.com> |
bond: Simplify bond_create. Stop calling dev_get_by_name to see if the bond device already exists. register_netdevice already does that. Stop calling bond_deinit if register_netdevice fails as bond_uninit is guaranteed to be called if bond_init succeeds. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6151b3d4 |
|
29-Oct-2009 |
Eric W. Biederman <ebiederm@aristanetworks.com> |
bond: Simply bond sysfs group creation This patch delegates the work of creating the sysfs groups to the netdev layer and ultimately to the device layer. This closes races between uevents. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d9d52832 |
|
28-Oct-2009 |
Jiri Bohac <jbohac@suse.cz> |
bonding: fix a race condition in calls to slave MII ioctls In mii monitor mode, bond_check_dev_link() calls the the ioctl handler of slave devices. It stores the ndo_do_ioctl function pointer to a static (!) ioctl variable and later uses it to call the handler with the IOCTL macro. If another thread executes bond_check_dev_link() at the same time (even with a different bond, which none of the locks prevent), a race condition occurs. If the two racing slaves have different drivers, this may result in one driver's ioctl handler being called with a pointer to a net_device controlled with a different driver, resulting in unpredictable breakage. Unless I am overlooking something, the "static" must be a copy'n'paste error (?). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a361c83c |
|
22-Oct-2009 |
Jasper Spaans <spaans@fox-it.com> |
bonding: Remove bond_dev from xmit_hash_policy call. Now that the bonding device is no longer used in determining the device to which to send packets, it can be dropped from the argument list of the various xmit_hash_policy calls. Signed-off-by: Jasper Spaans <spaans@fox-it.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3da6831 |
|
22-Oct-2009 |
Jasper Spaans <spaans@fox-it.com> |
bonding: Modify hash transmit policies to use the packet's source MAC address Modify bonding hash transmit policies to use the psource MAC address of the packet instead of the MAC address configured for the bonding device. The old sitation conflicts with the documentation. Signed-off-by: Jasper Spaans <spaans@fox-it.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38fc0026 |
|
13-Oct-2009 |
Nicolas de Pesloüan <nicolas.2p.debian@free.fr> |
bonding: change bond_create_proc_entry() to return void The function bond_create_proc_entry is currently of type int. Two versions of this function exist: The one in the ifdef CONFIG_PROC_FS branch always return 0. The one in the else branch (which is empty) return nothing. When CONFIG_PROC_FS is undef, this cause the following warning: drivers/net/bonding/bond_main.c: In function `bond_create_proc_entry': drivers/net/bonding/bond_main.c:3393: warning: control reaches end of non-void function No caller of this function use the returned value. So change the returned type from int to void and remove the useless return 0; . Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Reported-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49b4ad92 |
|
07-Oct-2009 |
Nicolas de Pesloüan <nicolas.2p.debian@free.fr> |
bonding: remove useless assignment The variable old_active is first set to bond->curr_active_slave. Then, it is unconditionally set to new_active, without being used in between. The first assignment, having no side effect, is useless. Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c6aaa24 |
|
07-Oct-2009 |
Nicolas de Pesloüan <nicolas.2p.debian@free.fr> |
bonding: fix a parameter name in error message When parsing module parameters, bond_check_params() erroneously use 'xor_mode' as the name of a module parameter in an error message. The right name for this parameter is 'xmit_hash_policy'. Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a549952a |
|
24-Sep-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: introduce primary_reselect option In some cases there is not desirable to switch back to primary interface when it's link recovers and rather stay with currently active one. We need to avoid packetloss as much as we can in some cases. This is solved by introducing primary_reselect option. Note that enslaved primary slave is set as current active no matter what. Patch modified by Jay Vosburgh as follows: fixed bug in action after change of option setting via sysfs, revised the documentation update, and bumped the bonding version number. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9f60253 |
|
31-Aug-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: make ab_arp select active slaves as other modes When I was implementing primary_passive option (formely named primary_lazy) I've run into troubles with ab_arp. This is the only mode which is not using bond_select_active_slave() function to select active slave and instead it selects it itself. This seems to be not the right behaviour and it would be better to do it in bond_select_active_slave() for all cases. This patch makes this happen. Please review. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75c78500 |
|
15-Sep-2009 |
Moni Shoua <monis@voltaire.com> |
bonding: remap muticast addresses without using dev_close() and dev_open() This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach there is to call dev_close()/dev_open() whenever the device type is changed in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks: *. It assumes tha the device is UP when calling dev_close(), or otherwise dev_close() has no affect. It is worth to mention that initscripts (Redhat) and sysconfig (Suse) doesn't act the same in this matter. *. dev_close() has other side affects, like deleting entries from the routing table, which might be unnecessary. The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
424efe9c |
|
31-Aug-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
netdev: convert pseudo drivers to netdev_tx_t These are all drivers that don't touch real hardware. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c988853 |
|
27-Aug-2009 |
Petri Gynther <pgynther@google.com> |
bonding: Have bond_check_dev_link examine netif_running bonding: Have bond_check_dev_link examine netif_running Some network devices do not call netif_carrier_off when they are set administratively down. Have the bonding link check function also inspect the netif_running state. Ignore netif_running if the bond_check_dev_link function is called with "reporting" set, as in that case it's inspecting the capabilities of the non-netif_carrier device driver. Signed-off-by: Petri Gynther <pgynther@google.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5841306 |
|
28-Aug-2009 |
Nicolas de Pesloüan <nicolas.2p.debian@free.fr> |
bonding: Fix useless test: int > INT_MAX max_bonds is of type int and cannot be greater than INT_MAX. Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89c76c62 |
|
27-Aug-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: use compare_ether_addr Bonding can use compare_ether_addr() in bond_release. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
278339a4 |
|
27-Aug-2009 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: propogate vlan_features to bonding master Propogate the vlan_features of the slave devices to the bonding master device, using the same logic as for regular features. Tested by Or Gerlitz <ogerlitz@voltaire.com>, who also removed the debug logic from the original test patch. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5e2a8fd |
|
12-Aug-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: wipe out printk's I did not introduce new lines over 80 chars. I even eliminated some of them. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e36b9d16 |
|
14-Jul-2009 |
Moni Shoua <monis@Voltaire.COM> |
bonding: clean muticast addresses when device changes type Bonding device forbids slave device of different types under the same master. However, it is possible for a bonding master to change type during its lifetime. This can be either from ARPHRD_ETHER to ARPHRD_INFINIBAND or the other way arround. The change of type requires device level multicast address cleanup because device level multicast addresses depend on the device type. The patch adds a call to dev_close() before the bonding master changes type and dev_open() just after that. In the example below I enslaved an IPoIB device (ib0) under bond0. Since each bonding master starts as device of type ARPHRD_ETHER by default, a change of type occurs when ib0 is enslaved. This is how /proc/net/dev_mcast looks like without the patch 5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05 5 bond0 1 0 01005e000116 5 bond0 1 0 01005e7ffffd 5 bond0 1 0 01005e000001 5 bond0 1 0 333300000001 6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05 6 ib0 1 0 333300000001 6 ib0 1 0 01005e000001 6 ib0 1 0 01005e7ffffd 6 ib0 1 0 01005e000116 6 ib0 1 0 00ffffffff12401bffff00000000000000000001 6 ib0 1 0 00ffffffff12601bffff00000000000000000001 and this is how it looks like after the patch. 5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05 5 bond0 1 0 00ffffffff12601bffff00000000000000000001 5 bond0 1 0 00ffffffff12401bffff0000000000000ffffffd 5 bond0 1 0 00ffffffff12401bffff00000000000000000116 5 bond0 1 0 00ffffffff12401bffff00000000000000000001 6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05 6 ib0 1 0 00ffffffff12401bffff00000000000000000116 6 ib0 1 0 00ffffffff12401bffff0000000000000ffffffd 6 ib0 2 0 00ffffffff12401bffff00000000000000000001 6 ib0 2 0 00ffffffff12601bffff00000000000000000001 Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec634fe3 |
|
05-Jul-2009 |
Patrick McHardy <kaber@trash.net> |
net: convert remaining non-symbolic return values in ndo_start_xmit() functions This patch converts the remaining occurences of raw return values to their symbolic counterparts in ndo_start_xmit() functions that were missed by the previous automatic conversion. Additionally code that assumed the symbolic value of NETDEV_TX_OK to be zero is changed to explicitly use NETDEV_TX_OK. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
181470fc |
|
12-Jun-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: initialization rework Need to rework how bonding devices are initialized to make it more amenable to creating bonding devices via netlink. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
373500db |
|
12-Jun-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: network device names are case sensative The bonding device acts unlike all other Linux network device functions in that it ignores case of device names. The developer must have come from windows! Cleanup the management of names and use standard routines where possible. Flag places where bonding device still doesn't work right with network namespaces. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d632c3f |
|
12-Jun-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: fix style issues Resolve some of the complaints from checkpatch, and remove "magic emacs format" comments, and useless MODULE_SUPPORTED_DEVICE(). But should not change actual code. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e71626c |
|
12-Jun-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: fix destructor It is not safe to use a network device destructor that is a function in the module, since it can be called after module is unloaded if sysfs handle is open. When eventually using netlink, the device cleanup code needs to be done via uninit function. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e083840 |
|
12-Jun-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: remove bonding read/write semaphore The whole read/write semaphore locking can be removed. It doesn't add any protection that isn't already done by using the RTNL mutex properly. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d9321605 |
|
12-Jun-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: initialize before registration Avoid a unnecessary carrier state transistion that happens when device is registered. Lockdep works better if initialization is done before registration as well. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d2991f75 |
|
12-Jun-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: bond_create always called with default parameters bond_create() is always called with same parameters so move the argument down. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae63e808 |
|
26-May-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: use bond_is_lb() when it's appropriate Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
93f154b5 |
|
18-May-2009 |
Eric Dumazet <dada1@cosmosbay.com> |
net: release dst entry in dev_hard_start_xmit() One point of contention in high network loads is the dst_release() performed when a transmited skb is freed. This is because NIC tx completion calls dev_kree_skb() long after original call to dev_queue_xmit(skb). CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is quite visible if one CPU is 100% handling softirqs for a network device, since dst_clone() is done by other cpus, involving cache line ping pongs. It seems right place to release dst is in dev_hard_start_xmit(), for most devices but ones that are virtual, and some exceptions. David Miller suggested to define a new device flag, set in alloc_netdev_mq() (so that most devices set it at init time), and carefuly unset in devices which dont want a NULL skb->dst in their ndo_start_xmit(). List of devices that must clear this flag is : - loopback device, because it calls netif_rx() and quoting Patrick : "ip_route_input() doesn't accept loopback addresses, so loopback packets already need to have a dst_entry attached." - appletalk/ipddp.c : needs skb->dst in its xmit function - And all devices that call again dev_queue_xmit() from their xmit function (as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d21493b |
|
17-May-2009 |
Eric Dumazet <dada1@cosmosbay.com> |
net: tx scalability works : trans_start struct net_device trans_start field is a hot spot on SMP and high performance devices, particularly multi queues ones, because every transmitter dirties it. Is main use is tx watchdog and bonding alive checks. But as most devices dont use NETIF_F_LLTX, we have to lock a netdev_queue before calling their ndo_start_xmit(). So it makes sense to move trans_start from net_device to netdev_queue. Its update will occur on a already present (and in exclusive state) cache line, for free. We can do this transition smoothly. An old driver continue to update dev->trans_start, while an updated one updates txq->trans_start. Further patches could also put tx_bytes/tx_packets counters in netdev_queue to avoid dirtying dev->stats (vlan device comes to mind) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d34d1a2 |
|
08-May-2009 |
Florian Westphal <fw@strlen.de> |
bonding: fix panic if initialization fails If module initialisation failed (e.g. because the bonding sysfs entry cannot be created), kernel panics: IP: [<ffffffff8024910a>] destroy_workqueue+0x2d/0x146 Call Trace: [<ffffffff808268c4>] bond_destructor+0x28/0x78 [<ffffffff80b64471>] netdev_run_todo+0x231/0x25a [<ffffffff80b6dbcd>] rtnl_unlock+0x9/0xb [<ffffffff81567907>] bonding_init+0x83e/0x84a Remove the calls to bond_work_cancel_all() and destroy_workqueue(); both are also called/scheduled via bond_free_all(). bond_destroy_sysfs is unecessary because the sysfs entry has not been created in the error case. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aee64faf |
|
05-May-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: get rid of CONFIG_PROC_FS ifdefs Remove CONFIG_PROC_FS ifdefs from the code by adding void functions. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bonding/bond_main.c | 30 ++++++++++++++++++++---------- 1 files changed, 20 insertions(+), 10 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1363d9b1 |
|
01-May-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: correct the cleanup in bond_create() This patch makes the cleanup in bond_create nicer :) Also now the forgotten free_netdev is called. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
689c96cc |
|
22-Apr-2009 |
Eric Dumazet <dada1@cosmosbay.com> |
bonding: bond_slave_info_query() fix bond_slave_info_query() should keep a read lock while accessing slave info, or risk accessing stale data and corruption. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
41f89100 |
|
23-Apr-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: ignore updelay param when there is no active slave Pointed out by Sean E. Millichamp. Quote from Documentation/networking/bonding.txt: "Note that when a bonding interface has no active links, the driver will immediately reuse the first link that goes up, even if the updelay parameter has been specified (the updelay is ignored in this case). If there are slave interfaces waiting for the updelay timeout to expire, the interface that first went into that state will be immediately reused. This reduces down time of the network if the value of updelay has been overestimated, and since this occurs only in cases with no connectivity, there is no additional penalty for ignoring the updelay." This patch actually changes the behaviour in this way. Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bonding/bond_main.c | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29112f4e |
|
23-Apr-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: use ethtool for link checking first This patch only changes the order of interfaces to use for checking slave link status in bond_check_dev_link() to priorize ethtool interface. Should safe some troubles as ethtool seems to be more supported. Jirka Signed-off-by: Jiri Pirko <jpirko@redhat.com> drivers/net/bonding/bond_main.c | 26 ++++++++++++-------------- 1 files changed, 12 insertions(+), 14 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a31bec0 |
|
13-Apr-2009 |
Brian Haley <brian.haley@hp.com> |
Bonding: fix zero address hole bug in arp_ip_target list Fix a zero address hole bug in the bonding arp_ip_target list that was causing the bond to ignore ARP replies (bugz 13006). Instead of just setting the array entry to zero, we now copy any additional entries down one slot, putting the zero entry at the end. With this change we can now have all the loops that walk the array stop when they hit a zero since there will be no addresses after it. Changes are based in part on code fragment provided in kernel: bugzilla 13006: http://bugzilla.kernel.org/show_bug.cgi?id=13006 by Steve Howard <steve@astutenetworks.com> Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99b76233 |
|
25-Mar-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc 2/2: remove struct proc_dir_entry::owner Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy as correctly noted at bug #12454. Someone can lookup entry with NULL ->owner, thus not pinning enything, and release it later resulting in module refcount underflow. We can keep ->owner and supply it at registration time like ->proc_fops and ->data. But this leaves ->owner as easy-manipulative field (just one C assignment) and somebody will forget to unpin previous/pin current module when switching ->owner. ->proc_fops is declared as "const" which should give some thoughts. ->read_proc/->write_proc were just fixed to not require ->owner for protection. rmmod'ed directories will be empty and return "." and ".." -- no harm. And directories with tricky enough readdir and lookup shouldn't be modular. We definitely don't want such modular code. Removing ->owner will also make PDE smaller. So, let's nuke it. Kudos to Jeff Layton for reminding about this, let's say, oversight. http://bugzilla.kernel.org/show_bug.cgi?id=12454 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
#
5a29f789 |
|
25-Mar-2009 |
Jiri Pirko <jpirko@redhat.com> |
bonding: select current active slave when enslaving device for mode tlb and alb I've hit an issue on my system when I've been using RealTek RTL8139D cards in bonding interface in mode balancing-alb. When I enslave a card, the current active slave (bond->curr_active_slave) is not set and the link is therefore not functional. ---- # cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008) Bonding Mode: adaptive load balancing Primary Slave: None Currently Active Slave: None MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 0 Down Delay (ms): 0 Slave Interface: eth1 MII Status: up Link Failure Count: 0 Permanent HW addr: 00:1f:1f:01:2f:22 ---- The thing that gets it right is when I unplug the cable and then I put it back into the NIC. Then the current active slave is set to eth1 and link is working just fine. Here is dmesg log with bonding DEBUG messages turned on: ---- ADDRCONF(NETDEV_UP): bond0: link is not ready event_dev: bond0, event: 1 IFF_MASTER event_dev: bond0, event: 8 IFF_MASTER bond_ioctl: master=bond0, cmd=35216 slave_dev=cac5d800: slave_dev->name=eth1: eth1: ! NETIF_F_VLAN_CHALLENGED event_dev: eth1, event: 8 eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1 event_dev: eth1, event: 1 event_dev: eth1, event: 8 IFF_SLAVE Initial state of slave_dev is BOND_LINK_UP bonding: bond0: enslaving eth1 as an active interface with an up link. ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready event_dev: bond0, event: 4 IFF_MASTER bond0: no IPv6 routers present <<<<cable unplug>>>> eth1: link down event_dev: eth1, event: 4 IFF_SLAVE bonding: bond0: link status definitely down for interface eth1, disabling it event_dev: bond0, event: 4 IFF_MASTER <<<<cable plug>>>> eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1 event_dev: eth1, event: 4 IFF_SLAVE bonding: bond0: link status definitely up for interface eth1. bonding: bond0: making interface eth1 the new active one. event_dev: eth1, event: 8 IFF_SLAVE event_dev: eth1, event: 8 IFF_SLAVE bonding: bond0: first active interface up! event_dev: bond0, event: 4 IFF_MASTER ---- The current active slave is set by calling bond_select_active_slave() function from bond_miimon_commit() function when the slave (eth1) link goes to state up. I also tested this on other machine with Broadcom NetXtreme II BCM5708 1000Base-T NIC and there all works fine. The thing is that this adapter is down and goes up after few seconds after it is enslaved. This patch calls bond_select_active_slave() in bond_enslave() function for modes alb and tlb and makes sure that the current active slave is set up properly even when the slave state is already up. Tested on both systems, works fine. Notice: The same problem can maybe also occrur in mode 8023AD but I'm unable to test that. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
17d04500 |
|
18-Mar-2009 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Fix updating of speed/duplex changes This patch corrects an omission from the following commit: commit f0c76d61779b153dbfb955db3f144c62d02173c2 Author: Jay Vosburgh <fubar@us.ibm.com> Date: Wed Jul 2 18:21:58 2008 -0700 bonding: refactor mii monitor The un-refactored code checked the link speed and duplex of every slave on every pass; the refactored code did not do so. The 802.3ad and balance-alb/tlb modes utilize the speed and duplex information, and require it to be kept up to date. This patch adds a notifier check to perform the appropriate updating when the slave device speed changes. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
72e2240f |
|
05-Mar-2009 |
Patrick McHardy <kaber@trash.net> |
bonding: Fix device passed into ->ndo_neigh_setup(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
54b87323 |
|
14-Feb-2009 |
Hannes Eder <hannes@hanneseder.net> |
drivers/net/bonding: fix sparse warning: symbol shadows an earlier one Impact: Rename function scope variable. Fix this sparse warning: drivers/net/bonding/bond_main.c:4704:13: warning: symbol 'mode' shadows an earlier one drivers/net/bonding/bond_main.c:95:13: originally declared here Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f78d9f9 |
|
14-Feb-2009 |
Hannes Eder <hannes@hanneseder.net> |
drivers/net/bonding: fix sparse warnings: context imbalance Impact: Attribute functions with __acquires(...) and/or __releases(...). Fix this sparse warnings: drivers/net/bonding/bond_alb.c:1675:9: warning: context imbalance in 'bond_alb_handle_active_change' - unexpected unlock drivers/net/bonding/bond_alb.c:1742:9: warning: context imbalance in 'bond_alb_set_mac_address' - unexpected unlock drivers/net/bonding/bond_main.c:1025:17: warning: context imbalance in 'bond_do_fail_over_mac' - unexpected unlock drivers/net/bonding/bond_main.c:3195:13: warning: context imbalance in 'bond_info_seq_start' - wrong count at exit drivers/net/bonding/bond_main.c:3234:13: warning: context imbalance in 'bond_info_seq_stop' - unexpected unlock Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4101dec9 |
|
14-Jan-2009 |
Jan Engelhardt <jengelh@medozas.de> |
net: constify VFTs Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
53a3294e |
|
06-Jan-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: use net_device_ops Use the correct pointer in debug message. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b06715b7 |
|
26-Dec-2008 |
Hannes Eder <hannes@hanneseder.net> |
drivers/net/bonding: fix sparse warnings: move decls to header file Fix this sparse warnings: drivers/net/bonding/bond_main.c:104:20: warning: symbol 'bonding_defaults' was not declared. Should it be static? drivers/net/bonding/bond_main.c:204:22: warning: symbol 'ad_select_tbl' was not declared. Should it be static? drivers/net/bonding/bond_sysfs.c:60:21: warning: symbol 'bonding_rwsem' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e97fd7c6 |
|
10-Dec-2008 |
Holger Eitzenberger <holger@eitzenberger.org> |
bonding: turn all bond_parm_tbls const Turn all bond_parm_tbls const. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
325dcf7a |
|
10-Dec-2008 |
Holger Eitzenberger <holger@eitzenberger.org> |
bonding: make tbl argument to bond_parse_parm() const bond_parse_parm() parses a parameter table for a particular value and is therefore not modifying the table at all. Therefore make the 2nd argument const, thus allowing to make the tables const later. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a03cdb7 |
|
10-Dec-2008 |
Holger Eitzenberger <holger@eitzenberger.org> |
bonding: use pr_debug instead of own macros Use pr_debug() instead of own macros. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77afc92b |
|
10-Dec-2008 |
Holger Eitzenberger <holger@eitzenberger.org> |
bonding: use table for mode names Use a small array in bond_mode_name() for the names, thus saving some space: before text data bss dec hex filename 57736 9372 344 67452 1077c drivers/net/bonding/bonding.ko after text data bss dec hex filename 57441 9372 344 67157 10655 drivers/net/bonding/bonding.ko Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58402054 |
|
10-Dec-2008 |
Holger Eitzenberger <holger@eitzenberger.org> |
bonding: add and use bond_is_lb() Introduce and use bond_is_lb(), it is usefull to shorten the repetitive check for either ALB or TLB mode. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
244ef9b9 |
|
03-Dec-2008 |
Wang Chen <wangchen@cn.fujitsu.com> |
bond: Kill directly reference of netdev->priv Simply replace netdev->priv with netdev_priv(). Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
00829823 |
|
20-Nov-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
netdev: add more functions to netdevice ops This patch moves neigh_setup and hard_start_xmit into the network device ops structure. For bisection, fix all the previously converted drivers as well. Bonding driver took the biggest hit on this. Added a prefetch of the hard_start_xmit in the fast path to try and reduce any impact this would have. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb7cc59a |
|
19-Nov-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: convert to net_device_ops Convert to net_device_ops table. Note: for some operations move error checking into generic networking layer (rather than looking at pointers in bonding). A couple of gratituous style cleanups to get rid of extra {} Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eeda3fd6 |
|
19-Nov-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
netdev: introduce dev_get_stats() In order for the network device ops get_stats call to be immutable, the handling of the default internal network device stats block has to be changed. Add a new helper function which replaces the old use of internal_get_stats. Note: change return code to make it clear that the caller should not go changing the returned statistics. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
454d7c9b |
|
13-Nov-2008 |
Wang Chen <wangchen@cn.fujitsu.com> |
netdevice: safe convert to netdev_priv() #part-1 We have some reasons to kill netdev->priv: 1. netdev->priv is equal to netdev_priv(). 2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously netdev_priv() is more flexible than netdev->priv. But we cann't kill netdev->priv, because so many drivers reference to it directly. This patch is a safe convert for netdev->priv to netdev_priv(netdev). Since all of the netdev->priv is only for read. But it is too big to be sent in one mail. I split it to 4 parts and make every part smaller than 100,000 bytes, which is max size allowed by vger. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd989c83 |
|
04-Nov-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: alternate agg selection policies for 802.3ad This patch implements alternative aggregator selection policies for 802.3ad. The existing policy, now termed "stable," selects the active aggregator by greatest bandwidth, and only reselects a new aggregator if the active aggregator is entirely disabled (no more ports or all ports down). This patch adds two new policies: bandwidth and count, selecting the active aggregator by total bandwidth (like the stable policy) or by the number of ports in the aggregator, respectively. These two policies also differ from the stable policy in that they will reselect the active aggregator when availability-related changes occur in the bond (e.g., link state change). This permits "gang failover" within 802.3ad, allowing redundant aggregators along parallel paths to always maintain the "best" aggregator as the active aggregator (rather than having to wait for the active to entirely fail). This patch also updates the driver version to 3.5.0. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
305d552a |
|
04-Nov-2008 |
Brian Haley <brian.haley@hp.com> |
bonding: send IPv6 neighbor advertisement on failover This patch adds better IPv6 failover support for bonding devices, especially when in active-backup mode and there are only IPv6 addresses configured, as reported by Alex Sidorenko. - Creates a new file, net/drivers/bonding/bond_ipv6.c, for the IPv6-specific routines. Both regular bonds and VLANs over bonds are supported. - Adds a new tunable, num_unsol_na, to limit the number of unsolicited IPv6 Neighbor Advertisements that are sent on a failover event. Default is 1. - Creates two new IPv6 neighbor discovery functions: ndisc_build_skb() ndisc_send_skb() These were required to support VLANs since we have to be able to add the VLAN id to the skb since ndisc_send_na() and friends shouldn't be asked to do this. These two routines are basically __ndisc_send() split into two pieces, in a slightly different order. - Updates Documentation/networking/bonding.txt and bumps the rev of bond support to 3.4.0. On failover, this new code will generate one packet: - An unsolicited IPv6 Neighbor Advertisement, which helps the switch learn that the address has moved to the new slave. Testing has shown that sending just the NA results in pretty good behavior when in active-back mode, I saw no lost ping packets for example. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
6cf3f41e |
|
03-Nov-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding, net: Move last_rx update into bonding recv logic The only user of the net_device->last_rx field is bonding. This patch adds a conditional update of last_rx to the bonding special logic in skb_bond_should_drop, causing last_rx to only be updated when the ARP monitor is running. This frees network device drivers from the necessity of updating last_rx, which can have cache line thrash issues. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63779436 |
|
31-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
drivers: replace NIPQUAD() Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a434e43f |
|
30-Oct-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Clean up resource leaks This patch reworks the resource free logic performed at the time a bonding device is released. This (a) closes two resource leaks, one for workqueues and one for multicast lists, and (b) improves commonality of code between the "destroy one" and "destroy all" paths by performing final free activity via destructor instead of explicitly (and differently) in each path. "Sean E. Millichamp" <sean@bruenor.org> reported the workqueue leak, and included a different patch. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
fba4acda |
|
30-Oct-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix miimon failure counter During the rework of the mii monitor for: commit f0c76d61779b153dbfb955db3f144c62d02173c2 Author: Jay Vosburgh <fubar@us.ibm.com> Date: Wed Jul 2 18:21:58 2008 -0700 bonding: refactor mii monitor I left out the increment of the link failure counter. This patch corrects that omission. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
8cf14e38 |
|
29-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: easy removals of HIPQUAD using %pI4 format As a bonus, removes some unnecessary byteswapping. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e174961c |
|
27-Oct-2008 |
Johannes Berg <johannes@sipsolutions.net> |
net: convert print_mac to %pM This converts pretty much everything to print_mac. There were a few things that had conflicts which I have just dropped for now, no harm done. I've built an allyesconfig with this and looked at the files that weren't built very carefully, but it's a huge patch. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b63365a2 |
|
23-Oct-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
net: Fix disjunct computation of netdev features My change commit e2a6b85247aacc52d6ba0d9b37a99b8d1a3e0d83 net: Enable TSO if supported by at least one device didn't do what was intended because the netdev_compute_features function was designed for conjunctions. So what happened was that it would simply take the TSO status of the last constituent device. This patch extends it to support both conjunctions and disjunctions under the new name of netdev_increment_features. It also adds a new function netdev_fix_features which does the sanity checking that usually occurs upon registration. This ensures that the computation doesn't result in an illegal combination since this checking is absent when the change is initiated via ethtool. The two users of netdev_compute_features have been converted. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fa53ebac |
|
13-Sep-2008 |
Stephen Hemminger <shemminger@vyatta.com> |
bonding: add more ethtool support This patch allows reporting the link, checksum, and feature settings of bonded device by using generic hooks. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
f14c4e4e |
|
02-Sep-2008 |
Brian Haley <brian.haley@hp.com> |
bonding: change some __constant_htons() to htons() Resending since I didn't see any responses from the first try. Change __constant_htons() to htons() in the bonding driver, it should only be used for initializers. -Brian Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
f0c76d61 |
|
02-Jul-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: refactor mii monitor Refactor mii monitor. As with the previous ARP monitor refactor, the motivation for this is to handle locking rationally (in this case, removing conditional locking) and generally clean up the code. This patch breaks up the monolithic mii monitor into two phases: an inspection phase, followed by an optional commit phase. The commit phase is the only portion that requires RTNL or makes changes to state, and is only called when inspection finds something to change. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
cf508b12 |
|
22-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep. The new address list lock needs to handle the same device layering issues that the _xmit_lock one does. This integrates work done by Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e8a0464c |
|
17-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Allocate multiple queues for TX. alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9e40857 |
|
15-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Do not use TX lock to protect address lists. Now that we have a specific lock to protect the network device unicast and multicast lists, remove extraneous grabs of the TX lock in cases where the code only needs address list protection. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e308a5d8 |
|
15-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Add netdev->addr_list_lock protection. Add netif_addr_{lock,unlock}{,_bh}() helpers. Use them to protect operations that operate on or read the network device unicast and multicast address lists. Also use them in cases where the code simply wants to block calls into the driver's ->set_rx_mode() and ->set_multicast_list() methods. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e1a1ac1 |
|
14-Jul-2008 |
Wang Chen <wangchen@cn.fujitsu.com> |
bonding: Check return of dev_set_promiscuity/allmulti dev_set_promiscuity/allmulti might overflow. Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes dev_set_promiscuity/allmulti return error number if overflow happened. In bond_alb and bond_main, we check all positive increment for promiscuity and allmulti to get error return. But there are still two problems left. 1. Some code path has no mechanism to signal errors upstream. 2. If there are multi slaves, it's hard to tell which slaves increment promisc/allmulti successfully and which failed. So I left these problems to be FIXME. Fortunately, the overflow is very rare case. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c773e847 |
|
09-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue. Accesses are mostly structured such that when there are multiple TX queues the code transformations will be a little bit simpler. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b8a9787e |
|
13-Jun-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Allow setting max_bonds to zero Permit bonding to function rationally if max_bonds is set to zero. This will load the module, but create no master devices (which can be created via sysfs). Requires some change to bond_create_sysfs; currently, the netdev sysfs directory is determined from the first bonding device created, but this is no longer possible. Instead, an interface from net/core is created to create and destroy files in net_class. Based on a patch submitted by Phil Oester <kernel@linuxaces.com>. Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to update the documentation. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
b59f9f74 |
|
13-Jun-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Rework / fix multiple gratuitous ARP support Support for sending multiple gratuitous ARPs during failovers was added by commit: commit 7893b2491a2d5f716540ac5643d78d37a7f6628b Author: Moni Shoua <monis@voltaire.com> Date: Sat May 17 21:10:12 2008 -0700 bonding: Send more than one gratuitous ARP when slave takes over This change modifies that support to remove duplicated code, add support for ARP monitor (the original only supported miimon), clear the grat ARP counter in bond_close (lest a later "ifconfig up" immediately start spewing ARPs), and add documentation for the module parameter. Also updated driver version to 3.3.0. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
01f3109d |
|
13-Jun-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
bonding: deliver netdev event for fail-over under the active-backup mode under active-backup mode and when there's actual new_active slave, have bond_change_active_slave() call the networking core to deliver NETDEV_BONDING_FAILOVER event such that the fail-over can be notable by code outside of the bonding driver such as the RDMA stack and monitoring tools. As the correct context of locking appropriate for notifier calls is RTNL and nothing else, bond->curr_slave_lock and bond->lock are unlocked and later locked again. This is ensured by the rest of the code to be safe under backup-mode AND when new_active is not NULL. Jay Vosburgh modified the original patch for formatting and fixed a compiler error. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
709f8a45 |
|
13-Jun-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
bonding: bond_change_active_slave() cleanup under active-backup simplified the code of bond_change_active_slave() such that under active-backup mode there's one "if (new_active)" test and the rest of the code only does extra checks on top of it. This removed an unneeded "if (bond->send_grat_arp > 0)" check and avoid calling bond_send_gratuitous_arp when there's no active slave. Jay Vosburgh made minor coding style changes to the orignal patch. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
3915c1e8 |
|
17-May-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Add "follow" option to fail_over_mac Add a "follow" selection for fail_over_mac. This option causes the MAC address to move from slave to slave as the active slave changes. This is in addition to the existing fail_over_mac option that causes the bond's MAC address to change during failover. This new option is useful for devices that cannot tolerate multiple ports using the same MAC address simultaneously, either because it confuses them or incurs a performance penalty (as is the case with some LPAR-aware multiport devices). Because the MAC of the bond itself does not change, the "follow" option is slightly more reliable during failover and doesn't change the MAC of the bond during operation. This patch requires a previous ARP monitor change to properly handle RTNL during failovers. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
b2220cad |
|
17-May-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: refactor ARP active-backup monitor Refactor ARP monitor for active-backup mode. The motivation for this is to take care of locking issues in a clear manner (particularly to correctly handle RTNL vs. the bonding locks). Currently, the a-b ARP monitor does not hold RTNL at all, but future changes will require RTNL during ARP monitor failovers. Rather than using conditional locking, this patch instead breaks up the ARP monitor into three discrete steps: inspection, commit changes, and probe. The inspection phase marks slaves that require link state changes. The commit phase is only called if inspection detects that changes are needed, and is called with RTNL. Lastly, the probe phase issues the ARP probes that the inspection phase uses to determine link state. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
7893b249 |
|
17-May-2008 |
Moni Shoua <monis@voltaire.com> |
bonding: Send more than one gratuitous ARP when slave takes over With IPoIB, reception of gratuitous ARP by neighboring hosts is essential for a successful change of slaves in case of failure. Otherwise, they won't learn about the HW address change and need to wait a long time until the neighboring system gives up and sends an ARP request to learn the new HW address. This patch decreases the chance for a lost of a gratuitous ARP packet by sending it more than once. The number retries is configurable and can be set with a module param. Signed-off-by: Moni Shoua <monis@voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
8047637c |
|
17-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
bonding: Remove unneeded list_empty checks. Some places iterate over the checked list right after the check itself, so even if the list is empty, the list_for_each_xxx iterator will make everything right by himself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
0883beca |
|
17-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
bonding: Relax unneeded _safe lists iterations. Many places either do not modify the list under the list_for_each_xxx, or break out of the loop as soon as the first element is removed. Thus, this _safe iteration just occupies some unneeded .text space and requires an additional variable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
0dd646fe |
|
17-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
bonding: Remove redundant argument from bond_create. While we're fixing the bond_create, I hope it's OK to polish it a bit after the fixes. The third argument is NULL at the first caller and is ignored by the second one, so remove it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
4b8a9239 |
|
17-May-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: remove test for IP in ARP monitor Remove bond_has_ip and all references to it. With this change, the ARP monitor will always send ARP probes if the master is up and has at least one slave. If the bond has an IP address, it is used in the ARP probe; if not, the probes are sent with all zeros in the sender's IP address (which is consistent with an RFC 2131 4.4.1 duplicate address probe). This is useful for cases when bonding itself is hidden underneath a layer of virtual devices, e.g., with Xen. Change suggested by Tsutomu Fujii <t-fujii@nb.jp.nec.com>, who included a one-line patch that only affected active-backup mode. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
5ce0da8f |
|
17-May-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Use msecs_to_jiffies, eliminate panic Convert bonding to use msecs_to_jiffies instead of doing the math. For the ARP monitor, there was an underflow problem that could result in an infinite loop. The miimon already had that worked around, but this is cleaner. Originally by Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Jay Vosburgh corrected a math error in the original; Nicolas' original commit message is: When setting arp_interval parameter to a very low value, delta_in_ticks for next arp might become 0, causing an infinite loop. See http://bugzilla.kernel.org/show_bug.cgi?id=10680 Same problem for miimon parameter already fixed, but fix might be enhanced, by using msecs_to_jiffies() function. Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
569f0c4d |
|
02-May-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix enslavement error unwinds As part of: commit c2edacf80e155ef54ae4774379d461b60896bc2e Author: Jay Vosburgh <fubar@us.ibm.com> Date: Mon Jul 9 10:42:47 2007 -0700 bonding / ipv6: no addrconf for slaves separately from master two steps were rearranged in the enslavement process: netdev_set_master is now before the call to dev_open to open the slave. This patch updates the error cases and unwind process at the end of bond_enslave to match the new order. Without this patch, it is possible for the enslavement to fail, but leave the slave with IFF_SLAVE set in its flags. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
ae68c398 |
|
02-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
bonding: Deadlock between bonding_store_bonds and bond_destroy_sysfs. The sysfs layer has an internal protection, that ensures, that all the process sitting inside ->sore/->show callback exits before the appropriate entry is unregistered (the calltraces are rather big, but I can provide them if required). On the other hand, bonding takes rtnl_lock in a) the bonding_store_bonds, i.e. in ->store callback, b) module exit before calling the sysfs unregister routines. Thus, the classical AB-BA deadlock may occur. To reproduce run # while :; do modprobe bonding; rmmod bonding; done and # while :; do echo '+bond%d' > /sys/class/net/bonding_masters ; done in parallel. The fix is to move the bond_destroy_sysfs out of the rtnl_lock, but _before_ bond_free_all to make sure no bonding devices exist after module unload. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
822973ba |
|
02-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
bonding: Do not call free_netdev for already registered device. If the call to bond_create_sysfs_entry in bond_create fails, the proper rollback is to call unregister_netdevice, not free_netdev. Otherwise - kernel BUG at net/core/dev.c:4057! Checked with artificial failures injected into bond_create_sysfs_entry. Pavel's original patch modified by Jay Vosburgh to move code around for clarity (remove goto-hopping within the unwind block). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
a95609cb |
|
29-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
netdev: use non-racy method for proc entries creation Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data be setup before gluing PDE to main tree. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dc13b385 |
|
10-Apr-2008 |
Joe Perches <joe@perches.com> |
drivers/net/bonding/bond_main.c - remove unnecessary #define bond_main.c already #includes <linux/seq_file.h> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
#
92b41daa |
|
21-Mar-2008 |
Libor Pechacek <lpechacek@suse.cz> |
bonding: Fix sysfs attribute handling For bonding interfaces any attempt to read the sysfs directory contents after module removal results in an oops. The fix is to release sysfs attributes for the interfaces upon module unload. Signed-off-by: Libor Pechacek <lpechacek@suse.cz> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
966bc6f4 |
|
21-Mar-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix two compiler warnings Fix two compiler warnings that are new with recent versions of gcc (apparently 4.2 and up). One is fixed by refactoring; this change was supplied by Stephen Hemminger. The other was fixed by labelling the variable as uninitialized_var() after confirming via inspection that it cannot actually be used uninitialized. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
c346dca1 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
988b7050 |
|
03-Mar-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[ARP]: Introduce the arp_hdr_len helper. There are some place, that calculate the ARP header length. These calculations are correct, but a) some operate with "magic" constants, b) enlarge the code length (sometimes at the cost of coding style), c) are not informative from the first glance. The proposal is to introduce a helper, that includes all the good sides of these calculations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6133fb1a |
|
28-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Disable inetaddr notifiers in namespaces other than initial. ip_fib_init is kept enabled. It is already namespace-aware. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21c9d8d7 |
|
29-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: do not acquire rtnl in ARP monitor The ARP monitor functions currently acquire RTNL when performing failover operations, but do so incorrectly (out of order). This causes various warnings from might_sleep. The ARP monitor isn't supported for any of the bonding modes that actually require RTNL, so it is safe to not hold RTNL when failing over in the ARP monitor. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2439f9eb |
|
29-Jan-2008 |
Andy Gospodarek <andy@greyhouse.net> |
bonding: fix race that causes invalid statistics I've seen reports of invalid stats in /proc/net/dev for bonding interfaces, and found it's a pretty easy problem to reproduce. Since the current code zeros the bonding stats when a read is requested and a pointer to that data is returned to the caller we cannot guarantee that the caller has completely accessed the data before a successive call to request the stats zeroes the stats again. This patch creates a new stack variable to keep track of the updated stats and copies the data from that variable into the bonding stats structure. This ensures that the value for any of the bonding stats should not incorrectly return zero for any of the bonding statistics. This does use more stack space and require an extra memcpy, but it seems like a fair trade-off for consistently correct bonding statistics. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Chris Snook <csnook@redhat.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4fe4763c |
|
29-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix NULL pointer deref in startup processing Fix the "are we creating a duplicate" check to not compare the name if the name is NULL (meaning that the system should select a name). Bug reported by Benny Amorsen <benny+usenet@amorsen.dk>. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
80ee5ad2 |
|
29-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix set_multicast_list locking This patch eliminates a problem (reported by lockdep) in the bond_set_multicast_list function. It first reduces the locking on bond->lock to a simple read_lock, and second, adds netif_tx locking around the bonding mc_list manipulations that occur outside of the set_multicast_list function. The original problem was related to IPv6 addrconf activity. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a42e534f |
|
29-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix parameter parsing My last fix (commit ece95f7fefe3afae19e641e1b3f5e64b00d5b948) didn't handle one case correctly. This resolves that, and it will now correctly parse parameters with arbitrary white space, and either text names or mode values. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f206351a |
|
22-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Add namespace parameter to ip_route_output_key. Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5655662d |
|
17-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Don't hold lock when calling rtnl_unlock Change bond_mii_monitor to not hold any locks when calling rtnl_unlock, as rtnl_unlock can sleep (when acquring another mutex in netdev_run_todo). Bug reported by Makito SHIOKAWA <mshiokawa@miraclelinux.com>, who included a different patch. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
027ea041 |
|
17-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix lock ordering for rtnl and bonding_rwsem Fix the handling of rtnl and the bonding_rwsem to always be acquired in a consistent order (rtnl, then bonding_rwsem). The existing code sometimes acquired them in this order, and sometimes in the opposite order, which opens a window for deadlock between ifenslave and sysfs. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
ece95f7f |
|
17-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Fix up parameter parsing A recent change to add an additional hash policy modified bond_parse_parm, but it now does not correctly match parameters passed in via sysfs. Rewrote bond_parse_parm to handle (a) parameter matches that are substrings of one another and (b) user input with whitespace (e.g., sysfs input often has a trailing newline). Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
3b96c858 |
|
17-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: release slaves when master removed via sysfs Add a call to bond_release_all in the bonding netdev event handler for the master. This releases the slaves for the case of, e.g., "echo -bond0 > /sys/class/net/bonding_masters", which otherwise will spin forever waiting for references to be released. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
2543331d |
|
17-Jan-2008 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix locking during alb failover and slave removal alb_fasten_mac_swap (actually rlb_teach_disabled_mac_on_primary) requries RTNL and no other locks. This could cause dev_set_promiscuity and/or dev_set_mac_address to be called with improper locking. Changed callers to hold only RTNL during calls to alb_fasten_mac_swap or functions calling it. Updated header comments in affected functions to reflect proper reality of locking requirements. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
fdaea7a9 |
|
07-Dec-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Fix race at module unload Fixes a race condition in module unload. Without this change, workqueue events may fire while bonding data structures are partially freed but before bond_close() is invoked by unregister_netdevice(). Update version to 3.2.3. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
6f6652be |
|
07-Dec-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Add new layer2+3 hash for xor/802.3ad modes Add new hash for balance-xor and 802.3ad modes. Originally submitted by "Glenn Griffin" <ggriffin.kernel@gmail.com>; modified by Jay Vosburgh to move setting of hash policy out of line, tweak the documentation update and add version update to 3.2.2. Glenn's original comment follows: Included is a patch for a new xmit_hash_policy for the bonding driver that selects slaves based on MAC and IP information. This is a middle ground between what currently exists in the layer2 only policy and the layer3+4 policy. This policy strives to be fully 802.3ad compliant by transmitting every packet of any particular flow over the same link. As documented the layer3+4 policy is not fully compliant for extreme cases such as ip fragmentation, so this policy is a nice compromise for environments that require full compliance but desire more than the layer2 only policy. Signed-off-by: "Glenn Griffin" <ggriffin.kernel@gmail.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
b63bb739 |
|
07-Dec-2007 |
David Sterba <dsterba@suse.cz> |
bonding: Fix time comparison From: David Sterba <dsterba@suse.cz> Use macros for comparing jiffies. Jiffies' wrap caused missed events and hangs. Module reinsert was needed to make bonding work again. Signed-off-by: David Sterba <dsterba@suse.cz> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
8cbdeec6 |
|
13-Nov-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
[BONDING]: Fix resource use after free Fix bond_destroy and bond_free_all to not reference the struct net_device after calling unregister_netdevice. Bug and offending change reported by Moni Shoua <monis@voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3a1521b7 |
|
06-Nov-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: don't validate address at device open The standard validate_addr handler refuses to accept the all zeroes address as valid. However, it's common historical practice for the bonding master to be configured up prior to having any slaves, at which time the master will have a MAC address of all zeroes. Resolved by setting the dev->validate_addr to NULL. The master still can't end up with an invalid address, as the set_mac_address function tests for validity. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
a40745f5 |
|
24-Oct-2007 |
Adrian Bunk <bunk@kernel.org> |
bonding/bond_main.c: fix cut'n'paste error This patch fixes a cut'n'paste error in commit 1b76b31693d4a6088dec104ff6a6ead54081a3c2. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
c50b85d0 |
|
24-Oct-2007 |
Adrian Bunk <bunk@kernel.org> |
make bonding/bond_main.c:bond_deinit() static bond_deinit() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
6603a6f2 |
|
17-Oct-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Convert more locks to _bh, acquire rtnl, for new locking Convert more lock acquisitions to _bh flavor to avoid deadlock with workqueue activity and add acquisition of RTNL in appropriate places. Affects ALB mode, as well as core bonding functions and sysfs. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
059fe7a5 |
|
17-Oct-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Convert locks to _bh, rework alb locking for new locking Convert locking-related activity to new & improved system. Convert some lock acquisitions to _bh and rework parts of ALB mode, both to avoid deadlocks with workqueue activity. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
0b0eef66 |
|
17-Oct-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Convert miimon to new locking Convert mii (link state) monitor to acquire correct locks for failover events. In particular, failovers generally require RTNL at a low level (when manipulating device MAC addresses, for example) and no other locks. The high level monitor is responsible for acquiring a known set of locks, RTNL, the bond->lock for read and the slave_lock for write, and the low level failover processing can then release appropriate locks as needed. This patch provides the high level portion. As it is undesirable to acquire RTNL for every monitor pass (which may occur as often as every 10 ms), the miimon has been converted to do conditional locking. A first pass inspects all slaves to determine if any action is required, and if so, a second pass (after acquring RTNL) is done to perform any actions (doing a complete rescan, as the situation may have changed when all locks were released). Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
cf5f9044 |
|
17-Oct-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Convert balance-rr transmit to new locking Change locking in balance-rr transmit processing to use a free running counter to determine which slave to transmit on. Instead, a free-running counter is maintained, and modulo arithmetic used to select a slave for transmit. This removes lock operations from the TX path, and eliminates a deadlock introduced by the conversion to work queues. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
1b76b316 |
|
17-Oct-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
Convert bonding timers to workqueues Convert bonding timers to workqueues. This converts the various monitor functions to run in periodic work queues instead of timers. This patch introduces the framework and convers the calls, but does not resolve various locking issues, and does not stand alone. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
1284cd3a |
|
15-Oct-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: two small fixes for IPoIB support Two small fixes to IPoIB support for bonding: 1- copy header_ops from slave to bonding for IPoIB slaves 2- move release and destroy logic to UNREGISTER from GOING_DOWN notifier to avoid double release Set bonding to version 3.2.1. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
dd957c57 |
|
09-Oct-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
net/bonding: Optionally allow ethernet slaves to keep own MAC Update the "don't change MAC of slaves" functionality added in previous changes to be a generic option, rather than something tied to IB devices, as it's occasionally useful for regular ethernet devices as well. Adds "fail_over_mac" option (which is automatically enabled for IB slaves), applicable only to active-backup mode. Includes documentation update. Updates bonding driver version to 3.2.0. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
d90a162a |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
net/bonding: Destroy bonding master when last slave is gone When bonding enslaves non Ethernet devices it takes pointers to functions in the module that owns the slaves. In this case it becomes unsafe to keep the bonding master registered after last slave was unenslaved because we don't know if the pointers are still valid. Destroying the bond when slave_cnt is zero ensures that these functions be used anymore. Signed-off-by: Moni Shoua <monis at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
1053f62c |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
net/bonding: Delay sending of gratuitous ARP to avoid failure Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev->state field is on. This improves the chances for the arp packet to be transmitted. Signed-off-by: Moni Shoua <monis at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
3158bf7d |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
net/bonding: Handlle wrong assumptions that slave is always an Ethernet device bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua <monis at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
6b1bf096 |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
net/bonding: Enable IP multicast for bonding IPoIB devices Allow to enslave devices when the bonding device is not up. Over the discussion held at the previous post this seemed to be the most clean way to go, where it is not expected to cause instabilities. Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called where for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND) Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
2ab82852 |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address() This patch allows for enslaving netdevices which do not support the set_mac_address() function. In that case the bond mac address is the one of the active slave, where remote peers are notified on the mac address (neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs (this is already done by the bonding code). Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
872254dd |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
net/bonding: Enable bonding to enslave non ARPHRD_ETHER This patch changes some of the bond netdevice attributes and functions to be that of the active slave for the case of the enslaved device not being of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(), which are netdevice **type** dependent and hence might be not appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves from dissimilar ether types, as was concluded over the v1 discussion. IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this IPoIB device is bounded to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (i have omitted here some details which are not important for the bonding RFC). Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
d3bb52b0 |
|
22-Aug-2007 |
Al Viro <viro@zeniv.linux.org.uk> |
endianness annotations drivers/net/bonding/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
0795af57 |
|
03-Oct-2007 |
Joe Perches <joe@perches.com> |
[NET]: Introduce and use print_mac() and DECLARE_MAC_BUF() This is nicer than the MAC_FMT stuff. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88d3aafd |
|
15-Sep-2007 |
Jeff Garzik <jeff@garzik.org> |
[ETHTOOL] Provide default behaviors for a few ethtool sub-ioctls For the operations get-tx-csum get-sg get-tso get-ufo the default ethtool_op_xxx behavior is fine for all drivers, so we permit op==NULL to imply the default behavior. This provides a more uniform behavior across all drivers, eliminating ethtool(8) "ioctl not supported" errors on older drivers that had not been updated for the latest sub-ioctls. The ethtool_op_xxx() functions are left exported, in case anyone wishes to call them directly from a driver-private implementation -- a not-uncommon case. Should an ethtool_op_xxx() helper remain unused for a while, except by net/core/ethtool.c, we can un-export it at a later date. [ Resolved conflicts with set/get value ethtool patch... -DaveM ] Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10d024c1 |
|
17-Sep-2007 |
Ralf Baechle <ralf@linux-mips.org> |
[NET]: Nuke SET_MODULE_OWNER macro. It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
881d966b |
|
17-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e9dc8653 |
|
12-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make device event notification network namespace safe Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e730c155 |
|
17-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make packet reception network namespace safe This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace. This should ensure that the various network stacks do not receive packets in a anything but the initial network namespace until the code has been converted and is ready for them. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
457c4cbc |
|
11-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f353bf2 |
|
10-Aug-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Share correct feature code between bridging and bonding http://bugzilla.kernel.org/show_bug.cgi?id=8797 shows that the bonding driver may produce bogus combinations of the checksum flags and SG/TSO. For example, if you bond devices with NETIF_F_HW_CSUM and NETIF_F_IP_CSUM you'll end up with a bonding device that has neither flag set. If both have TSO then this produces an illegal combination. The bridge device on the other hand has the correct code to deal with this. In fact, the same code can be used for both. So this patch moves that logic into net/core/dev.c and uses it for both bonding and bridging. In the process I've made small adjustments such as only setting GSO_ROBUST if at least one constituent device supports it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
61a44b9c |
|
31-Jul-2007 |
Matthew Wilcox <willy@infradead.org> |
[NET]: ethtool ops are the only way During the transition to the ethtool_ops way of doing things, we supported calling the device's ->do_ioctl method to allow unconverted drivers to continue working. Those days are long behind us, all in-tree drivers use the ethtool_ops way, and so we no longer need to support this. The bonding driver is the biggest beneficiary of this; it no longer needs to call ioctl() as a fallback if ethtool_ops aren't supported. Also put a proper copyright statement on ethtool.c. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ad072c9 |
|
09-Jul-2007 |
Adrian Bunk <bunk@stusta.de> |
bonding/bond_main.c: make 2 functions static Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Chad Tindel <ctindel@users.sourceforge.net> Cc: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
c2edacf8 |
|
09-Jul-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding / ipv6: no addrconf for slaves separately from master At present, when a device is enslaved to bonding, if ipv6 is active then addrconf will be initated on the slave (because it is closed then opened during the enslavement processing). This causes DAD and RS packets to be sent from the slave. These packets in turn can confuse switches that perform ipv6 snooping, causing them to incorrectly update their forwarding tables (if, e.g., the slave being added is an inactve backup that won't be used right away) and direct traffic away from the active slave to a backup slave (where the incoming packets will be dropped). This patch alters the behavior so that addrconf will only run on the master device itself. I believe this is logically correct, as it prevents slaves from having an IPv6 identity independent from the master. This is consistent with the IPv4 behavior for bonding. This is accomplished by (a) having bonding set IFF_SLAVE sooner in the enslavement processing than currently occurs (before open, not after), and (b) having ipv6 addrconf ignore UP and CHANGE events on slave devices. The eql driver also uses the IFF_SLAVE flag. I inspected eql, and I believe this change is reasonable for its usage of IFF_SLAVE, but I did not test it. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
3201e656 |
|
19-Jun-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Fix use after free in unregister path The following patch (based on a patch from Stephen Hemminger <shemminger@linux-foundation.org>) removes use after free conditions in the unregister path for the bonding master. Without this patch, an operation of the form "echo -bond0 > /sys/class/net/bonding_masters" would trigger a NULL pointer dereference in sysfs. I was not able to induce the failure with the non-sysfs code path, but for consistency I updated that code as well. I also did some testing of the bonding /proc file being open while the bond is being deleted, and didn't see any problems there. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
59c51591 |
|
09-May-2007 |
Michael Opdenacker <michael@free-electrons.com> |
Fix occurrences of "the the " Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
5a1b5898 |
|
28-Apr-2007 |
Rusty Russell <rusty@rustcorp.com.au> |
[NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats. Herbert Xu conviced me that a new flag was overkill; every driver currently overrides get_stats, so we might as well make the internal one the default. If someone did fail to set get_stats, they would now get all 0 stats instead of "No statistics available". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c45d286e |
|
28-Mar-2007 |
Rusty Russell <rusty@rustcorp.com.au> |
[NET]: Inline net_device_stats Network drivers which keep stats allocate their own stats structure then write a get_stats() function to return them. It would be nice if this were done by default. 1) Add a new "stats" field to "struct net_device". 2) Add a new feature field to say "this driver uses the internal one" 3) Have a default "get_stats" which returns NULL if that feature not set. 4) Change callers to check result of get_stats call for NULL, not if ->get_stats is set. This should not break backwards compatibility with older drivers, yet allow modern drivers to shed some boilerplate code. Lightly tested: works for a modified lguest network driver. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0a92be0 |
|
12-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce arp_hdr(), remove skb->nh.arph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eddc9ec5 |
|
20-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a816c7c7 |
|
28-Feb-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: Improve IGMP join processing In active-backup mode, the current bonding code duplicates IGMP traffic to all slaves, so that switches are up to date in case of a failover from an active to a backup interface. If bonding then fails back to the original active interface, it is likely that the "active slave" switch's IGMP forwarding for the port will be out of date until some event occurs to refresh the switch (e.g., a membership query). This patch alters the behavior of bonding to no longer flood IGMP to all ports, and to issue IGMP JOINs to the newly active port at the time of a failover. This insures that switches are kept up to date for all cases. "GOELLESCH Niels" <niels.goellesch@eurocontrol.int> originally reported this problem, and included a patch. His original patch was modified by Jay Vosburgh to additionally remove the existing IGMP flood behavior, use RCU, streamline code paths, fix trailing white space, and adjust for style. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
e245cb71 |
|
28-Feb-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: only receive ARPs for us The ARP validation code only needs ARPs for the bonding device. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
c4f283b1 |
|
28-Feb-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix double dev_add_pack Bonding can erroneously register the same packet_type to receive ARPs (for use by ARP validation): once at device open time, and once via sysfs. Since sysfs can change the validate setting (and thus register or unregister) at any time, a flag is needed to synchronize with device open in order to avoid double registrations, and the simplest place is within the packet_type structure itself. Double unregister is not an issue. Bug reported by Ulrich Oelmann <ulrich.oelmann@web.de>. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
5c15bdec |
|
02-Mar-2007 |
Dan Aloni <da-x@monatomic.org> |
[VLAN]: Avoid a 4-order allocation. This patch splits the vlan_group struct into a multi-allocated struct. On x86_64, the size of the original struct is a little more than 32KB, causing a 4-order allocation, which is prune to problems caused by buddy-system external fragmentation conditions. I couldn't just use vmalloc() because vfree() cannot be called in the softirq context of the RCU callback. Signed-off-by: Dan Aloni <da-x@monatomic.org> Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd354f1a |
|
14-Feb-2007 |
Tim Schmielau <tim@physik3.uni-rostock.de> |
[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d54b1fdb |
|
12-Feb-2007 |
Arjan van de Ven <arjan@linux.intel.com> |
[PATCH] mark struct file_operations const 5 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
243cb4e5 |
|
06-Feb-2007 |
Joe Jin <lkmaillist@gmail.com> |
[BONDING]: Replace kmalloc() + memset() pairs with the appropriate kzalloc() calls Replace kmalloc() + memset() pairs with the appropriate kzalloc() calls in the bonding driver. Signed-off-by: Joe Jin <lkmaillist@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
09c89279 |
|
19-Jan-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix error check in sysfs creation The existing code did not correctly handle failures to create the per-interface sysfs group for bonding. Modified code to notice errors, and correctly unwind. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
e4b91c48 |
|
19-Jan-2007 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: fix device name allocation error The code to select names for the bonding interfaces was, for the non-sysfs creation case, always using a hard-coded set of bond0, bond1, etc, up to max_bonds. This caused conflicts for the second or subsequent loads of the module. Changed the code to obtain device names from dev_alloc_name(). Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
4e140079 |
|
04-Dec-2006 |
Andy Gospodarek <andy@greyhouse.net> |
[PATCH] bonding: incorrect bonding state reported via ioctl This is a small fix-up to finish out the work done by Jay Vosburgh to add carrier-state support for bonding devices. The output in /proc/net/bonding/bondX was correct, but when collecting the same info via an iotcl it could still be incorrect. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Cc: Jeff Garzik <jeff@garzik.org> Cc: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
418e8f3d |
|
17-Nov-2006 |
Laurent Riffard <laurent.riffard@free.fr> |
[PATCH] bonding: fix an oops when slave device does not provide get_stats Bonding driver unconditionnaly dereference get_stats function pointer for each of its slave device. This patch - adds a check for NULL dev->get_stats pointer in bond_get_stats - prints a notice when the bonding device enslave a device without get_stats function. Signed-off-by: Laurent Riffard <laurent.riffard@free.fr> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
0daa2303 |
|
08-Nov-2006 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
[PATCH] bonding: lockdep annotation ============================================= [ INFO: possible recursive locking detected ] 2.6.17-1.2600.fc6 #1 Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
a144ea4b |
|
28-Sep-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV4]: annotate struct in_ifaddr ifa_local, ifa_address, ifa_mask, ifa_broadcast and ifa_anycast are net-endian. Annotated them and variables that are inferred to be net-endian. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a8e447b |
|
22-Sep-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: Fix primary selection error at enslavement time At enslavement time, the primary slave might not be activated if there is already an active slave and the new slave is the primary. Replaced complicated logic with a call to bond_select_active_slave(), which does the right thing. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=6378 Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
f5b2b966 |
|
22-Sep-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: Validate probe replies in ARP monitor Add logic to check ARP request / reply packets used for ARP monitor link integrity checking. The current method simply examines the slave device to see if it has sent and received traffic; this can be fooled by extraneous traffic. For example, if multiple hosts running bonding are behind a common switch, the probe traffic from the multiple instances of bonding will update the tx/rx times on each other's slave devices. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
70298705 |
|
22-Sep-2006 |
jamal <hadi@cyberus.ca> |
[PATCH] bonding: Don't release slaves when master is admin down When a bonding netdevice is admin-ed down it loses the slaves attributes (set via ifenslave). This is not consistent with other behavior of netdevices (example a qdisc attached to a netdevice doesnt disappear or an attached IP address etc). The included patch fixes this. Ive tested by ifenslaving, downing the bond, checking /proc and making sure it still has the slaves, up-ing the bond and making sure things continue to work. Jay/Bonding folks if you are ok with it, just ACK it or include it in your tree etc. Otherwise we can discuss. Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
0b680e75 |
|
22-Sep-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: Add priv_flag to avoid event mishandling Add priv_flag to specifically identify bonding-involved devices. Needed because IFF_MASTER is an unreliable identifier (vlan interfaces above bonding will inherit IFF_MASTER). Misidentification of devices would cause notifier events for other devices to be erroneously processed by bonding, causing various havoc. Bug discovered by Martin Papik <martin.papik@ipsec.info>; this patch is modified from his original. Signed-off-by: Martin Papik <martin.papik@ipsec.info> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
54ef3137 |
|
22-Sep-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: Handle large hard_header_len The bonding driver fails to adjust its hard_header_len when enslaving interfaces. Whenever an interface with a hard_header_len greater than the ETH_HLEN default is enslaved, the potential for an oops exists, and if the oops happens while responding to an arp request, for example, the system panics. GIANFAR devices may use an extended hard_header for VLAN or hardware checksumming. Enslaving such a device and then transmitting over it causes a kernel panic. Patch modified from submitter's original, but submitter agreed with this patch in private email. Signed-off-by: Mark Huth <mhuth@mvista.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
65509645 |
|
22-Sep-2006 |
Kenzo Iwami <k-iwami@cj.jp.nec.com> |
[PATCH] bonding: Format fix in seq_printf call Though link_failure_count is type unsigned int, this value is outputted to /proc/net/bonding/bondX file using "%d" instead of "%u". The attached patch fixes this problem. Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
94dbffd5 |
|
22-Sep-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: Allow bonding to enslave a 10 Gig adapter Allow channel bonding to enslave a 10 Gig adapter without errors. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
7282d491 |
|
13-Sep-2006 |
Jeff Garzik <jeff@garzik.org> |
drivers/net: const-ify ethtool_ops declarations Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
8648b305 |
|
17-Jun-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM identically so we test for them in quite a few places. For the sake of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We also test the disjunct of NETIF_F_IP_CSUM and the other two in various places, for that purpose I've added NETIF_F_ALL_CSUM. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
932ff279 |
|
09-Jun-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Add netif_tx_lock Various drivers use xmit_lock internally to synchronise with their transmission routines. They do so without setting xmit_lock_owner. This is fine as long as netpoll is not in use. With netpoll it is possible for deadlocks to occur if xmit_lock_owner isn't set. This is because if a printk occurs while xmit_lock is held and xmit_lock_owner is not set can cause netpoll to attempt to take xmit_lock recursively. While it is possible to resolve this by getting netpoll to use trylock, it is suboptimal because netpoll's sole objective is to maximise the chance of getting the printk out on the wire. So delaying or dropping the message is to be avoided as much as possible. So the only alternative is to always set xmit_lock_owner. The following patch does this by introducing the netif_tx_lock family of functions that take care of setting/unsetting xmit_lock_owner. I renamed xmit_lock to _xmit_lock to indicate that it should not be used directly. I didn't provide irq versions of the netif_tx_lock functions since xmit_lock is meant to be a BH-disabling lock. This is pretty much a straight text substitution except for a small bug fix in winbond. It currently uses netif_stop_queue/spin_unlock_wait to stop transmission. This is unsafe as an IRQ can potentially wake up the queue. So it is safer to use netif_tx_disable. The hamradio bits used spin_lock_irq but it is unnecessary as xmit_lock must never be taken in an IRQ handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ff59c456 |
|
27-Mar-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: support carrier state for master Add support for the bonding master to specify its carrier state based upon the state of the slaves. For 802.3ad, the bond is up if there is an active, parterned aggregator. For other modes, the bond is up if any slaves are up. Updates driver version to 3.0.3. Based on a patch by jamal <hadi@cyberus.ca>. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
e041c683 |
|
27-Mar-2006 |
Alan Stern <stern@rowland.harvard.edu> |
[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f71e1309 |
|
03-Mar-2006 |
Arjan van de Ven <arjan@infradead.org> |
Massive net driver const-ification.
|
#
8f903c70 |
|
21-Feb-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: suppress duplicate packets Originally submitted by Kenzo Iwami; his original description is: The current bonding driver receives duplicate packets when broadcast/ multicast packets are sent by other devices or packets are flooded by the switch. In this patch, new flags are added in priv_flags of net_device structure to let the bonding driver discard duplicate packets in dev.c:skb_bond(). Modified by Jay Vosburgh to change a define name, update some comments, rearrange the new skb_bond() for clarity, clear all bonding priv_flags on slave release, and update the driver version. Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
f5e2a7b2 |
|
07-Feb-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: fix a locking bug in bond_release bond_release returns EINVAL without releasing the bond lock if the slave device is not being bonded by the bond. The following patch ensures that the lock is released in this case. Signed-off-by: Stephen J. Bevan <stephen@dino.dnsalias.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
a0de3adf |
|
30-Jan-2006 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: allow bond to use TSO if slaves support it Add NETIF_F_TSO (NETIF_F_UFO) to BOND_INTERSECT_FEATURES so that it can be used by a bonding device iff all its slave devices support TSO (UFO). Signed-off-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
6a986ce4 |
|
20-Jan-2006 |
Eric Sesterhenn <snakebyte@gmx.de> |
[PATCH] bonding: fix ->get_settings error checking Since get_settings() returns a signed int and it gets checked for < 0 to catch an error, res should be a signed int too. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
2e06cb58 |
|
28-Nov-2005 |
Jeff Garzik <jgarzik@pobox.com> |
[bonding] Remove superfluous changelog. No need to record this information in source code, its all in the git repository, and kernel archives.
|
#
691b73b1 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: comments and changelog Bonding source files still have changelogs in the comments. This, then, is an update to that changelog. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
e944ef79 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: spelling and whitespace corrections Minor spelling and whitespace corrections. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
b76cdba9 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: add sysfs functionality to bonding (large) This large patch adds sysfs functionality to the channel bonding module. Bonds can be added, removed, and reconfigured at runtime without having to reload the module. Multiple bonds with different configurations are easily configured, and ifenslave is no longer required to configure bonds. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
4756b02f |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: add ARP entries to /proc Make the /proc files show which ARP targets are in use by each bond. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
6b780567 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: Allow ARP target table to have empty entries With the sysfs interface, the user can remove entries from the ARP table at runtime. The ARP monitor code now allows for empty entries in the table. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
3c535952 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: make bond_init not __init The sysfs interface can create bonds at runtime, and __init code goes away after module init. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
dfe60397 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: move bond creation into separate function The sysfs interface can create bonds at runtime, so we need a separate function to do this, instead of just doing it in the module init code. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
a77b5325 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: make functions not static The sysfs code needs access these functions, so make them not static, and move the protos to the header file. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
12479f9a |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: expose some structs The sysfs code needs to know what these structs look like, so make them not static, and move the definition to the header. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
0f418b2a |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: get slave name from actual slave instead of param list Take the primary slave name shown in /proc from the actual slave dev instead of from the command-line parameter, which won't be present if the bond is created via sysfs. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
c61b75ad |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: Add transmit policy to /proc Adds information about the recently-added transmit policy setting to each bond's /proc file. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
2ac47660 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: expand module param descriptions Expand and correct the parameter descriptions shown by modinfo. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
4e0952c7 |
|
09-Nov-2005 |
Mitch Williams <mitch.a.williams@intel.com> |
[PATCH] bonding: add bond name to all error messages Add the bond name to all error messages so we can tell which one is complaining. Also reformats some error messages to be more consistent. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
8e3babcd |
|
04-Nov-2005 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: fix feature consolidation This should resolve http://bugzilla.kernel.org/show_bug.cgi?id=5519 The current feature computation loses bits that it doesn't know about, resulting in an inability to add VLANs and possibly other havoc. Rewrote function to preserve bits it doesn't know about, remove an unneeded state variable, and simplify the code. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
#
df49898a |
|
18-Oct-2005 |
John W. Linville <linville@tuxdriver.com> |
[PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack Expand comment explaining MAC address selection for replicated IGMP frames transmitted in bonding mode 1 (active-backup). Also, a small whitespace cleanup. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
dd0fc66f |
|
07-Oct-2005 |
Al Viro <viro@ftp.linux.org.uk> |
[PATCH] gfp flags annotations - part 1 - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
de54f390 |
|
04-Oct-2005 |
Randy Dunlap <rdunlap@infradead.org> |
[BONDING]: fix sparse gfp nocast warnings Fix implicit nocast warnings in bonding code: drivers/net/bonding/bond_main.c:1302:49: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
075897ce |
|
28-Sep-2005 |
John W. Linville <linville@tuxdriver.com> |
[PATCH] bonding: replicate IGMP traffic in activebackup mode Replicate IGMP frames across all slaves in activebackup mode. This ensures fail-over is rapid for multicast traffic as well. Otherwise, multicast traffic will be lost until the next IGMP membership report poll timeout. This is conceptually similar to the treatment of IGMP traffic in bond_alb_xmit. In that case, IGMP traffic transmitted on any slave is re-routed to the active slave in order to ensure that multicast traffic continues to be directed to the active receiver. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
217df670 |
|
26-Sep-2005 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] fix bonding crash, remove old ABI support David S. Miller <davem@davemloft.net> wrote: >I think removing support for older ifenslave binaries is >the least painful solution to this problem. This patch removes backwards compatibility for old ifenslave binaries (ifenslave prior to verison 1.0.0). I did not similarly modify ifenslave itself; with sysfs on the horizon, I don't see that as being worthwhile. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
e5ed6399 |
|
03-Oct-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
552709d5 |
|
21-Sep-2005 |
nsxfreddy@gmail.com <nsxfreddy@gmail.com> |
[PATCH] bonding: Fix link monitor capability check (was skge: set mac address oops with bonding) Fix bond_enslave link monitoring warning to check use_carrier status and ethtool_ops in addition to do_ioctl. This version checks ethtool_ops as well as do_ioctl, and also uses the per-bond params.use_carrier instead of the global use_carrier. Signed-off-by: Jason R. Martin <nsxfreddy@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
40abc270 |
|
18-Sep-2005 |
Florin Malita <fmalita@gmail.com> |
[BOND]: Fix bond_init() error path handling. From: Florin Malita <fmalita@gmail.com> bond_init() is not releasing rtnl_sem after register_netdevice() and before calling unregister_netdevice() (from bond_free_all()) in the exception path. As the device registration is not completed (dev->reg_state == NETREG_REGISTERING), the call to unregister_netdevice() triggers BUG_ON(dev->reg_state != NETREG_REGISTERED). Signed-off-by: Florin Malita <fmalita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ed4b9f80 |
|
14-Sep-2005 |
Jay Vosburgh <fubar@us.ibm.com> |
[PATCH] bonding: plug reference count leak Bonding leaks route structures when the ARP monitor is configured to send probes over VLANs. Originally reported by Ian Abel <ian.abel@mxtelecom.com>; his original fix was modified by Jay Vosburgh to correct coding style and to close a leak it missed. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
|
#
8531c5ff |
|
22-Aug-2005 |
Arthur Kepner <akepner@sgi.com> |
[PATCH] bonding: inherit zero-copy flags of slaves This change allows a bonding device to inherit the "zero-copy" features of its slave devices. It was inspired by a couple of previous postings on this topic: http://marc.theaimsgroup.com/?l=bonding-devel&m=111924607327794&w=2 http://marc.theaimsgroup.com/?l=bonding-devel&m=111925242706297&w=2 and it's largely a combination of the patches that appear in those emails. Signed-off-by: Arthur Kepner <akepner@sgi.com>
|
#
169a3e66 |
|
26-Jun-2005 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: xor/802.3ad improved slave hash Add support for alternate slave selection algorithms to bonding balance-xor and 802.3ad modes. Default mode (what we have now: xor of MAC addresses) is "layer2", new choice is "layer3+4", using IP and port information for hashing to select peer. Originally submitted by Jason Gabler for balance-xor mode; modified by Jay Vosburgh to additionally support 802.3ad mode. Jason's original comment is as follows: The attached patch to the Linux Etherchannel Bonding driver modifies the driver's "balance-xor" mode as follows: - alternate hashing policy support for mode 2 * Added kernel parameter "xmit_policy" to allow the specification of different hashing policies for mode 2. The original mode 2 policy is the default, now found in xmit_hash_policy_layer2(). * Added xmit_hash_policy_layer34() This patch was inspired by hashing policies implemented by Cisco, Foundry and IBM, which are explained in Foundry documentation found at: http://www.foundrynet.com/services/documentation/sribcg/Trunking.html#112750 Signed-off-by: Jason Gabler <jygabler@lbl.gov> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
|
#
c3ade5ca |
|
26-Jun-2005 |
Jay Vosburgh <fubar@us.ibm.com> |
bonding: gratuitous ARP Add support for generating gratuitous ARPs in bonding active-backup mode when failovers occur. Includes support for VLAN tagging the ARPs as needed. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
|
#
2f872f04 |
|
26-May-2005 |
Jay Vosburgh <fubar@us.ibm.com> |
[BONDING]: bonding using arp_ip_target may stay down with active path Correcting the list traversal makes the problem go away. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|