#
64854534 |
|
11-Jan-2024 |
Christian Heusel <christian@heusel.eu> |
RDMA/ipoib: Print symbolic error name instead of error code Utilize the %pe print specifier to get the symbolic error name as a string (i.e "-ENOMEM") in the log message instead of the error code to increase its readability. This change was suggested in https://lore.kernel.org/all/92972476-0b1f-4d0a-9951-af3fc8bc6e65@suswa.mountain/ Signed-off-by: Christian Heusel <christian@heusel.eu> Link: https://lore.kernel.org/r/20240111141311.987098-1-christian@heusel.eu Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
4f973e21 |
|
12-Dec-2023 |
Daniel Vacek <neelx@redhat.com> |
IB/ipoib: Fix mcast list locking Releasing the `priv->lock` while iterating the `priv->multicast_list` in `ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to remove the items while in the middle of iteration. If the mcast is removed while the lock was dropped, the for loop spins forever resulting in a hard lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel): Task A (kworker/u72:2 below) | Task B (kworker/u72:0 below) -----------------------------------+----------------------------------- ipoib_mcast_join_task(work) | ipoib_ib_dev_flush_light(work) spin_lock_irq(&priv->lock) | __ipoib_ib_dev_flush(priv, ...) list_for_each_entry(mcast, | ipoib_mcast_dev_flush(dev = priv->dev) &priv->multicast_list, list) | ipoib_mcast_join(dev, mcast) | spin_unlock_irq(&priv->lock) | | spin_lock_irqsave(&priv->lock, flags) | list_for_each_entry_safe(mcast, tmcast, | &priv->multicast_list, list) | list_del(&mcast->list); | list_add_tail(&mcast->list, &remove_list) | spin_unlock_irqrestore(&priv->lock, flags) spin_lock_irq(&priv->lock) | | ipoib_mcast_remove_list(&remove_list) (Here, `mcast` is no longer on the | list_for_each_entry_safe(mcast, tmcast, `priv->multicast_list` and we keep | remove_list, list) spinning on the `remove_list` of | >>> wait_for_completion(&mcast->done) the other thread which is blocked | and the list is still valid on | it's stack.) Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent eventual sleeps. Unfortunately we could not reproduce the lockup and confirm this fix but based on the code review I think this fix should address such lockups. crash> bc 31 PID: 747 TASK: ff1c6a1a007e8000 CPU: 31 COMMAND: "kworker/u72:2" -- [exception RIP: ipoib_mcast_join_task+0x1b1] RIP: ffffffffc0944ac1 RSP: ff646f199a8c7e00 RFLAGS: 00000002 RAX: 0000000000000000 RBX: ff1c6a1a04dc82f8 RCX: 0000000000000000 work (&priv->mcast_task{,.work}) RDX: ff1c6a192d60ac68 RSI: 0000000000000286 RDI: ff1c6a1a04dc8000 &mcast->list RBP: ff646f199a8c7e90 R8: ff1c699980019420 R9: ff1c6a1920c9a000 R10: ff646f199a8c7e00 R11: ff1c6a191a7d9800 R12: ff1c6a192d60ac00 mcast R13: ff1c6a1d82200000 R14: ff1c6a1a04dc8000 R15: ff1c6a1a04dc82d8 dev priv (&priv->lock) &priv->multicast_list (aka head) ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib] #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f199a8c7e68 ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000 (empty) crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000 mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>, mcast_mutex.owner.counter = 0xff1c69998efec000 crash> b 8 PID: 8 TASK: ff1c69998efec000 CPU: 33 COMMAND: "kworker/u72:0" -- #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib] #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib] #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib] #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967 crash> rx ff646f1980153e68 ff646f1980153e68: ff1c6a1a04dc83f0 <<< work = &priv->flush_light crash> ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000 flush_light.func = 0xffffffffc0943820 <ipoib_ib_dev_flush_light>, broadcast = 0x0, The mcast(s) on the `remove_list` (the remaining part of the ex `priv->multicast_list`): crash> list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - - ff1c6a192bd0c200 done.done = 0x0, ff1c6a192d60ac00 done.done = 0x0, Reported-by: Yuya Fujita-bishamonten <fj-lsoft-rh-driver@dl.jp.fujitsu.com> Signed-off-by: Daniel Vacek <neelx@redhat.com> Link: https://lore.kernel.org/all/20231212080746.1528802-1-neelx@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
753fff78 |
|
21-Nov-2023 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/IPoIB: Fix error code return in ipoib_mcast_join Return the error code in case of ib_sa_join_multicast fail. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20231121130316.126364-2-jinpu.wang@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
633d6102 |
|
28-Jan-2021 |
Christoph Lameter <cl@linux.com> |
RDMA/ipoib: Remove racy Subnet Manager sendonly join checks When a system receives a REREG event from the SM, then the SM information in the kernel is marked as invalid and a request is sent to the SM to update the information. The SM information is invalid in that time period. However, receiving a REREG also occurs simultaneously in user space applications that are now trying to rejoin the multicast groups. Some of those may be sendonly multicast groups which are then failing. If the SM information is invalid then ib_sa_sendonly_fullmem_support() returns false. That is wrong because it just means that we do not know yet if the potentially new SM supports sendonly joins. Sendonly join was introduced in 2015 and all the Subnet managers have supported it ever since. So there is no point in checking if a subnet manager supports it. Should an old opensm get a request for a sendonly join then the request will fail. The code that is removed here accomodated that situation and fell back to a full join. Falling back to a full join is problematic in itself. The reason to use the sendonly join was to reduce the traffic on the Infiniband fabric otherwise one could have just stayed with the regular join. So this patch may cause users of very old opensms to discover that lots of traffic needlessly crosses their IB fabrics. Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101281845160.13303@www.lameter.com Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
90eef9f7 |
|
14-Dec-2020 |
Zheng Yongjun <zhengyongjun3@huawei.com> |
RDMA: Convert comma to semicolon Replace a comma between expression statements by a semicolon. Link: https://lore.kernel.org/r/20201214134118.4349-1-zhengyongjun3@huawei.com Link: https://lore.kernel.org/r/20201214134146.4456-1-zhengyongjun3@huawei.com Link: https://lore.kernel.org/r/20201214134218.4510-1-zhengyongjun3@huawei.com Link: https://lore.kernel.org/r/20201214134243.4563-1-zhengyongjun3@huawei.com Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
90cdff90 |
|
21-Jun-2020 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/ipoib: Return void from ipoib_mcast_stop_thread() The return value from ipoib_mcast_stop_thread() is always 0 - change it to be void. Link: https://lore.kernel.org/r/20200622092256.6931-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ebd6e96b |
|
25-May-2020 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/ipoib: Remove can_sleep parameter from iboib_mcast_alloc can_sleep is always 0 when iboib_mcast_alloc() is called, so remove it and use GFP_ATOMIC instead of GFP_KERNEL. Link: https://lore.kernel.org/r/20200525130305.171509-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b7e159eb |
|
10-May-2020 |
Gary Leshner <Gary.S.Leshner@intel.com> |
IB/{hfi1, ipoib, rdma}: Broadcast ping sent packets which exceeded mtu size When in connected mode ipoib sent broadcast pings which exceeded the mtu size for broadcast addresses. Add an mtu attribute to the rdma_netdev structure which ipoib sets to its mcast mtu size. The RDMA netdev uses this value to determine if the skb length is too long for the mtu specified and if it is, drops the packet and logs an error about the errant packet. Link: https://lore.kernel.org/r/20200511160655.173205.14546.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
6d72344c |
|
10-May-2020 |
Kaike Wan <kaike.wan@intel.com> |
IB/ipoib: Increase ipoib Datagram mode MTU's upper limit Currently the ipoib UD mtu is restricted to 4K bytes. Remove this limitation so that the IPOIB module can potentially use an MTU (in UD mode) that is bounded by the MTU of the underlying device. A field is added to the ib_port_attr structure to indicate the maximum physical MTU the underlying device supports. Link: https://lore.kernel.org/r/20200511160618.173205.23053.stgit@awfm-01.aw.intel.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
75da9606 |
|
29-Jul-2018 |
Denis Drozdov <denisd@mellanox.com> |
IB/IPoIB: Set ah valid flag in multicast send flow The change of ipoib_ah data structure with adding "valid" flag and checks of ah->valid in ipoib_start_xmit affected multicast packet flow. Since the multicast flow doesn't invoke path_rec_start, "ah->valid" flag remains unset, so that ipoib_start_xmit end up with neigh_refresh_path instead of sending the packet using neigh. "ah->valid" has to be set in multicast send flow. As a result IPoIB starts sending packets via neigh immediately and eliminates 60sec delay of neigh keep alive interval. The typical example of this issue are two sequential arpings: arping 11.134.208.9 -> got response (mcast_send) arping 11.134.208.9 -> no response (ah->valid = 0) Fixes: fa9391dbad4b ("RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b1b63970 |
|
04-Jul-2018 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/ipoib: Fix use of sizeof() Make sure to use sizeof(...) instead of sizeof ... which is more preferred. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
112f5c81 |
|
16-May-2018 |
Sebastian Andrzej Siewior <bigeasy@linutronix.de> |
IB/ipoib: replace local_irq_disable() with proper locking In ipoib_mcast_restart_task() the netif_addr_lock() is invoked prior local_irq_save(). netif_addr_lock() should not be invoked in interrupt disabled section, only in BH disabled sections. The priv->lock is always acquired with disabled interrupts. The only place where netif_addr_lock() and priv->lock nest ist ipoib_mcast_restart_task(). Drop the local_irq_save() and acquire priv->lock with spin_lock_irq() inside the netif_addr locked section. It's safe to do so because the caller is either a worker function or __ipoib_ib_dev_flush() which are both calling with interrupts enabled (and since BH is enabled here, too so netif_addr_lock_bh() needs to be used). Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
16ba3def |
|
31-Dec-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Fix race condition in neigh creation When using enhanced mode for IPoIB, two threads may execute xmit in parallel to two different TX queues while the target is the same. In this case, both of them will add the same neighbor to the path's neigh link list and we might see the following message: list_add double add: new=ffff88024767a348, prev=ffff88024767a348... WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70 ipoib_start_xmit+0x477/0x680 [ib_ipoib] dev_hard_start_xmit+0xb9/0x3e0 sch_direct_xmit+0xf9/0x250 __qdisc_run+0x176/0x5d0 __dev_queue_xmit+0x1f5/0xb10 __dev_queue_xmit+0x55/0xb10 Analysis: Two SKB are scheduled to be transmitted from two cores. In ipoib_start_xmit, both gets NULL when calling ipoib_neigh_get. Two calls to neigh_add_path are made. One thread takes the spin-lock and calls ipoib_neigh_alloc which creates the neigh structure, then (after the __path_find) the neigh is added to the path's neigh link list. When the second thread enters the critical section it also calls ipoib_neigh_alloc but in this case it gets the already allocated ipoib_neigh structure, which is already linked to the path's neigh link list and adds it again to the list. Which beside of triggering the list, it creates a loop in the linked list. This loop leads to endless loop inside path_rec_completion. Solution: Check list_empty(&neigh->list) before adding to the list. Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send" Fixes: b63b70d87741 ('IPoIB: Use a private hash table for path lookup in xmit path') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
a08e1120 |
|
12-Jul-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Make sure no in-flight joins while leaving that mcast While cleaning neighs and there is a send-only mcast neigh, the driver should wait to finish its join process before trying to remove it. Without this patch, we will see messages like: "ipoib_mcast_leave on an in-flight join" and unexpected results in the join_complete. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
6bdc8de2 |
|
12-Jul-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Use cancel_delayed_work_sync when needed The work mcast_task can re-queue itself, so instead of doing cancel && flush_workqueue, that still can leave a queued task on the air, use cancel_delayed_work_sync. Also, no need to use lock over the cancel, the original lock was due to bit assignment setting (IPOIB_MCAST_RUN) that is not in use anymore. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
edf3f301 |
|
10-Jul-2017 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Fix race between light events and interface restart A potential race between light_event and interface restart may attach multicast group to an already attached QP. Scenario: light_event flow goes through ipoib_mcast_dev_flush function, if a context switch occurs before calling ipoib_mcast_remove_list, then we may face a situation where the broadcast of the priv is null and the corresponding QP is not detached yet. If an "interface restart" runs during the previous context switch, the following scenario occurs: When the device goes up, ipoib_ib_dev_up function will be called, it will send a new registration request to the broadcast group and then attach the group to the QP that was not detached before. IPOIB_FLUSH_LIGHT INTERFACE RESTART __ipoib_ib_dev_flush | | | | | | | ipoib_mcast_dev_flush | Move mcast list and broadcast to remove_list | | | | | Context Switch--> | | ipoib_ib_dev_down | | | | | ipoib_ib_dev_up | | | | | ipoib_mcast_join_task | allocate new broadcast | | | | | Attach QP to multicast group | | | | | <--Context Switch ipoib_mcast_leave Detach QP from multicast group Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
44c58487 |
|
29-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/core: Define 'ib' and 'roce' rdma_ah_attr types rdma_ah_attr can now be either ib or roce allowing core components to use one type or the other and also to define attributes unique to a specific type. struct ib_ah is also initialized with the type when its first created. This ensures that calls such as modify_ah dont modify the type of the address handle attribute. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d8966fcd |
|
29-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/core: Use rdma_ah_attr accessor functions Modify core and driver components to use accessor functions introduced to access individual fields of rdma_ah_attr Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
90898850 |
|
29-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/core: Rename struct ib_ah_attr to rdma_ah_attr This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
cfd51935 |
|
29-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/IPoIB: Remove 'else' when the 'if' has a return. This patch fixes a checkpatch issue related to not having to use an 'else' if the 'if' path returns from the function. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ee1c60b1 |
|
20-Mar-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/SA: Modify SA to implicitly cache Class Port info SA will query and cache class port info as part of its initialization. SA will also invalidate and refresh the cache based on specific events. Callers such as IPoIB and CM can query the SA to get the classportinfo information. Apart from making the caller code much simpler, this change puts the onus on the SA to query and maintain classportinfo much like how it maitains the address handle to the SM. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3e31a490 |
|
19-Mar-2017 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow Before calling ipoib_stop, rtnl_lock should be taken, then the flow clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP flags, and waits for mcast completion if IPOIB_MCAST_FLAG_BUSY is set. On the other hand, the flow of multicast join task initializes a mcast completion, sets the IPOIB_MCAST_FLAG_BUSY and calls ipoib_mcast_join. If IPOIB_FLAG_OPER_UP flag is not set, this call returns EINVAL without setting the mcast completion and leads to a deadlock. ipoib_stop | | | clear_bit(IPOIB_FLAG_ADMIN_UP) | | | Context Switch | | ipoib_mcast_join_task | | | spin_lock_irq(lock) | | | init_completion(mcast) | | | set_bit(IPOIB_MCAST_FLAG_BUSY) | | | Context Switch | | clear_bit(IPOIB_FLAG_OPER_UP) | | | spin_lock_irqsave(lock) | | | Context Switch | | ipoib_mcast_join | return (-EINVAL) | | | spin_unlock_irq(lock) | | | Context Switch | | ipoib_mcast_dev_flush | wait_for_completion(mcast) | ipoib_stop will wait for mcast completion for ever, and will not release the rtnl_lock. As a result panic occurs with the following trace: [13441.639268] Call Trace: [13441.640150] [<ffffffff8168b579>] schedule+0x29/0x70 [13441.641038] [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0 [13441.641914] [<ffffffff810bc017>] ? complete+0x47/0x50 [13441.642765] [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200 [13441.643580] [<ffffffff8168b956>] wait_for_completion+0x116/0x170 [13441.644434] [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20 [13441.645293] [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib] [13441.646159] [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib] [13441.647013] [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib] Fixes: 08bc327629cb ("IB/ipoib: fix for rare multicast join race condition") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
cd565b4b |
|
10-Apr-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Support acceleration options callbacks IPoIB driver now uses the new set of callback functions. If the hardware provider supports the new ipoib_options implementation, the driver uses the callbacks in its data path flows, otherwise it uses the driver default implementation for all data flows in its code. The default implementation wasn't change and it is exactly as it was before introduction of acceleration support. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c1048aff |
|
10-Apr-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Use defined function for netdev_priv function Make ipoib_priv point to netdev_priv where the code calls netdev_priv. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5c37077f |
|
18-Jan-2017 |
Zhu Yanjun <yanjun.zhu@oracle.com> |
IB/ipoib: Remove the unnecessary error check The function ipoib_mcast_start_thread/ipoib_ib_dev_up always return zero. As such, in the function ipoib_open, err_stop will never be reached. So remove this err_stop and change the return type of the function ipoib_mcast_start_thread/ipoib_ib_dev_up to void. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d32b9a81 |
|
28-Dec-2016 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Add detailed error message to dev_queue_xmit call Add a detailed return code to dev_queue_xmit function when calling to requeue packet via __skb_dequeue. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
11b642b8 |
|
21-Nov-2016 |
Bart Van Assche <bvanassche@acm.org> |
IPoIB: Avoid reading an uninitialized member variable This patch avoids that Coverity reports the following: Using uninitialized value port_attr.state when calling printk Fixes: commit 94232d9ce817 ("IPoIB: Start multicast join process only on active ports") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: <stable@vger.kernel.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
fc791b63 |
|
13-Oct-2016 |
Paolo Abeni <pabeni@redhat.com> |
IB/ipoib: move back IB LL address into the hard header After the commit 9207f9d45b0a ("net: preserve IP control block during GSO segmentation"), the GSO CB and the IPoIB CB conflict. That destroy the IPoIB address information cached there, causing a severe performance regression, as better described here: http://marc.info/?l=linux-kernel&m=146787279825501&w=2 This change moves the data cached by the IPoIB driver from the skb control lock into the IPoIB hard header, as done before the commit 936d7de3d736 ("IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses"). In order to avoid GRO issue, on packet reception, the IPoIB driver stash into the skb a dummy pseudo header, so that the received packets have actually a hard header matching the declared length. To avoid changing the connected mode maximum mtu, the allocated head buffer size is increased by the pseudo header length. After this commit, IPoIB performances are back to pre-regression value. v2 -> v3: rebased v1 -> v2: avoid changing the max mtu, increasing the head buf size Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b29953b |
|
04-Jun-2016 |
Mark Bloch <markb@mellanox.com> |
IB/IPoIB: Disable bottom half when dealing with device address Align locking usage when touching device address with rest of the kernel. Lock the bottom half when doing so using netif_addr_lock_bh. This also solves the following case as reported by lockdep: CPU0 CPU1 ---- ---- lock(_xmit_INFINIBAND); local_irq_disable(); lock(&(&mc->mca_lock)->rlock); lock(_xmit_INFINIBAND); <Interrupt> lock(&(&mc->mca_lock)->rlock); *** DEADLOCK *** Fixes: 492a7e67ff83 ("IB/IPoIB: Allow setting the device address") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
492a7e67 |
|
18-May-2016 |
Mark Bloch <markb@mellanox.com> |
IB/IPoIB: Allow setting the device address In IB networks, and specifically in IPoIB/rdmacm traffic, the device address of an IPoIB interface is used as a means to exchange information between nodes needed for communication. Currently an IPoIB interface will always be created with a device address based on its node GUID without a way to change that. This change adds the ability to set the device address of an IPoIB interface by value. We use the set mac address ndo to do that. The flow should be broken down to two: 1) The GID value is already in the GID table, in this case the interface will be able to set carrier up. 2) The GID value is not yet in the GID table, in this case the interface won't try to join the multicast group and will wait (listen on GID_CHANGE event) until the GID is inserted. In order to track those changes, we add a new flag: * IPOIB_FLAG_DEV_ADDR_SET. When set, it means the dev_addr is a based on a value in the gid table. this bit will be cleared upon a dev_addr change triggered by the user and set after validation. Per IB spec the port GUID can't change if the module is loaded. port GUID is the basis for GID at index 0 which is the basis for the default device address of a ipoib interface. The issue is that there are devices that don't follow the spec, they change the port GUID while HCA is powered on, so in order not to break userspace applications. We need to check if the user wanted to control the device address and we assume that if he sets the device address back to be based on GID index 0, he no longer wishs to control it. In order to track this, we add an additional flag: * IPOIB_FLAG_DEV_ADDR_CTRL When setting the device address, there is no validation of the upper twelve bytes of the device address (flags, qpn, subnet prefix) as those bytes are not under the control of the user. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3b561130 |
|
25-May-2016 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Support SendOnlyFullMember MCG for SendOnly join Check (via an SA query) if the SM supports the new option for SendOnly multicast joins. If the SM supports that option it will use the new join state to create such multicast group. If SendOnlyFullMember is supported, we wouldn't use faked FullMember state join for SendOnly MCG, use the correct state if supported. This check is performed at every invocation of mcast_restart task, to be sure that the driver stays in sync with the current state of the SM. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
08bc3276 |
|
11-Feb-2016 |
Alex Estrin <alex.estrin@intel.com> |
IB/ipoib: fix for rare multicast join race condition A narrow window for race condition still exist between multicast join thread and *dev_flush workers. A kernel crash caused by prolong erratic link state changes was observed (most likely a faulty cabling): [167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib] [167275.674443] PGD 0 [167275.677373] Oops: 0000 [#1] SMP ... [167275.977530] Call Trace: [167275.982225] [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib] [167275.992024] [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490 [ib_ipoib] [167276.002149] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [167276.010754] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [167276.019088] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [167276.027737] [<ffffffff810a5aef>] kthread+0xcf/0xe0 Here was a hit spot: ipoib_mcast_join() { .............. rec.qkey = priv->broadcast->mcmember.qkey; ^^^^^^^ ..... } Proposed patch should prevent multicast join task to continue if link state change is detected. Signed-off-by: Alex Estrin <alex.estrin@intel.com> Changes from v4: - as suggested by Doug Ledford, optimized spinlock usage, i.e. ipoib_mcast_join() is called with lock held. Changes from v3: - sync with priv->lock before flag check. Chages from v2: - Move check for OPER_UP flag state to mcast_join() to ensure no event worker is in progress. - minor style fixes. Changes from v1: - No need to lock again if error detected. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
50be28de |
|
07-Jan-2016 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Fix kernel panic on multicast flow ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the parameter mcast->dev. That mcast is a temporary (used as an iterator) variable that may be uninitialized. There is no need to send the variable dev to the function, as each mcast has its dev as a member in the mcast struct. This causes the next panic: RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib] RSP: 0018: EFLAGS: 00010246 RAX: f0201 RBX: 24e00 RCX: 00000 .... .... Stack: Call Trace: ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib] ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib] process_one_work+0x164/0x470 worker_thread+0x11d/0x420 ... Fixes: 5a0e81f6f483 ('IB/IPoIB: factor out common multicast list removal code') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reported-by: Doron Tsur <doront@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
432c55ff |
|
21-Dec-2015 |
Christoph Lameter <cl@linux.com> |
IB/IPoIB: Move multicast specific code out of ipoib_main.c Code cleanup to move multicast specific code that checks for a sendonly join to ipoib_multicast.c. This allows the removal of the export of __ipoib_mcast_find(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5a0e81f6 |
|
21-Dec-2015 |
Christoph Lameter <cl@linux.com> |
IB/IPoIB: factor out common multicast list removal code Code cleanup to remove multicast specific code from ipoib_main.c The removal of a list of multicast groups occurs in three places. Create a new function ipoib_mcast_remove_list(). Use this new function in ipoib_main.c too. That in turn allows the dropping of two functions that were exported from ipoib_multicast.c for expiration of mc groups. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
55ee3ab2 |
|
15-Oct-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Add netdev and gid attributes paramteres to cache Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0b5c9279 |
|
11-Oct-2015 |
Christoph Lameter <cl@linux.com> |
IB/ipoib: For sendonly join free the multicast group on leave When we leave the multicast group on expiration of a neighbor we do not free the mcast structure. This results in a memory leak that causes ib_dealloc_pd to fail and print a WARN_ON message and backtrace. Fixes: bd99b2e05c4d (IB/ipoib: Expire sendonly multicast joins) Signed-off-by: Christoph Lameter <cl@linux.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e622f2f4 |
|
08-Oct-2015 |
Christoph Hellwig <hch@lst.de> |
IB: split struct ib_send_wr This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>
|
#
c3852ab0 |
|
25-Sep-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: Make sendonly multicast joins create the mcast group Since IPoIB should, as much as possible, emulate how multicast sends work on Ethernet for regular TCP/IP apps, there should be no requirement to subscribe to a multicast group before your sends are properly sent. However, due to the difference in how multicast is handled on InfiniBand, we must join the appropriate multicast group before we can send to it. Previously we tried not to trigger the auto-create feature of the subnet manager when doing this because we didn't have tracking of these sendonly groups and the auto-creation might never get undone. The previous patch added timing to these sendonly joins and allows us to leave them after a reasonable idle expiration time. So supply all of the information needed to auto-create group. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
bd99b2e0 |
|
23-Sep-2015 |
Christoph Lameter <cl@linux.com> |
IB/ipoib: Expire sendonly multicast joins On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d1178cbc |
|
21-Aug-2015 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/ipoib: Suppress warning for send only join failures We expect send only joins to fail, it just means there are no listeners for the group. The correct thing to do is silently drop the packet at source. Eg avahi will full join 224.0.0.251 which causes a send only IGMP packet to 224.0.0.22, and then a warning level kmessage like this: ib0: sendonly multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:0016, status -22 If there is no IP router listening to IGMP. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c3acdc06 |
|
03-Sep-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: Clean up send-only multicast joins Even though we don't expect the group to be created by the SM we sill need to provide all the parameters to force the SM to validate they are correct. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0e5544d9 |
|
02-Apr-2015 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Remove IPOIB_MCAST_RUN bit After Doug Ledford's changes there is no need in that bit, it's semantic becomes subset of the IPOIB_FLAG_OPER_UP bit. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3fd0605c |
|
02-Apr-2015 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Update broadcast record values after each successful join request Update the cached broadcast record in the priv object after every new join of this broadcast domain group. These values are needed for the port configuration (MTU size) and to all the new multicast (non-broadcast) join requests initial parameters. For example, SM starts with 2K MTU for all the fabric, and after that it restarts (or handover to new SM) with new port configuration of 4K MTU. Without using the new values, the driver will keep its old configuration of 2K and will not apply the new configuration of 4K. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
1c0453d6 |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: drop mcast_mutex usage We needed the mcast_mutex when we had to prevent the join completion callback from having the value it stored in mcast->mc overwritten by a delayed return from ib_sa_join_multicast. By storing the return of ib_sa_join_multicast in an intermediate variable, we prevent a delayed return from ib_sa_join_multicast overwriting the valid contents of mcast->mc, and we no longer need a mutex to force the join callback to run after the return of ib_sa_join_multicast. This allows us to do away with the mutex entirely and protect our critical sections with a just a spinlock instead. This is highly desirable as there were some places where we couldn't use a mutex because the code was not allowed to sleep, and so we were currently using a mix of mutex and spinlock to protect what we needed to protect. Now we only have a spin lock and the locking complexity is greatly reduced. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d2fe937c |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: deserialize multicast joins Allow the ipoib layer to attempt to join all outstanding multicast groups at once. The ib_sa layer will serialize multiple attempts to join the same group, but will process attempts to join different groups in parallel. Take advantage of that. In order to make this happen, change the mcast_join_thread to loop through all needed joins, sending a join request for each one that we still need to join. There are a few special cases we handle though: 1) Don't attempt to join anything but the broadcast group until the join of the broadcast group has succeeded. 2) No longer restart the join task at the end of completion handling. If we completed successfully, we are done. The join task now needs kicked either by mcast_send or mcast_restart_task or mcast_start_thread, but should not need started anytime else except when scheduling a backoff attempt to rejoin. 3) No longer use separate join/completion routines for regular and sendonly joins, pass them all through the same routine and just do the right thing based on the SENDONLY join flag. 4) Only try to join a SENDONLY join twice, then drop the packets and quit trying. We leave the mcast group in the list so that if we get a new packet, all that we have to do is queue up the packet and restart the join task and it will automatically try to join twice and then either send or flush the queue again. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
69911416 |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: fix MCAST_FLAG_BUSY usage Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast objects") added a new flag MCAST_JOIN_STARTED, but was not very strict in how it was used. We didn't always initialize the completion struct before we set the flag, and we didn't always call complete on the completion struct from all paths that complete it. And when we did complete it, sometimes we continued to touch the mcast entry after the completion, opening us up to possible use after free issues. This made it less than totally effective, and certainly made its use confusing. And in the flush function we would use the presence of this flag to signal that we should wait on the completion struct, but we never cleared this flag, ever. In order to make things clearer and aid in resolving the rtnl deadlock bug I've been chasing, I cleaned this up a bit. 1) Remove the MCAST_JOIN_STARTED flag entirely 2) Change MCAST_FLAG_BUSY so it now only means a join is in-flight 3) Test mcast->mc directly to see if we have completed ib_sa_join_multicast (using IS_ERR_OR_NULL) 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize the mcast->done completion struct 5) Make sure that before calling complete(&mcast->done), we always clear the MCAST_FLAG_BUSY bit 6) Take the mcast_mutex before we call ib_sa_multicast_join and also take the mutex in our join callback. This forces ib_sa_multicast_join to return and set mcast->mc before we process the callback. This way, our callback can safely clear mcast->mc if there is an error on the join and we will do the right thing as a result in mcast_dev_flush. 7) Because we need the mutex to synchronize mcast->mc, we can no longer call mcast_sendonly_join directly from mcast_send and instead must add sendonly join processing to the mcast_join_task 8) Make MCAST_RUN mean that we have a working mcast subsystem, not that we have a running task. We know when we need to reschedule our join task thread and don't need a flag to tell us. 9) Add a helper for rescheduling the join task thread A number of different races are resolved with these changes. These races existed with the old MCAST_FLAG_BUSY usage, the MCAST_JOIN_STARTED flag was an attempt to address them, and while it helped, a determined effort could still trip things up. One race looks something like this: Thread 1 Thread 2 ib_sa_join_multicast (as part of running restart mcast task) alloc member call callback ifconfig ib0 down wait_for_completion callback call completes wait_for_completion in mcast_dev_flush completes mcast->mc is PTR_ERR_OR_NULL so we skip ib_sa_leave_multicast return from callback return from ib_sa_join_multicast set mcast->mc = return from ib_sa_multicast We now have a permanently unbalanced join/leave issue that trips up the refcounting in core/multicast.c Another like this: Thread 1 Thread 2 Thread 3 ib_sa_multicast_join ifconfig ib0 down priv->broadcast = NULL join_complete wait_for_completion mcast->mc is not yet set, so don't clear return from ib_sa_join_multicast and set mcast->mc complete return -EAGAIN (making mcast->mc invalid) call ib_sa_multicast_leave on invalid mcast->mc, hang forever By holding the mutex around ib_sa_multicast_join and taking the mutex early in the callback, we force mcast->mc to be valid at the time we run the callback. This allows us to clear mcast->mc if there is an error and the join is going to fail. We do this before we complete the mcast. In this way, mcast_dev_flush always sees consistent state in regards to mcast->mc membership at the time that the wait_for_completion() returns. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
efc82eee |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: No longer use flush as a parameter Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0b39578b |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: Use dedicated workqueues per interface During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes *everything* from that workqueue means that we would also have to prevent all races between different devices (for instance, ipoib_mcast_restart_task on interface ib0 can race with ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on the rtnl_lock). There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
894021a7 |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: Make the carrier_on_task race aware We blindly assume that we can just take the rtnl lock and that will prevent races with downing this interface. Unfortunately, that's not the case. In ipoib_mcast_stop_thread() we will call flush_workqueue() in an attempt to clear out all remaining instances of ipoib_join_task. But, since this task is put on the same workqueue as the join task, the flush_workqueue waits on this thread too. But this thread is deadlocked on the rtnl lock. The better thing here is to use trylock and loop on that until we either get the lock or we see that FLAG_OPER_UP has been cleared, in which case we don't need to do anything anyway and we just return. While investigating which flag should be used, FLAG_ADMIN_UP or FLAG_OPER_UP, it was determined that FLAG_OPER_UP was the more appropriate flag to use. However, there was a mix of these two flags in use in the existing code. So while we check for that flag here as part of this race fix, also cleanup the two places that had used the less appropriate flag for their tests. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c84ca6d2 |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: Consolidate rtnl_lock tasks in workqueue The ipoib_mcast_flush_dev routine is called with the rtnl_lock held and needs to keep it held. It also needs to call flush_workqueue() to flush out any outstanding work. In the past, we've had to try and make sure that we didn't flush out any outstanding join completions because they also wanted to grab rtnl_lock() and that would deadlock. It turns out that the only thing in the join completion handler that needs this lock can be safely moved to our carrier_on_task, thereby reducing the potential for the join completion code and the flush code to deadlock against each other. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4143a951 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: Consolidate rtnl_lock tasks in workqueue" This reverts commit afe1de664ef3cb756e70938d99417dcbc6b1379a. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c6a7ec7a |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: Make the carrier_on_task race aware" This reverts commit 67d7209e1f481cbaed37f9a224a328a3f83d0482. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
e7a623d2 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: fix MCAST_FLAG_BUSY usage" This reverts commit 016d9fb25cd9817ea9c723f4f7ecd978636b4489. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
962121b4 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: fix mcast_dev_flush/mcast_restart_task race" This reverts commit e5d1dcf1b0951f4ba00d93653942dda6196109d8. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
0306eda2 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: Use dedicated workqueues per interface" This reverts commit 5141861cd5e17eac9676ff49c5abfafbea2b0e98. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
4e0ab200 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: Make ipoib_mcast_stop_thread flush the workqueue" This reverts commit bb42a6dd02fb2901a69dbec2358810735b14b186. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
bb42a6dd |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: Make ipoib_mcast_stop_thread flush the workqueue We used to pass a flush variable to mcast_stop_thread to indicate if we should flush the workqueue or not. This was due to some code trying to flush a workqueue that it was currently running on which is a no-no. Now that we have per-device work queues, and now that ipoib_mcast_restart_task has taken the fact that it is queued on a single thread workqueue with all of the ipoib_mcast_join_task's and therefore has no need to stop the join task while it runs, we can do away with the flush parameter and unilaterally flush always. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5141861c |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: Use dedicated workqueues per interface During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes *everything* from that workqueue, we can have one device in the middle of a down and holding the rtnl lock and another totally unrelated device needing to run mcast_restart_task, which wants the rtnl lock and will loop trying to take it unless is sees its own FLAG_ADMIN_UP flag go away. Because the unrelated interface will never see its own ADMIN_UP flag drop, the interface going down will deadlock trying to flush the queue. There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. I suppose if our own ADMIN_UP flag doesn't go away, then maybe after a few tries on the rtnl lock we can queue our own task back up as a delayed work and return and avoid dropping work on the floor that way. But I'm not 100% convinced that we won't cause other problems. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
e5d1dcf1 |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: fix mcast_dev_flush/mcast_restart_task race Our mcast_dev_flush routine and our mcast_restart_task can race against each other. In particular, they both hold the priv->lock while manipulating the rbtree and while removing mcast entries from the multicast_list and while adding entries to the remove_list, but they also both drop their locks prior to doing the actual removes. The mcast_dev_flush routine is run entirely under the rtnl lock and so has at least some locking. The actual race condition is like this: Thread 1 Thread 2 ifconfig ib0 up start multicast join for broadcast multicast join completes for broadcast start to add more multicast joins call mcast_restart_task to add new entries ifconfig ib0 down mcast_dev_flush mcast_leave(mcast A) mcast_leave(mcast A) As mcast_leave calls ib_sa_multicast_leave, and as member in core/multicast.c is ref counted, we run into an unbalanced refcount issue. To avoid stomping on each others removes, take the rtnl lock specifically when we are deleting the entries from the remove list. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
016d9fb2 |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: fix MCAST_FLAG_BUSY usage Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast objects") added a new flag MCAST_JOIN_STARTED, but was not very strict in how it was used. We didn't always initialize the completion struct before we set the flag, and we didn't always call complete on the completion struct from all paths that complete it. This made it less than totally effective, and certainly made its use confusing. And in the flush function we would use the presence of this flag to signal that we should wait on the completion struct, but we never cleared this flag, ever. This is further muddied by the fact that we overload the MCAST_FLAG_BUSY flag to mean two different things: we have a join in flight, and we have succeeded in getting an ib_sa_join_multicast. In order to make things clearer and aid in resolving the rtnl deadlock bug I've been chasing, I cleaned this up a bit. 1) Remove the MCAST_JOIN_STARTED flag entirely 2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight 3) Test on mcast->mc directly to see if we have completed ib_sa_join_multicast (using IS_ERR_OR_NULL) 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize the mcast->done completion struct 5) Make sure that before calling complete(&mcast->done), we always clear the MCAST_FLAG_BUSY bit 6) Take the mcast_mutex before we call ib_sa_multicast_join and also take the mutex in our join callback. This forces ib_sa_multicast_join to return and set mcast->mc before we process the callback. This way, our callback can safely clear mcast->mc if there is an error on the join and we will do the right thing as a result in mcast_dev_flush. 7) Because we need the mutex to synchronize mcast->mc, we can no longer call mcast_sendonly_join directly from mcast_send and instead must add sendonly join processing to the mcast_join_task A number of different races are resolved with these changes. These races existed with the old MCAST_FLAG_BUSY usage, the MCAST_JOIN_STARTED flag was an attempt to address them, and while it helped, a determined effort could still trip things up. One race looks something like this: Thread 1 Thread 2 ib_sa_join_multicast (as part of running restart mcast task) alloc member call callback ifconfig ib0 down wait_for_completion callback call completes wait_for_completion in mcast_dev_flush completes mcast->mc is PTR_ERR_OR_NULL so we skip ib_sa_leave_multicast return from callback return from ib_sa_join_multicast set mcast->mc = return from ib_sa_multicast We now have a permanently unbalanced join/leave issue that trips up the refcounting in core/multicast.c Another like this: Thread 1 Thread 2 Thread 3 ib_sa_multicast_join ifconfig ib0 down priv->broadcast = NULL join_complete wait_for_completion mcast->mc is not yet set, so don't clear return from ib_sa_join_multicast and set mcast->mc complete return -EAGAIN (making mcast->mc invalid) call ib_sa_multicast_leave on invalid mcast->mc, hang forever By holding the mutex around ib_sa_multicast_join and taking the mutex early in the callback, we force mcast->mc to be valid at the time we run the callback. This allows us to clear mcast->mc if there is an error and the join is going to fail. We do this before we complete the mcast. In this way, mcast_dev_flush always sees consistent state in regards to mcast->mc membership at the time that the wait_for_completion() returns. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
67d7209e |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: Make the carrier_on_task race aware We blindly assume that we can just take the rtnl lock and that will prevent races with downing this interface. Unfortunately, that's not the case. In ipoib_mcast_stop_thread() we will call flush_workqueue() in an attempt to clear out all remaining instances of ipoib_join_task. But, since this task is put on the same workqueue as the join task, the flush_workqueue waits on this thread too. But this thread is deadlocked on the rtnl lock. The better thing here is to use trylock and loop on that until we either get the lock or we see that FLAG_ADMIN_UP has been cleared, in which case we don't need to do anything anyway and we just return. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
afe1de66 |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: Consolidate rtnl_lock tasks in workqueue Setting the MTU can safely be moved to the carrier_on_task, which keeps us from needing to take the rtnl lock in the join_finish section. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
68f9d83c |
|
20-Aug-2014 |
Alex Estrin <alex.estrin@intel.com> |
IPoIB: Remove unnecessary port query There are two queries for port attributes one after another. A second call is not needed since port_attr structure already holds the data. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
94232d9c |
|
16-Oct-2013 |
Erez Shitrit <erezsh@mellanox.com> |
IPoIB: Start multicast join process only on active ports The driver starts the mcast_join task whenever the netdev interface is UP without relation to the underlying IB port state. Until the port state is ACTIVE all the join requests are irrelevant, and the IB core returns -EINVAL. So the user will see errors such as: "multicast join failed for ff12:401b:... , status -22". Instead, have ipoib_mcast_join_task() return when the port is not active. It will be called again when the port state is changed and the low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the IB_EVENT_CLIENT_REREGISTER event. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
a9c8ba58 |
|
16-Oct-2013 |
Erez Shitrit <erezsh@mellanox.com> |
IPoIB: Fix usage of uninitialized multicast objects The driver should avoid calling ib_sa_free_multicast on the mcast->mc object until it finishes its initialization state. Otherwise we can crash when ipoib_mcast_dev_flush() attempts to use the uninitialized multicast object. Instead, only call wait_for_completion() for multicast entries that started the join process, meaning that ib_sa_join_multicast() finished. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
bea1e22d |
|
30-Aug-2012 |
Patrick McHardy <kaber@trash.net> |
IPoIB: Fix use-after-free of multicast object Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz) Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue, and hence the workqueue can't be flushed from the context of ipoib_stop(). In the current code, ipoib_stop() (which doesn't flush the workqueue) calls ipoib_mcast_dev_flush(), which goes and deletes all the multicast entries. This takes place without any synchronization with a possible running instance of ipoib_mcast_join_task() for the same ipoib device, leading to a crash due to NULL pointer dereference. Fix this by making sure that the workqueue is flushed before ipoib_mcast_dev_flush() is called. To make that possible, we move the RTNL-lock wrapped code to ipoib_mcast_join_finish(). Signed-off-by: Patrick McHardy <kaber@trash.net> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b5120a6e |
|
29-Aug-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IPoIB: Fix AB-BA deadlock when deleting neighbours Lockdep points out a circular locking dependency betwwen the ipoib device priv spinlock (priv->lock) and the neighbour table rwlock (ntbl->rwlock). In the normal path, ie neigbour garbage collection task, the neigh table rwlock is taken first and then if the neighbour needs to be deleted, priv->lock is taken. However in some error paths, such as in ipoib_cm_handle_tx_wc(), priv->lock is taken first and then ipoib_neigh_free routine is called which in turn takes the neighbour table ntbl->rwlock. The solution is to get rid the neigh table rwlock completely and use only priv->lock. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b63b70d8 |
|
24-Jul-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IPoIB: Use a private hash table for path lookup in xmit path Dave Miller <davem@davemloft.net> provided a detailed description of why the way IPoIB is using neighbours for its own ipoib_neigh struct is buggy: Any time an ipoib_neigh is changed, a sequence like the following is made: spin_lock_irqsave(&priv->lock, flags); /* * It's safe to call ipoib_put_ah() inside * priv->lock here, because we know that * path->ah will always hold one more reference, * so ipoib_put_ah() will never do more than * decrement the ref count. */ if (neigh->ah) ipoib_put_ah(neigh->ah); list_del(&neigh->list); ipoib_neigh_free(dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); ipoib_path_lookup(skb, n, dev); This doesn't work, because you're leaving a stale pointer to the freed up ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails with all the locking done to protect _changes_ to *ipoib_neigh(n), and with the code in ipoib_neigh_free() that NULLs out the pointer. The core issue is that read side calls to *to_ipoib_neigh(n) are not being synchronized at all, they are performed without any locking. So whether we hold the lock or not when making changes to *ipoib_neigh(n) you still can have threads see references to freed up ipoib_neigh objects. cpu 1 cpu 2 n = *ipoib_neigh() *ipoib_neigh() = NULL kfree(n) n->foo == OOPS [..] Perhaps the ipoib code can have a private path database it manages entirely itself, which holds all the necessary information and is looked up by some generic key which is available easily at transmit time and does not involve generic neighbour entries. See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b> for the full discussion. This patch aims to solve the race conditions found in the IPoIB driver. The patch removes the connection between the core networking neighbour structure and the ipoib_neigh structure. In addition to avoiding the race described above, it allows us to handle SKBs carrying IP packets that don't have any associated neighbour. We add an ipoib_neigh hash table with N buckets where the key is the destination hardware address. The ipoib_neigh is fetched from the hash table and instead of the stashed location in the neighbour structure. The hash table uses both RCU and reference counting to guarantee that no ipoib_neigh instance is ever deleted while in use. Fetching the ipoib_neigh structure instance from the hash also makes the special code in ipoib_start_xmit that handles remote and local bonding failover redundant. Aged ipoib_neigh instances are deleted by a garbage collection task that runs every M seconds and deletes every ipoib_neigh instance that was idle for at least 2*M seconds. The deletion is safe since the ipoib_neigh instances are protected using RCU and reference count mechanisms. The number of buckets (N) and frequency of running the GC thread (M), are taken from the exported arb_tbl. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
700db99d |
|
05-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipoib: Need to do dst_neigh_lookup_skb() outside of priv->lock. Otherwise local_bh_enable() complains. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
178709bb |
|
02-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipoib: Convert over to dev_lookup_neigh_skb(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
936d7de3 |
|
07-Feb-2012 |
Roland Dreier <roland@purestorage.com> |
IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") made it possible for a netdev driver to use skb->cb between its header_ops.create method and its .ndo_start_xmit method. Use this in ipoib_hard_header() to stash away the LL address (GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows IPoIB to stop lying about its hard_header_len, which will let us fix the L2 check for GRO. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
377cb4f9 |
|
07-Feb-2012 |
Roland Dreier <roland@purestorage.com> |
IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") made it possible for a netdev driver to use skb->cb between its header_ops.create method and its .ndo_start_xmit method. Use this in ipoib_hard_header() to stash away the LL address (GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows IPoIB to stop lying about its hard_header_len, which will let us fix the L2 check for GRO. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27217455 |
|
02-Dec-2011 |
David Miller <davem@davemloft.net> |
net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
|
#
580da35a |
|
29-Nov-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
IB: Fix RCU lockdep splats Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()") forgot to take care of infiniband uses of dst neighbours. Many thanks to Marc Aurele who provided a nice bug report and feedback. Reported-by: Marc Aurele La France <tsi@ualberta.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3874397c |
|
21-Nov-2011 |
Mike Marciniszyn <mike.marciniszyn@qlogic.com> |
IB/ipoib: Prevent hung task or softlockup processing multicast response This following can occur with ipoib when processing a multicast reponse: BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982] Modules linked in: ... CPU 0: Modules linked in: ... Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5 RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20 RSP: 0018:ffff8802119ed860 EFLAGS: 00000246 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299 RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000 R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860 R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib] [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib] [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib] [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0 [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib] [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib] [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa] [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad] [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core] [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa] [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa] [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa] [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad] [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad] [<ffffffff81088830>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Coinciding with stack trace is the following message: ib0: ib_address_create failed The code below in ipoib_mcast_join_finish() will note the above failure in the address handle but otherwise continue: ah = ipoib_create_ah(dev, priv->pd, &av); if (!ah) { ipoib_warn(priv, "ib_address_create failed\n"); } else { The while loop at the bottom of ipoib_mcast_join_finish() will attempt to send queued multicast packets in mcast->pkt_queue and eventually end up in ipoib_mcast_send(): if (!mcast->ah) { if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) skb_queue_tail(&mcast->pkt_queue, skb); else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } My read is that the code will requeue the packet and return to the ipoib_mcast_join_finish() while loop and the stage is set for the "hung" task diagnostic as the while loop never sees a non-NULL ah, and will do nothing to resolve. There are GFP_ATOMIC allocates in the provider routines, so this is possible and should be dealt with. The test that induced the failure is associated with a host SM on the same server during a shutdown. This patch causes ipoib_mcast_join_finish() to exit with an error which will flush the queued mcast packets. Nothing is done to unwind the QP attached state so that subsequent sends from above will retry the join. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Reviewed-by: Gary Leshner <gary.leshner@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
fec14d2f |
|
29-Aug-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
infiniband: add moduleparam.h to drivers/infiniband as required These files were getting the moduleparam infrastructure from the implicit presence of module.h being everywhere, but that is going away soon. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
69cce1d1 |
|
18-Jul-2011 |
David S. Miller <davem@davemloft.net> |
net: Abstract dst->neighbour accesses behind helpers. dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22bedad3 |
|
01-Apr-2010 |
Jiri Pirko <jpirko@redhat.com> |
net: convert multicast list to list_head Converts the list and the core manipulating with it to be the same as uc_list. +uses two functions for adding/removing mc address (normal and "global" variant) instead of a function parameter. +removes dev_mcast.c completely. +exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for manipulation with lists on a sandbox (used in bonding and 80211 drivers) Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
3e4aa12f |
|
21-Mar-2010 |
Jiri Pirko <jpirko@redhat.com> |
ipoib: remove addrlen check for mc addresses Finally this bit can be removed. Currently, after the bonding driver is changed/fixed (32a806c194ea112cfab00f558482dd97bee5e44e net-next-2.6), that's not possible for an addr with different length than dev->addr_len to be present in list. Removing this check as in new mc_list there will be no addrlen in the record. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c74651c |
|
26-Feb-2010 |
Jiri Pirko <jpirko@redhat.com> |
ipoib: returned back addrlen check for mc addresses Apparently bogus mc address can break IPOIB multicast processing. Therefore returning the check for addrlen back until this is resolved in bonding (I don't see any other point from where mc address with non-dev->addr_len length can came from). Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fbf219f1 |
|
23-Feb-2010 |
Jiri Pirko <jpirko@redhat.com> |
infiniband: convert to use netdev_for_each_mc_addr Due to the loop complexicity in nes_nic.c, I'm using char* to copy mc addresses to it. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5ee95120 |
|
24-Sep-2009 |
Moni Shoua <monis@Voltaire.COM> |
IPoIB: Don't turn on carrier for a non-active port Multicast joins can succeed even if the IB port is down. This happens when the SM runs on the same port with the requesting port. However, IPoIB calls netif_carrier_on() when the join of the broadcast group succeeds, without caring about the state of the IB port. The result is an IPoIB interface in RUNNING state but without an active IB port to support it. If a bonding interface uses this IPoIB interface as a slave it might not detect that this slave is almost useless and failover functionality will be damaged. The fix checks the state of the IB port in the carrier_task before calling netif_carrier_on(). Adresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726 Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
5e47596b |
|
05-Sep-2009 |
Jason Gunthorpe <jgg@ziepe.ca> |
IPoIB: Check multicast address format Check that the format of multicast link addresses is correct before taking them from dev->mc_list to priv->multicast_list. This way we never try to send a bogus address to the SA, which prevents badness from erronous 'ip maddr addr add', broken bonding drivers, etc. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
721d67cd |
|
05-Sep-2009 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Drop priv->lock before calling ipoib_send() IPoIB currently must use irqsave locking for priv->lock, since it is taken from interrupt context in one path. However, ipoib_send() does skb_orphan(), and the network stack locking is not IRQ-safe. Therefore we need to make sure we don't hold priv->lock when calling ipoib_send() to avoid lockdep warnings (the code was almost certainly safe in practice, since the only code path that takes priv->lock from interrupt context would never call into the network stack). Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757 Reported-by: Bart Van Assche <bart.vanassche@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
adf30907 |
|
01-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c209620 |
|
16-Jan-2009 |
Yossi Etigin <yosefe@Voltaire.COM> |
IPoIB: Do not print error messages for multicast join retries When IPoIB tries to join a multicast group, and the SA module's SM address handle is NULL (because of an SM change, etc), the join returns with -EAGAIN status. In that case, don't print an error message unless multicast debugging is enabled. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
50df48f5 |
|
12-Jan-2009 |
Yossi Etigin <yosefe@Voltaire.COM> |
IPoIB: Do not join broadcast group if interface is brought down Because the ipoib_workqueue is not flushed when ipoib interface is brought down, ipoib_mcast_join() may trigger a join to the broadcast group after priv->broadcast was set to NULL (during cleanup). This will cause the system to be a member of the broadcast group when interface is down. As a side effect, this breaks the optimization of setting the Q_key only when joining the broadcast group. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
5b095d989 |
|
29-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace %p6 with %pI6 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fcace2fe |
|
28-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
infiniband: ipoib replace IPOIB_GID_FMT with %p6 Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG() Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
943c246e |
|
30-Sep-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling tx_lock. Not only do we want to get rid of LLTX, this actually causes problems because of the skb_orphan() done with this tx_lock held: some skb destructors expect to be run with interrupts enabled. The simplest fix for this is to get rid of the driver-private tx_lock and stop using LLTX. We kill off priv->tx_lock and use netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit tricky because we need to update places that take priv->lock inside the tx_lock to disable IRQs, rather than relying on tx_lock having already disabled IRQs. Also, there are a couple of places where we need to disable BHs to make sure we have a consistent context to call netif_tx_lock() (since we no longer can use _irqsave() variants), and we also have to change ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather than directly, because ipoib_send_comp_handler() runs in interrupt context and drain_tx_cq() must run in BH context so it can call netif_tx_lock(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e8224e4b |
|
16-Sep-2008 |
Yossi Etigin <yossi.openib@gmail.com> |
IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a77a57a1 |
|
19-Aug-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Fix deadlock on RTNL in ipoib_stop() Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending. Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b9e40857 |
|
15-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Do not use TX lock to protect address lists. Now that we have a specific lock to protect the network device unicast and multicast lists, remove extraneous grabs of the TX lock in cases where the code only needs address list protection. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e308a5d8 |
|
15-Jul-2008 |
David S. Miller <davem@davemloft.net> |
netdev: Add netdev->addr_list_lock protection. Add netif_addr_{lock,unlock}{,_bh}() helpers. Use them to protect operations that operate on or read the network device unicast and multicast address lists. Also use them in cases where the code simply wants to block calls into the driver's ->set_rx_mode() and ->set_multicast_list() methods. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd360671 |
|
15-Jul-2008 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Use dev_set_mtu() to change mtu When the driver sets the MTU of the net device outside of its change_mtu method, it should make use of dev_set_mtu() instead of directly setting the mtu field of struct netdevice. Otherwise functions registered to be called upon MTU change will not get called (this is done through call_netdevice_notifiers() in dev_set_mtu()). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c8c2afe3 |
|
15-Jul-2008 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Use rtnl lock/unlock when changing device flags Use of this lock is required to synchronize changes to the netdvice's data structs. Also move the call to ipoib_flush_paths() after the modification of the netdevice flags in set_mode(). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
9eae554c |
|
15-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Get rid of ipoib_mcast_detach() wrapper ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just use the core API in the one place that does a multicast group detach. add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105) function old new delta ipoib_mcast_leave 357 319 -38 ipoib_mcast_detach 67 - -67 Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
d0de1362 |
|
15-Jul-2008 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Only set Q_Key once: after joining broadcast group The current code will set the Q_Key for any join of a non-sendonly multicast group. The operation involves a modify QP operation, which is fairly heavyweight, and is only really required after the join of the broadcast group. Fix this by adding a parameter to ipoib_mcast_attach() to control when the Q_Key is set. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c03d4731 |
|
15-Jul-2008 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Remove unused IPOIB_MCAST_STARTED code The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b ("IPoIB: Don't drop multicast sends when they can be queued"), so remove it. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f3781d2e |
|
15-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
RDMA: Remove subversion $Id tags They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e1d50dce |
|
20-May-2008 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish() We saw a kernel oops in our regression testing when a multicast "join finish" occurred just after the interface was -- this is <https://bugs.openfabrics.org/show_bug.cgi?id=1040>. The test randomly causes the HCA physical port to go down then up. The cause of this is that ipoib_mcast_join_finish() processing happen just after ipoib_mcast_dev_flush() was invoked (in which case the broadcast pointer is NULL). This patch tests for and handles the case where priv->broadcast is NULL. Cc: <stable@kernel.org> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bc7b3a36 |
|
23-Apr-2008 |
Shirley Ma <mashirle@us.ibm.com> |
IPoIB: Handle 4K IB MTU for UD (datagram) mode This patch enables IPoIB to use 4K UD messages (when the underlying device and fabrics support a 4K MTU) by using two scatter buffers when PAGE_SIZE is less than or equal to thhe HCA IB MTU size. The first buffer is for IPoIB header + GRH header, and the second buffer is the IPoIB payload, which is 4K-4. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b3e2749b |
|
11-Mar-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Don't drop multicast sends when they can be queued When set_multicast_list() is called the multicast task is restarted and the IPOIB_MCAST_STARTED bit is cleared. As a result for some window of time, multicast packets are not transmitted nor queued but rather dropped by ipoib_mcast_send(). These dropped packets are painful in two cases: - bonding fail-over which both calls set_multicast_list() on the new active slave and sends Gratuitous ARP through that slave. - IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the device and issues IGMP leave. In both these cases, depending on the scheduling of the IPoIB multicast task, the packets would be dropped. As a result, in the bonding case, the failover would not be detected by the peers until their neighbour is renewed the neighbour (which takes a few tens of seconds). In the IGMP case, the IP router doesn't get an IGMP leave and would only learn on that from further probes on the group (also a delay of at least a few tens of seconds). Fix this by allowing transmission (or queuing) depending on the IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag. Signed-off-by: Olga Shern <olgas@voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a9e527e3 |
|
10-Dec-2007 |
Rolf Manderscheid <rvm@obsidianresearch.com> |
IPoIB: improve IPv4/IPv6 to IB mcast mapping functions An IPoIB subnet on an IB fabric that spans multiple IB subnets can't use link-local scope in multicast GIDs. The existing routines that map IP/IPv6 multicast addresses into IB link-level addresses hard-code the scope to link-local, and they also leave the partition key field uninitialised. This patch adds a parameter (the link-level broadcast address) to the mapping routines, allowing them to initialise both the scope and the P_Key appropriately, and fixes up the call sites. The next step will be to add a way to configure the scope for an IPoIB interface. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2337f809 |
|
23-Oct-2007 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Trivial formatting cleanups Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
732a2170 |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
IB/ipoib: Bound the net device to the ipoib_neigh structue IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n->dev->flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Roland Dreier <rdreier@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
de903512 |
|
28-Sep-2007 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB]: Convert to netdevice internal stats Use the stats member of struct netdevice in IPoIB, so we can save memory by deleting the stats member of struct ipoib_dev_priv, and save code by deleting ipoib_get_stats(). Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
335a64a5a |
|
08-Oct-2007 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Allow setting policy to ignore multicast groups The kernel IB stack allows (through the RDMA CM) userspace applications to join and use multicast groups from the IPoIB MGID range. This allows multicast traffic to be handled directly from userspace QPs, without going through the kernel stack, which gives better performance for some applications. However, to fully interoperate with IP multicast, such userspace applications need to participate in IGMP reports and queries, or else routers may not forward the multicast traffic to the system where the application is running. The simplest way to do this is to share the kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to join multicast groups that are being handled directly in userspace. However, in such cases, the actual multicast traffic should not also be handled by the IPoIB interface, because that would burn resources handling multicast packets that will just be discarded in the kernel. To handle this, this patch adds lookup on the database used for IB multicast group reference counting when IPoIB is joining multicast groups, and if a multicast group is already handled by user space, then the IPoIB kernel driver ignores the group. This is controlled by a per-interface policy flag. When the flag is set, IPoIB will not join and attach its QP to a multicast group which already has an entry in the database; when the flag is cleared, IPoIB will behave as before this change. For each IPoIB interface, the /sys/class/net/$intf/umcast attribute controls the policy flag. The default value is off/0. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
81668838 |
|
02-Aug-2007 |
Sean Hefty <sean.hefty@intel.com> |
IPoIB: Specify Traffic Class with path record queries for QoS support To support QoS within and between subnets, modify IPoIB to request specific Traffic Class values with path record queries, using the value associated with the IPoIB broadcast group. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ See some comments I made on this at v1 and v2 of the posts <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html> <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ] Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
24bd1e4e |
|
18-May-2007 |
Michael S. Tsirkin <mst@dev.mellanox.co.il> |
IB/ipoib: Fix typos in error messages Trivial error message fixups. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e07832b6 |
|
19-Mar-2007 |
Sean Hefty <sean.hefty@intel.com> |
IPoIB: Fix race in detaching from mcast group before attaching There's a race between ipoib_mcast_leave() and ipoib_mcast_join_finish() where we can try to detach from a multicast group before we've attached to it. Fix this by reordering the code in ipoib_mcast_leave to free the multicast group first, which waits for the multicast callback thread (which calls ipoib_mcast_join_finish()) to complete before detaching from the group. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
55c9adde |
|
08-Mar-2007 |
Shirley Ma <xma@us.ibm.com> |
IPoIB: Turn on interface's carrier after broadcast group is joined Do netif_carrier_on() right after the IPv4 broadcast multicast group is joined, rather than waiting for all of the initial set of multicast group joins to finish. This allows at least IPv4 traffic to limp along on broken fabrics where not all multicast groups can be joined. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
658bcef6 |
|
21-Feb-2007 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Remove unused local_rate tracking Now that low-level drivers handle the conversion from an absolute rate to a relative rate, there's no need for the IPoIB driver to keep track of the local port's data rate. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
faec2f7b |
|
15-Feb-2007 |
Sean Hefty <sean.hefty@intel.com> |
IB/sa: Track multicast join/leave requests The IB SA tracks multicast join/leave requests on a per port basis and does not do any reference counting: if two users of the same port join the same group, and one leaves that group, then the SA will remove the port from the group even though there is one user who wants to stay a member left. Therefore, in order to support multiple users of the same multicast group from the same port, we need to perform reference counting locally. To do this, add an multicast submodule to ib_sa to perform reference counting of multicast join/leave operations. Modify ib_ipoib (the only in-kernel user of multicast) to use the new interface. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
839fcaba |
|
05-Feb-2007 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Connected mode experimental support The following patch adds experimental support for IPoIB connected mode, as defined by the draft from the IETF ipoib working group. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system. Some notes on code: 1. SRQ is used for scalability to large cluster sizes 2. Only RC connections are used (UC does not support SRQ now) 3. Retry count is set to 0 since spec draft warns against retries 4. Each connection is used for data transfers in only 1 direction, so each connection is either active(TX) or passive (RX). 2 sides that want to communicate create 2 connections. 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks 6. To detect stale passive side connections (where the remote side is down), we keep an LRU list of passive connections (updated once per second per connection) and destroy a connection after it has been unused for several seconds. The LRU rule makes it possible to avoid scanning connections that have recently been active. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2745b5b7 |
|
16-Nov-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Fix skb leak when freeing neighbour ipoib_neigh_free() is sometimes called while neighbour is still alive, so it might still have queued skbs. Fix skb leak in this case. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c4028958 |
|
22-Nov-2006 |
David Howells <dhowells@redhat.com> |
WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
|
#
d0df6d6d |
|
22-Sep-2006 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Create MCGs with all attributes required by RFC RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says: If the IB multicast group does not already exist, one must be created first with the IPoIB link MTU. The MGID MUST use the same P_Key, Q_Key, SL, MTU, and HopLimit as those used in the broadcast-GID. The rest of attributes SHOULD follow the values used in the broadcast-GID as well. However, the current IPoIB driver is only setting the attributes required by the InfiniBand spec to create a multicast group, so in particular the MTU and HopLimit are not being set. Add these attributes when creating MCGs, and also set the Rate attribute, since IPoIB pays attention to that attribute as well. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c1a0b23b |
|
21-Aug-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IB/sa: Require SA registration Require users to register with SA module, to prevent the sa_query module text from going away while an SA query callback is still running. Update all in-tree users for the new interface. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
3cd96564 |
|
22-Sep-2006 |
Roland Dreier <rolandd@cisco.com> |
IB: Whitespace fixes Remove some trailing whitespace that has snuck in despite the best efforts of whitespace=error-all. Also fix a few other whitespace bogosities. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c11bd42a |
|
14-Sep-2006 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Retry failed send-only multicast group joins When a send-only multicast group join fails, mcast->query must be set to NULL. Otherwise, IPoIB will never retry the join and the multicast group will never be reachable. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
624d01f8 |
|
24-Jul-2006 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/ipoib: Fix oops with ipoib_debug_mcast set Need to set mcast->ah before debug code dereferences it. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
179e0917 |
|
26-Jun-2006 |
Akinobu Mita <mita@miraclelinux.com> |
[PATCH] drivers: use list_move() This patch converts the combination of list_del(A) and list_add(A, B) to list_move(A, B) under drivers/. Acked-by: Corey Minyard <minyard@mvista.com> Cc: Ben Collins <bcollins@debian.org> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Alasdair Kergon <dm-devel@redhat.com> Cc: Gerd Knorr <kraxel@bytesex.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Pavlic <fpavlic@de.ibm.com> Acked-by: Matthew Wilcox <matthew@wil.cx> Cc: Andrew Vasquez <linux-driver@qlogic.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
932ff279 |
|
09-Jun-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Add netif_tx_lock Various drivers use xmit_lock internally to synchronise with their transmission routines. They do so without setting xmit_lock_owner. This is fine as long as netpoll is not in use. With netpoll it is possible for deadlocks to occur if xmit_lock_owner isn't set. This is because if a printk occurs while xmit_lock is held and xmit_lock_owner is not set can cause netpoll to attempt to take xmit_lock recursively. While it is possible to resolve this by getting netpoll to use trylock, it is suboptimal because netpoll's sole objective is to maximise the chance of getting the printk out on the wire. So delaying or dropping the message is to be avoided as much as possible. So the only alternative is to always set xmit_lock_owner. The following patch does this by introducing the netif_tx_lock family of functions that take care of setting/unsetting xmit_lock_owner. I renamed xmit_lock to _xmit_lock to indicate that it should not be used directly. I didn't provide irq versions of the netif_tx_lock functions since xmit_lock is meant to be a BH-disabling lock. This is pretty much a straight text substitution except for a small bug fix in winbond. It currently uses netif_stop_queue/spin_unlock_wait to stop transmission. This is unsafe as an IRQ can potentially wake up the queue. So it is safer to use netif_tx_disable. The hamradio bits used spin_lock_irq but it is unnecessary as xmit_lock must never be taken in an IRQ handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37c22a77 |
|
29-May-2006 |
Jack Morgenstein <jackm@mellanox.co.il> |
IPoIB: Fix kernel unaligned access on ia64 Fix misaligned access faults on ia64: never cast a misaligned neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer instead. The memcpy was being optimized to use full word accesses because the compiler thought that union ib_gid is always aligned. The cast in IPOIB_GID_ARG is safe, since it is fixed to access each byte separately. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f2de3b06 |
|
05-Apr-2006 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Wait for join to finish before freeing mcast struct ipoib_mcast_restart_task() might free an mcast object while a join request is still outstanding, leading to an oops when the query completes. Fix this by waiting for query to complete, similar to what ipoib_stop_thread() is doing. The wait for mcast completion code is consolidated in wait_for_mcast_join(). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bf6a9e31 |
|
10-Apr-2006 |
Jack Morgenstein <jackm@mellanox.co.il> |
IB: simplify static rate encoding Push translation of static rate to HCA format into low-level drivers, where it belongs. For static rate encoding, use encoding of rate field from IB standard PathRecord, with addition of value 0, for backwards compatibility with current usage. The changes are: - Add enum ib_rate to midlayer includes. - Get rid of static rate translation in IPoIB; just use static rate directly from Path and MulticastGroup records. - Update mthca driver to translate absolute static rate into the format used by hardware. This also fixes mthca's static rate handling for HCAs that are capable of 4X DDR. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
d2e0655e |
|
04-Apr-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Consolidate private neighbour data handling Consolidate IPoIB's private neighbour data handling into ipoib_neigh_alloc() and ipoib_neigh_free(). This will make it easier to keep track of the neighbour structures that IPoIB is handling, and is a nice cleanup of the code: add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78) function old new delta ipoib_neigh_alloc - 61 +61 ipoib_neigh_free - 36 +36 ipoib_mcast_join_finish 1288 1291 +3 path_rec_completion 575 573 -2 ipoib_mcast_join_task 664 660 -4 ipoib_neigh_destructor 101 92 -9 ipoib_neigh_setup_dev 14 3 -11 ipoib_neigh_setup 17 - -17 path_free 238 215 -23 ipoib_mcast_free 329 306 -23 ipoib_mcast_send 718 684 -34 neigh_add_path 705 650 -55 Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e35fc385 |
|
20-Mar-2006 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[INFINIBAND] ipoib: Remove leftover use of neigh_ops->destructor Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b9ab02b |
|
07-Mar-2006 |
Roland Dreier <rdreier@cisco.com> |
IPoIB: Fix build now that neighbour destructor is in neigh_params Fix the IPoIB build (which is broken in net-2.6.17 because of my screw-up, which left out this chunk in ipoib_multicast.c). The neighbour destructor is now in neigh_params, so we don't need to clear it in the ops structure. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
9acf6a85 |
|
02-Mar-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Fix multicast race between canceling and completing ipoib_mcast_stop_thread currently tests mcast->query and if it is NULL, does not perform wait_for_completion on the mcast and frees the mcast object directly. However, since both operations are done without locking, it is possible that ipoib_mcast_join_complete is in progress on this mcast object and has set mcast->query to NULL already. Solve this by: - taking priv->lock before we change mcast->query in ipoib_mcast_join_complete, and keeping it until we no longer need the mcast object - taking priv->lock around mcast->query test in ipoib_mcast_stop_thread Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
7343b231 |
|
27-Feb-2006 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Close race in setting mcast->ah ipoib_mcast_send() tests mcast->ah twice. If this value is changed between these two points, we leak an skb. However, ipoib_mcast_join_finish() sets mcast->ah with no locking, so it could race against ipoib_mcast_send(). As a solution, take priv->lock around assignment to mcast->ah thus making sure ipoib_mcast_send() (which also takes priv->lock) is not in flight. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
20b83382 |
|
11-Feb-2006 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Yet another fix for send-only joins Even after the last fix, it's still possible for a send-only join to start before the join for the broadcast group has finished. This could cause us to create a multicast group using attributes from the broadcast group that haven't been initialized yet, so we would use garbage for the Q_Key, etc. Fix this by waiting until the broadcast group's attached flag is set before starting send-only joins. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
7bcb974e |
|
07-Feb-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Fix another send-only join race Further, there's an additional issue that I saw in testing: ipoib_mcast_send may get called when priv->broadcast is NULL (e.g. if the device was downed and then upped internally because of a port event). If this happends and the send-only join request gets completed before priv->broadcast is set, we get an oops. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
479a0796 |
|
07-Feb-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Don't start send-only joins while multicast thread is stopped Fix the following race scenario: - Device is up. - Port event or set mcast list triggers ipoib_mcast_stop_thread, this cancels the query and waits on mcast "done" completion. - Completion is called and "done" is set. - Meanwhile, ipoib_mcast_send arrives and starts a new query, re-initializing "done". Fix this by adding a "multicast started" bit and checking it before starting a send-only join. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b36f170b6 |
|
17-Jan-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Lock accesses to multicast packet queues Avoid corrupting mcast->pkt_queue by serializing access with priv->tx_lock. Also, update dropped packet statistics to count multicast packets removed from pkt_queue as dropped. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
95ed644f |
|
13-Jan-2006 |
Ingo Molnar <mingo@elte.hu> |
IB: convert from semaphores to mutexes semaphore to mutex conversion by Ingo and Arjan's script. Signed-off-by: Ingo Molnar <mingo@elte.hu> [ Sanity-checked on real IB hardware ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
988bd503 |
|
12-Jan-2006 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Fix memory leak of multicast group structures The current handling of multicast groups in IPoIB ends up never freeing send-only multicast groups. It turns out the logic was much more complicated than it needed to be; we can fix this bug and completely kill ipoib_mcast_dev_down() at the same time. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
78bfe0b5 |
|
11-Jan-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Take dev->xmit_lock around mc_list accesses dev->mc_list accesses must be protected by dev->xmit_lock. Found by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
97460df3 |
|
10-Jan-2006 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Fix address handle refcounting for multicast groups Multiple ipoib_neigh structures on mcast->neigh_list may point to the same ah. This means that ipoib_mcast_free() can't just make a list of ah structs to free, since this might end up trying to add the same ah to the list more than once. Handle this in ipoib_multicast.c in the same way as it is handled in ipoib_main.c for struct ipoib_path. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
70b4c8cd |
|
10-Jan-2006 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Fix error path in ipoib_mcast_dev_flush() Don't leak memory on allocation failure for broadcast mcast group. Also, print a warning to match handling for other mcast groups. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
14c85021 |
|
26-Dec-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e86541e |
|
29-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: don't zero members after we allocate with kzalloc ipoib_mcast_alloc() uses kzalloc(), so there's no need to zero out members of the mcast struct after it's allocated. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
de922487 |
|
29-Nov-2005 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: reinitialize mcast structs' completions for every query Make sure mcast->done is initialized to uncompleted value before we submit a new query, so that it's safe to wait on. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8c608a32 |
|
07-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] no need to set skb->dev right before freeing skb For cut-and-paste reasons, the IPoIB driver was setting skb->dev right before calling dev_kfree_skb_any(). Get rid of this. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1732b0ef |
|
07-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] add path record information in debugfs Add ibX_path files to debugfs that contain information about the IPoIB path cache. IPoIB ARP only gives GIDs, which the IPoIB driver must resolve to real IB paths through the ib_sa module. For debugging, when the ARP table looks OK but traffic isn't flowing, it's useful to be able to see if the resolution from GID to path worked. Also clean up the formatting of the existing _mcg debugfs files. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8ae5a8a2 |
|
02-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] don't compile debug code if debugging isn't enabled Don't build ipoib_mcast_iter_ functions if CONFIG_INFINIBAND_IPOIB_DEBUG is not enabled -- their only callers will not be built either. Also move the prototype for ipoib_open() to ipoib.h to fix a sparse warning. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
de6eb66b |
|
02-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
[IB] kzalloc() conversions Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35 source lines and about 500 bytes of text. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8d2cae06 |
|
20-Sep-2005 |
Roland Dreier <rolandd@cisco.com> |
[PATCH] IPoIB: Don't flush workqueue from within workqueue ipoib_mcast_restart_task() is always called from within the single-threaded IPoIB workqueue, so flushing the workqueue from within the function can lead to a recursion overflow. But since we're running in a single-threaded workqueue, we're already synchronized against other items in the workqueue, so just get rid of the flush in ipoib_mcast_restart_task(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
ce5b65cc |
|
18-Sep-2005 |
Hal Rosenstock <halr@voltaire.com> |
[PATCH] IPoIB: Fix SA client retransmission strategy We got a little mixed up with what the backoff member holds in the IPoIB multicast group structure: sometimes it was used as a number of seconds, and sometimes it was used as a number of jiffies. Fix the code so that backoff is always in seconds. Signed-off-by: Hal Rosenstock <halr@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
97f52eb4 |
|
13-Aug-2005 |
Sean Hefty <sean.hefty@intel.com> |
[PATCH] IB: sparse endianness cleanup Fix sparse warnings. Use __be* where appropriate. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2a1d9b7f |
|
11-Aug-2005 |
Roland Dreier <roland@eddore.topspincom.com> |
[PATCH] IB: Add copyright notices Make some lawyers happy and add copyright notices for people who forgot to include them when they actually touched the code. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e6ded99c |
|
16-Apr-2005 |
Roland Dreier <roland@topspin.com> |
[PATCH] IPoIB: fix static rate calculation Correct and simplify calculation of static rate. We need to round up the quotient of (local_rate - path_rate) / path_rate. To round up we add (path_rate - 1) to the numerator, so the quotient simplifies to (local_rate - 1) / path_rate. No idea how I came up with the old formula. Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|