#
e353ea9c |
|
22-Feb-2024 |
Eric Dumazet <edumazet@google.com> |
rtnetlink: prepare nla_put_iflink() to run under RCU We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50af5d12 |
|
21-Nov-2023 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/IPoIB: Add tx timeout work to recover queue stop situation As we sometime run into TX timeout from IPoIB, queue seems stopped and can't recover. Diff with Mellanox OFED show Mellanox driver has timeout work to recover in such case. Add TX timeout work/NAPI work to recover such case. Also increase the watchdog_timeo to 10 seconds, so more tolerant to error. Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20231121130316.126364-3-jinpu.wang@ionos.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
7a1c2abf |
|
23-Oct-2023 |
Yang Li <yang.lee@linux.alibaba.com> |
RDMA/core: Remove NULL check before dev_{put, hold} The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}, remove it to silence the warning: ./drivers/infiniband/core/nldev.c:375:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7047 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20231024003815.89742-1-yang.lee@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
ccae0447 |
|
04-Jan-2023 |
Mark Zhang <markzhang@nvidia.com> |
RDMA/cma: Refactor the inbound/outbound path records process flow Refactors based on comments [1] of the multiple path records support patchset: - Return failure if not able to set inbound/outbound PRs; - Simplify the flow when receiving the PRs from netlink channel: When a good PR response is received, unpack it and call the path_query callback directly. This saves two memory allocations; - Define RDMA_PRIMARY_PATH_MAX_REC_NUM in a proper place. [1] https://lore.kernel.org/linux-rdma/Yyxp9E9pJtUids2o@nvidia.com/ Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> #srp Link: https://lore.kernel.org/r/7610025d57342b8b6da0f19516c9612f9c3fdc37.1672819376.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
e632291a |
|
24-Jan-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
IB/IPoIB: Fix legacy IPoIB due to wrong number of queues The cited commit creates child PKEY interfaces over netlink will multiple tx and rx queues, but some devices doesn't support more than 1 tx and 1 rx queues. This causes to a crash when traffic is sent over the PKEY interface due to the parent having a single queue but the child having multiple queues. This patch fixes the number of queues to 1 for legacy IPoIB at the earliest possible point in time. BUG: kernel NULL pointer dereference, address: 000000000000036b PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc+0xcb/0x450 Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202 RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00 RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40 R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000 R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000 FS: 00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> skb_clone+0x55/0xd0 ip6_finish_output2+0x3fe/0x690 ip6_finish_output+0xfa/0x310 ip6_send_skb+0x1e/0x60 udp_v6_send_skb+0x1e5/0x420 udpv6_sendmsg+0xb3c/0xe60 ? ip_mc_finish_output+0x180/0x180 ? __switch_to_asm+0x3a/0x60 ? __switch_to_asm+0x34/0x60 sock_sendmsg+0x33/0x40 __sys_sendto+0x103/0x160 ? _copy_to_user+0x21/0x30 ? kvm_clock_get_cycles+0xd/0x10 ? ktime_get_ts64+0x49/0xe0 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f9374f1ed14 Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14 RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030 RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc </TASK> Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/95eb6b74c7cf49fa46281f9d056d685c9fa11d38.1674584576.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
5a374949 |
|
08-Sep-2022 |
Mark Zhang <markzhang@nvidia.com> |
RDMA/cma: Multiple path records support with netlink channel Support receiving inbound and outbound IB path records (along with GMP PathRecord) from user-space service through the RDMA netlink channel. The LIDs in these 3 PRs can be used in this way: 1. GMP PR: used as the standard local/remote LIDs; 2. DLID of outbound PR: Used as the "dlid" field for outbound traffic; 3. DLID of inbound PR: Used as the "dlid" field for outbound traffic in responder side. This is aimed to support adaptive routing. With current IB routing solution when a packet goes out it's assigned with a fixed DLID per target, meaning a fixed router will be used. The LIDs in inbound/outbound path records can be used to identify group of routers that allow communication with another subnet's entity. With them packets from an inter-subnet connection may travel through any router in the set to reach the target. As confirmed with Jason, when sending a netlink request, kernel uses LS_RESOLVE_PATH_USE_ALL so that the service knows kernel supports multiple PRs. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/2fa2b6c93c4c16c8915bac3cfc4f27be1d60519d.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
2157f5ca |
|
05-Jul-2022 |
Jakub Kicinski <kuba@kernel.org> |
ipoib: switch to netif_napi_add_weight() We want to remove the weight argument from the basic netif_napi_add() API and just default to 64. Switch ipoib to the new API for explicitly specifying the weight. Link: https://lore.kernel.org/r/20220705230208.924408-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
e945c653 |
|
03-Apr-2022 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA: Split kernel-only global device caps from uverbs device caps Split out flags from ib_device::device_cap_flags that are only used internally to the kernel into kernel_cap_flags that is not part of the uapi. This limits the device_cap_flags to being the same bitmap that will be copied to userspace. This cleanly splits out the uverbs flags from the kernel flags to avoid confusion in the flags bitmap. Add some short comments describing which each of the kernel flags is connected to. Remove unused kernel flags. Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
10f7b9bc |
|
19-Oct-2021 |
Jakub Kicinski <kuba@kernel.org> |
RDMA/ipoib: Use dev_addr_mod() Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Link: https://lore.kernel.org/r/20211019182604.1441387-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8869574a |
|
10-Oct-2021 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
RDMA: Remove redundant 'flush_workqueue()' calls 'destroy_workqueue()' already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant 'flush_workqueue()' calls. This was generated with coccinelle: @@ expression E; @@ - flush_workqueue(E); destroy_workqueue(E); Link: https://lore.kernel.org/r/ca7bac6e6c9c5cc8d04eec3944edb13de0e381a3.1633874776.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
a7605370 |
|
27-Jul-2021 |
Arnd Bergmann <arnd@arndb.de> |
dev_ioctl: split out ndo_eth_ioctl Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bf194997 |
|
16-Jun-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Fix kernel-doc warnings about wrong comment Compilation with W=1 produces warnings similar to the below. drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst All such occurrences were found with the following one line git grep -A 1 "\/\*\*" drivers/infiniband/ Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
a5e27fb6 |
|
28-May-2021 |
Weihang Li <liweihang@huawei.com> |
RDMA/ipoib: Use refcount_t instead of atomic_t for reference counting The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-13-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1f8f60f3 |
|
26-May-2021 |
YueHaibing <yuehaibing@huawei.com> |
IB/ipoib: Use DEVICE_ATTR_*() macros Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR, which makes the code a bit shorter and easier to read. Link: https://lore.kernel.org/r/20210526132753.3092-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
3ccbd933 |
|
08-Apr-2021 |
Jack Wang <jinpu.wang@ionos.com> |
RDMA/ipoib: Print a message if only child interface is UP When "enhanced IPoIB" is enabled for CX-5 devices it requires the parent device to be UP, otherwise the child devices won't work. Thus add a debug message to give admin a hint when only the child interface is UP but parent interface is not. Link: https://lore.kernel.org/r/20210408093215.24023-1-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
042a00f9 |
|
29-Mar-2021 |
Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> |
IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev The current rdma_netdev handling in ipoib hooks the tx_timeout handler, but prints out a totally useless message that prevents effective debugging especially when multiple transmit queues are being used. Add a tx_timeout rdma_netdev hook and implement the callback in the hfi1 to print additional information. The existing non-helpful message is avoided when the driver has presented a callback. Link: https://lore.kernel.org/r/1617026056-50483-3-git-send-email-dennis.dalessandro@cornelisnetworks.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1fb7f897 |
|
01-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
RDMA: Support more than 255 rdma ports Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic. This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed. With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue. When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported. The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that relies in verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other ULPs aren't effected by this change and their sysfs/interfaces that are exposes to userspace can remain unchanged. While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt), Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
633d6102 |
|
28-Jan-2021 |
Christoph Lameter <cl@linux.com> |
RDMA/ipoib: Remove racy Subnet Manager sendonly join checks When a system receives a REREG event from the SM, then the SM information in the kernel is marked as invalid and a request is sent to the SM to update the information. The SM information is invalid in that time period. However, receiving a REREG also occurs simultaneously in user space applications that are now trying to rejoin the multicast groups. Some of those may be sendonly multicast groups which are then failing. If the SM information is invalid then ib_sa_sendonly_fullmem_support() returns false. That is wrong because it just means that we do not know yet if the potentially new SM supports sendonly joins. Sendonly join was introduced in 2015 and all the Subnet managers have supported it ever since. So there is no point in checking if a subnet manager supports it. Should an old opensm get a request for a sendonly join then the request will fail. The code that is removed here accomodated that situation and fell back to a full join. Falling back to a full join is problematic in itself. The reason to use the sendonly join was to reduce the traffic on the Infiniband fabric otherwise one could have just stayed with the regular join. So this patch may cause users of very old opensms to discover that lots of traffic needlessly crosses their IB fabrics. Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101281845160.13303@www.lameter.com Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1c7fd726 |
|
07-Oct-2020 |
Joe Perches <joe@perches.com> |
RDMA: Convert sysfs device * show functions to use sysfs_emit() Done with cocci script: @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
5ce2dced |
|
04-Oct-2020 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces Report the "ipoib pkey", "mode" and "umcast" netlink attributes for every IPoiB interface type, not just children created with 'ip link add'. After setting the rtnl_link_ops for the parent interface, implement the dellink() callback to block users from trying to remove it. Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Link: https://lore.kernel.org/r/20201004132948.26669-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
eff74233 |
|
25-Sep-2020 |
Taehee Yoo <ap420073@gmail.com> |
net: core: introduce struct netdev_nested_priv for nested interface infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper | lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df561f66 |
|
23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
87fb5c1c |
|
23-Jun-2020 |
Michael Guralnik <michaelgur@mellanox.com> |
RDMA/ipoib: Handle user-supplied address when creating child Use the address supplied by user when creating a child interface. Previously, the address requested by the user was ignored and overridden with parent's GID and the random QP number assigned to the child. Link: https://lore.kernel.org/r/20200623110105.1225750-3-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
65936bf2 |
|
25-Jun-2020 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah() ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that the only time flush_workqueue() can be called is if it also clears IPOIB_FLAG_OPER_UP. Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky enough, and lockdep doesn't help us to find it early: CPU0 CPU1 CPU2 __ipoib_ib_dev_flush() down_read(vlan_rwsem) ipoib_vlan_add() rtnl_trylock() down_write(vlan_rwsem) ipoib_mcast_carrier_on_task() while (!rtnl_trylock()) msleep(20); ipoib_flush_ah() flush_workqueue(priv->wq) Clean up the ah_reaper related functions and lifecycle to make sense: - Start/Stop of the reaper should only be done in open/stop NDOs, not in any other places - cancel and flush of the reaper should only happen in the stop NDO. cancel is only functional when combined with IPOIB_STOP_REAPER. - Non-stop places were flushing the AH's just need to flush out dead AH's synchronously and ignore the background task completely. It is fully locked and harmless to leave running. Which ultimately fixes the ABBA deadlock by removing the unnecessary flush_workqueue() from the problematic place under the vlan_rwsem. Fixes: efc82eeeae4e ("IB/ipoib: No longer use flush as a parameter") Link: https://lore.kernel.org/r/20200625174219.290842-1-kamalheib1@gmail.com Reported-by: Kamal Heib <kheib@redhat.com> Tested-by: Kamal Heib <kheib@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1acba6a8 |
|
27-May-2020 |
Valentine Fatiev <valentinef@mellanox.com> |
IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode When connected mode is set, and we have connected and datagram traffic in parallel, ipoib might crash with double free of datagram skb. The current mechanism assumes that the order in the completion queue is the same as the order of sent packets for all QPs. Order is kept only for specific QP, in case of mixed UD and CM traffic we have few QPs (one UD and few CM's) in parallel. The problem: ---------------------------------------------------------- Transmit queue: ----------------- UD skb pointer kept in queue itself, CM skb kept in spearate queue and uses transmit queue as a placeholder to count the number of total transmitted packets. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ tail head Completion queue (problematic scenario) - the order not the same as in the transmit queue: 1 2 3 4 5 6 7 8 9 ------------------------------------ ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5 ------------------------------------ 1. CM1 'wc' processing - skb freed in cm separate ring. - tx_tail of transmit queue increased although UD2 is not freed. Now driver assumes UD2 index is already freed and it could be used for new transmitted skb. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 ------------------------------------------------------------ NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... ------------------------------------------------------------ ^ ^ ^ (Bad)tail head (Bad - Could be used for new SKB) In this case (due to heavy load) UD2 skb pointer could be replaced by new transmitted packet UD_NEW, as the driver assumes its free. At this point we will have to process two 'wc' with same index but we have only one pointer to free. During second attempt to free the same skb we will have NULL pointer exception. 2. UD2 'wc' processing - skb freed according the index we got from 'wc', but it was already overwritten by mistake. So actually the skb that was released is the skb of the new transmitted packet and not the original one. 3. UD_NEW 'wc' processing - attempt to free already freed skb. NUll pointer exception. The fix: ----------------------------------------------------------------------- The fix is to stop using the UD ring as a placeholder for CM packets, the cyclic ring variables tx_head and tx_tail will manage the UD tx_ring, a new cyclic variables global_tx_head and global_tx_tail are introduced for managing and counting the overall outstanding sent packets, then the send queue will be stopped and waken based on these variables only. Note that no locking is needed since global_tx_head is updated in the xmit flow and global_tx_tail is updated in the NAPI flow only. A previous attempt tried to use one variable to count the outstanding sent packets, but it did not work since xmit and NAPI flows can run at the same time and the counter will be updated wrongly. Thus, we use the same simple cyclic head and tail scheme that we have today for the UD tx_ring. Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes") Link: https://lore.kernel.org/r/20200527134705.480068-1-leon@kernel.org Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
8f149b68 |
|
10-May-2020 |
Gary Leshner <Gary.S.Leshner@intel.com> |
IB/ipoib: Add capability to switch between datagram and connected mode This is the prerequisite modification to the ipoib ulp to allow a rdma netdev to obtain the default ndo ops for init/uninit/open/close. This is accomplished by setting the netdev ops field within the callback function passed to the netdev allocation routine which in turn was passed into the rdma netdev allocation routine. This allows the rdma netdev to call back into the ulp to create the resources required for connected mode operation. Additionally as the ulp is not re-entrant, when switching modes, the number of real tx queues is set to 1 for the connected mode. For datagram mode the number of real tx queues is set to the actual number of tx queues specified at the netdev's allocation. For the internal ulp netdev the number of tx queues defaults to 1. It is up to the rdma netdev to specify the actual number it can support. When the driver does not support a rdma netdev for acceleration, (-ENOTSUPPORTED return code or the verbs function for allocation is NULL) the ipoib ulp functions are unaffected by using the internal netdev allocated by the ipoib ulp. Link: https://lore.kernel.org/r/20200511160706.173205.19086.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b7e159eb |
|
10-May-2020 |
Gary Leshner <Gary.S.Leshner@intel.com> |
IB/{hfi1, ipoib, rdma}: Broadcast ping sent packets which exceeded mtu size When in connected mode ipoib sent broadcast pings which exceeded the mtu size for broadcast addresses. Add an mtu attribute to the rdma_netdev structure which ipoib sets to its mcast mtu size. The RDMA netdev uses this value to determine if the skb length is too long for the mtu specified and if it is, drops the packet and logs an error about the errant packet. Link: https://lore.kernel.org/r/20200511160655.173205.14546.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
6d72344c |
|
10-May-2020 |
Kaike Wan <kaike.wan@intel.com> |
IB/ipoib: Increase ipoib Datagram mode MTU's upper limit Currently the ipoib UD mtu is restricted to 4K bytes. Remove this limitation so that the IPOIB module can potentially use an MTU (in UD mode) that is bounded by the MTU of the underlying device. A field is added to the ib_port_attr structure to indicate the maximum physical MTU the underlying device supports. Link: https://lore.kernel.org/r/20200511160618.173205.23053.stgit@awfm-01.aw.intel.com Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
11a0ae4c |
|
21-Apr-2020 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA: Allow ib_client's to fail when add() is called When a client is added it isn't allowed to fail, but all the client's have various failure paths within their add routines. This creates the very fringe condition where the client was added, failed during add and didn't set the client_data. The core code will then still call other client_data centric ops like remove(), rename(), get_nl_info(), and get_net_dev_by_params() with NULL client_data - which is confusing and unexpected. If the add() callback fails, then do not call any more client ops for the device, even remove. Remove all the now redundant checks for NULL client_data in ops callbacks. Update all the add() callbacks to return error codes appropriately. EOPNOTSUPP is used for cases where the ULP does not support the ib_device - eg because it only works with IB. Link: https://lore.kernel.org/r/20200421172440.387069-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
96870720 |
|
20-Feb-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/ipoib: Don't set constant driver version There is no need to set driver version in in-tree kernel code. Link: https://lore.kernel.org/r/20200220071239.231800-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
0290bd29 |
|
10-Dec-2019 |
Michael S. Tsirkin <mst@redhat.com> |
netdev: pass the stuck queue to the timeout handler This allows incrementing the correct timeout statistic without any mess. Down the road, devices can learn to reset just the specific queue. The patch was generated with the following script: use strict; use warnings; our $^I = '.bak'; my @work = ( ["arch/m68k/emu/nfeth.c", "nfeth_tx_timeout"], ["arch/um/drivers/net_kern.c", "uml_net_tx_timeout"], ["arch/um/drivers/vector_kern.c", "vector_net_tx_timeout"], ["arch/xtensa/platforms/iss/network.c", "iss_net_tx_timeout"], ["drivers/char/pcmcia/synclink_cs.c", "hdlcdev_tx_timeout"], ["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"], ["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"], ["drivers/message/fusion/mptlan.c", "mpt_lan_tx_timeout"], ["drivers/misc/sgi-xp/xpnet.c", "xpnet_dev_tx_timeout"], ["drivers/net/appletalk/cops.c", "cops_timeout"], ["drivers/net/arcnet/arcdevice.h", "arcnet_timeout"], ["drivers/net/arcnet/arcnet.c", "arcnet_timeout"], ["drivers/net/arcnet/com20020.c", "arcnet_timeout"], ["drivers/net/ethernet/3com/3c509.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c515.c", "corkscrew_timeout"], ["drivers/net/ethernet/3com/3c574_cs.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c589_cs.c", "el3_tx_timeout"], ["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"], ["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"], ["drivers/net/ethernet/3com/typhoon.c", "typhoon_tx_timeout"], ["drivers/net/ethernet/8390/8390.h", "ei_tx_timeout"], ["drivers/net/ethernet/8390/8390.h", "eip_tx_timeout"], ["drivers/net/ethernet/8390/8390.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/8390p.c", "eip_tx_timeout"], ["drivers/net/ethernet/8390/ax88796.c", "ax_ei_tx_timeout"], ["drivers/net/ethernet/8390/axnet_cs.c", "axnet_tx_timeout"], ["drivers/net/ethernet/8390/etherh.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/hydra.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/mac8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/mcf8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/lib8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/8390/ne2k-pci.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/pcnet_cs.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/smc-ultra.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/wd.c", "ei_tx_timeout"], ["drivers/net/ethernet/8390/zorro8390.c", "__ei_tx_timeout"], ["drivers/net/ethernet/adaptec/starfire.c", "tx_timeout"], ["drivers/net/ethernet/agere/et131x.c", "et131x_tx_timeout"], ["drivers/net/ethernet/allwinner/sun4i-emac.c", "emac_timeout"], ["drivers/net/ethernet/alteon/acenic.c", "ace_watchdog"], ["drivers/net/ethernet/amazon/ena/ena_netdev.c", "ena_tx_timeout"], ["drivers/net/ethernet/amd/7990.h", "lance_tx_timeout"], ["drivers/net/ethernet/amd/7990.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/a2065.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/am79c961a.c", "am79c961_timeout"], ["drivers/net/ethernet/amd/amd8111e.c", "amd8111e_tx_timeout"], ["drivers/net/ethernet/amd/ariadne.c", "ariadne_tx_timeout"], ["drivers/net/ethernet/amd/atarilance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/au1000_eth.c", "au1000_tx_timeout"], ["drivers/net/ethernet/amd/declance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/lance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/mvme147.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/ni65.c", "ni65_timeout"], ["drivers/net/ethernet/amd/nmclan_cs.c", "mace_tx_timeout"], ["drivers/net/ethernet/amd/pcnet32.c", "pcnet32_tx_timeout"], ["drivers/net/ethernet/amd/sunlance.c", "lance_tx_timeout"], ["drivers/net/ethernet/amd/xgbe/xgbe-drv.c", "xgbe_tx_timeout"], ["drivers/net/ethernet/apm/xgene-v2/main.c", "xge_timeout"], ["drivers/net/ethernet/apm/xgene/xgene_enet_main.c", "xgene_enet_timeout"], ["drivers/net/ethernet/apple/macmace.c", "mace_tx_timeout"], ["drivers/net/ethernet/atheros/ag71xx.c", "ag71xx_tx_timeout"], ["drivers/net/ethernet/atheros/alx/main.c", "alx_tx_timeout"], ["drivers/net/ethernet/atheros/atl1c/atl1c_main.c", "atl1c_tx_timeout"], ["drivers/net/ethernet/atheros/atl1e/atl1e_main.c", "atl1e_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl.c", "atlx_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl1.c", "atlx_tx_timeout"], ["drivers/net/ethernet/atheros/atlx/atl2.c", "atl2_tx_timeout"], ["drivers/net/ethernet/broadcom/b44.c", "b44_tx_timeout"], ["drivers/net/ethernet/broadcom/bcmsysport.c", "bcm_sysport_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2.c", "bnx2_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c", "bnx2x_tx_timeout"], ["drivers/net/ethernet/broadcom/bnxt/bnxt.c", "bnxt_tx_timeout"], ["drivers/net/ethernet/broadcom/genet/bcmgenet.c", "bcmgenet_timeout"], ["drivers/net/ethernet/broadcom/sb1250-mac.c", "sbmac_tx_timeout"], ["drivers/net/ethernet/broadcom/tg3.c", "tg3_tx_timeout"], ["drivers/net/ethernet/calxeda/xgmac.c", "xgmac_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_main.c", "liquidio_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_vf_main.c", "liquidio_tx_timeout"], ["drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c", "lio_vf_rep_tx_timeout"], ["drivers/net/ethernet/cavium/thunder/nicvf_main.c", "nicvf_tx_timeout"], ["drivers/net/ethernet/cirrus/cs89x0.c", "net_timeout"], ["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"], ["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"], ["drivers/net/ethernet/cortina/gemini.c", "gmac_tx_timeout"], ["drivers/net/ethernet/davicom/dm9000.c", "dm9000_timeout"], ["drivers/net/ethernet/dec/tulip/de2104x.c", "de_tx_timeout"], ["drivers/net/ethernet/dec/tulip/tulip_core.c", "tulip_tx_timeout"], ["drivers/net/ethernet/dec/tulip/winbond-840.c", "tx_timeout"], ["drivers/net/ethernet/dlink/dl2k.c", "rio_tx_timeout"], ["drivers/net/ethernet/dlink/sundance.c", "tx_timeout"], ["drivers/net/ethernet/emulex/benet/be_main.c", "be_tx_timeout"], ["drivers/net/ethernet/ethoc.c", "ethoc_tx_timeout"], ["drivers/net/ethernet/faraday/ftgmac100.c", "ftgmac100_tx_timeout"], ["drivers/net/ethernet/fealnx.c", "fealnx_tx_timeout"], ["drivers/net/ethernet/freescale/dpaa/dpaa_eth.c", "dpaa_tx_timeout"], ["drivers/net/ethernet/freescale/fec_main.c", "fec_timeout"], ["drivers/net/ethernet/freescale/fec_mpc52xx.c", "mpc52xx_fec_tx_timeout"], ["drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c", "fs_timeout"], ["drivers/net/ethernet/freescale/gianfar.c", "gfar_timeout"], ["drivers/net/ethernet/freescale/ucc_geth.c", "ucc_geth_timeout"], ["drivers/net/ethernet/fujitsu/fmvj18x_cs.c", "fjn_tx_timeout"], ["drivers/net/ethernet/google/gve/gve_main.c", "gve_tx_timeout"], ["drivers/net/ethernet/hisilicon/hip04_eth.c", "hip04_timeout"], ["drivers/net/ethernet/hisilicon/hix5hd2_gmac.c", "hix5hd2_net_timeout"], ["drivers/net/ethernet/hisilicon/hns/hns_enet.c", "hns_nic_net_timeout"], ["drivers/net/ethernet/hisilicon/hns3/hns3_enet.c", "hns3_nic_net_timeout"], ["drivers/net/ethernet/huawei/hinic/hinic_main.c", "hinic_tx_timeout"], ["drivers/net/ethernet/i825xx/82596.c", "i596_tx_timeout"], ["drivers/net/ethernet/i825xx/ether1.c", "ether1_timeout"], ["drivers/net/ethernet/i825xx/lib82596.c", "i596_tx_timeout"], ["drivers/net/ethernet/i825xx/sun3_82586.c", "sun3_82586_timeout"], ["drivers/net/ethernet/ibm/ehea/ehea_main.c", "ehea_tx_watchdog"], ["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"], ["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"], ["drivers/net/ethernet/ibm/ibmvnic.c", "ibmvnic_tx_timeout"], ["drivers/net/ethernet/intel/e100.c", "e100_tx_timeout"], ["drivers/net/ethernet/intel/e1000/e1000_main.c", "e1000_tx_timeout"], ["drivers/net/ethernet/intel/e1000e/netdev.c", "e1000_tx_timeout"], ["drivers/net/ethernet/intel/fm10k/fm10k_netdev.c", "fm10k_tx_timeout"], ["drivers/net/ethernet/intel/i40e/i40e_main.c", "i40e_tx_timeout"], ["drivers/net/ethernet/intel/iavf/iavf_main.c", "iavf_tx_timeout"], ["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"], ["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"], ["drivers/net/ethernet/intel/igb/igb_main.c", "igb_tx_timeout"], ["drivers/net/ethernet/intel/igbvf/netdev.c", "igbvf_tx_timeout"], ["drivers/net/ethernet/intel/ixgb/ixgb_main.c", "ixgb_tx_timeout"], ["drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c", "adapter->netdev->netdev_ops->ndo_tx_timeout(adapter->netdev);"], ["drivers/net/ethernet/intel/ixgbe/ixgbe_main.c", "ixgbe_tx_timeout"], ["drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c", "ixgbevf_tx_timeout"], ["drivers/net/ethernet/jme.c", "jme_tx_timeout"], ["drivers/net/ethernet/korina.c", "korina_tx_timeout"], ["drivers/net/ethernet/lantiq_etop.c", "ltq_etop_tx_timeout"], ["drivers/net/ethernet/marvell/mv643xx_eth.c", "mv643xx_eth_tx_timeout"], ["drivers/net/ethernet/marvell/pxa168_eth.c", "pxa168_eth_tx_timeout"], ["drivers/net/ethernet/marvell/skge.c", "skge_tx_timeout"], ["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"], ["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"], ["drivers/net/ethernet/mediatek/mtk_eth_soc.c", "mtk_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"], ["drivers/net/ethernet/mellanox/mlx5/core/en_main.c", "mlx5e_tx_timeout"], ["drivers/net/ethernet/micrel/ks8842.c", "ks8842_tx_timeout"], ["drivers/net/ethernet/micrel/ksz884x.c", "netdev_tx_timeout"], ["drivers/net/ethernet/microchip/enc28j60.c", "enc28j60_tx_timeout"], ["drivers/net/ethernet/microchip/encx24j600.c", "encx24j600_tx_timeout"], ["drivers/net/ethernet/natsemi/sonic.h", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/sonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/jazzsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/macsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/natsemi/natsemi.c", "ns_tx_timeout"], ["drivers/net/ethernet/natsemi/ns83820.c", "ns83820_tx_timeout"], ["drivers/net/ethernet/natsemi/xtsonic.c", "sonic_tx_timeout"], ["drivers/net/ethernet/neterion/s2io.h", "s2io_tx_watchdog"], ["drivers/net/ethernet/neterion/s2io.c", "s2io_tx_watchdog"], ["drivers/net/ethernet/neterion/vxge/vxge-main.c", "vxge_tx_watchdog"], ["drivers/net/ethernet/netronome/nfp/nfp_net_common.c", "nfp_net_tx_timeout"], ["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"], ["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"], ["drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c", "pch_gbe_tx_timeout"], ["drivers/net/ethernet/packetengines/hamachi.c", "hamachi_tx_timeout"], ["drivers/net/ethernet/packetengines/yellowfin.c", "yellowfin_tx_timeout"], ["drivers/net/ethernet/pensando/ionic/ionic_lif.c", "ionic_tx_timeout"], ["drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c", "netxen_tx_timeout"], ["drivers/net/ethernet/qlogic/qla3xxx.c", "ql3xxx_tx_timeout"], ["drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c", "qlcnic_tx_timeout"], ["drivers/net/ethernet/qualcomm/emac/emac.c", "emac_tx_timeout"], ["drivers/net/ethernet/qualcomm/qca_spi.c", "qcaspi_netdev_tx_timeout"], ["drivers/net/ethernet/qualcomm/qca_uart.c", "qcauart_netdev_tx_timeout"], ["drivers/net/ethernet/rdc/r6040.c", "r6040_tx_timeout"], ["drivers/net/ethernet/realtek/8139cp.c", "cp_tx_timeout"], ["drivers/net/ethernet/realtek/8139too.c", "rtl8139_tx_timeout"], ["drivers/net/ethernet/realtek/atp.c", "tx_timeout"], ["drivers/net/ethernet/realtek/r8169_main.c", "rtl8169_tx_timeout"], ["drivers/net/ethernet/renesas/ravb_main.c", "ravb_tx_timeout"], ["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"], ["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"], ["drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c", "sxgbe_tx_timeout"], ["drivers/net/ethernet/seeq/ether3.c", "ether3_timeout"], ["drivers/net/ethernet/seeq/sgiseeq.c", "timeout"], ["drivers/net/ethernet/sfc/efx.c", "efx_watchdog"], ["drivers/net/ethernet/sfc/falcon/efx.c", "ef4_watchdog"], ["drivers/net/ethernet/sgi/ioc3-eth.c", "ioc3_timeout"], ["drivers/net/ethernet/sgi/meth.c", "meth_tx_timeout"], ["drivers/net/ethernet/silan/sc92031.c", "sc92031_tx_timeout"], ["drivers/net/ethernet/sis/sis190.c", "sis190_tx_timeout"], ["drivers/net/ethernet/sis/sis900.c", "sis900_tx_timeout"], ["drivers/net/ethernet/smsc/epic100.c", "epic_tx_timeout"], ["drivers/net/ethernet/smsc/smc911x.c", "smc911x_timeout"], ["drivers/net/ethernet/smsc/smc9194.c", "smc_timeout"], ["drivers/net/ethernet/smsc/smc91c92_cs.c", "smc_tx_timeout"], ["drivers/net/ethernet/smsc/smc91x.c", "smc_timeout"], ["drivers/net/ethernet/stmicro/stmmac/stmmac_main.c", "stmmac_tx_timeout"], ["drivers/net/ethernet/sun/cassini.c", "cas_tx_timeout"], ["drivers/net/ethernet/sun/ldmvsw.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/niu.c", "niu_tx_timeout"], ["drivers/net/ethernet/sun/sunbmac.c", "bigmac_tx_timeout"], ["drivers/net/ethernet/sun/sungem.c", "gem_tx_timeout"], ["drivers/net/ethernet/sun/sunhme.c", "happy_meal_tx_timeout"], ["drivers/net/ethernet/sun/sunqe.c", "qe_tx_timeout"], ["drivers/net/ethernet/sun/sunvnet.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/sunvnet_common.c", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/sun/sunvnet_common.h", "sunvnet_tx_timeout_common"], ["drivers/net/ethernet/synopsys/dwc-xlgmac-net.c", "xlgmac_tx_timeout"], ["drivers/net/ethernet/ti/cpmac.c", "cpmac_tx_timeout"], ["drivers/net/ethernet/ti/cpsw.c", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/cpsw_priv.c", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/cpsw_priv.h", "cpsw_ndo_tx_timeout"], ["drivers/net/ethernet/ti/davinci_emac.c", "emac_dev_tx_timeout"], ["drivers/net/ethernet/ti/netcp_core.c", "netcp_ndo_tx_timeout"], ["drivers/net/ethernet/ti/tlan.c", "tlan_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_net.h", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_net.c", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/ps3_gelic_wireless.c", "gelic_net_tx_timeout"], ["drivers/net/ethernet/toshiba/spider_net.c", "spider_net_tx_timeout"], ["drivers/net/ethernet/toshiba/tc35815.c", "tc35815_tx_timeout"], ["drivers/net/ethernet/via/via-rhine.c", "rhine_tx_timeout"], ["drivers/net/ethernet/wiznet/w5100.c", "w5100_tx_timeout"], ["drivers/net/ethernet/wiznet/w5300.c", "w5300_tx_timeout"], ["drivers/net/ethernet/xilinx/xilinx_emaclite.c", "xemaclite_tx_timeout"], ["drivers/net/ethernet/xircom/xirc2ps_cs.c", "xirc_tx_timeout"], ["drivers/net/fjes/fjes_main.c", "fjes_tx_retry"], ["drivers/net/slip/slip.c", "sl_tx_timeout"], ["include/linux/usb/usbnet.h", "usbnet_tx_timeout"], ["drivers/net/usb/aqc111.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"], ["drivers/net/usb/ax88172a.c", "usbnet_tx_timeout"], ["drivers/net/usb/ax88179_178a.c", "usbnet_tx_timeout"], ["drivers/net/usb/catc.c", "catc_tx_timeout"], ["drivers/net/usb/cdc_mbim.c", "usbnet_tx_timeout"], ["drivers/net/usb/cdc_ncm.c", "usbnet_tx_timeout"], ["drivers/net/usb/dm9601.c", "usbnet_tx_timeout"], ["drivers/net/usb/hso.c", "hso_net_tx_timeout"], ["drivers/net/usb/int51x1.c", "usbnet_tx_timeout"], ["drivers/net/usb/ipheth.c", "ipheth_tx_timeout"], ["drivers/net/usb/kaweth.c", "kaweth_tx_timeout"], ["drivers/net/usb/lan78xx.c", "lan78xx_tx_timeout"], ["drivers/net/usb/mcs7830.c", "usbnet_tx_timeout"], ["drivers/net/usb/pegasus.c", "pegasus_tx_timeout"], ["drivers/net/usb/qmi_wwan.c", "usbnet_tx_timeout"], ["drivers/net/usb/r8152.c", "rtl8152_tx_timeout"], ["drivers/net/usb/rndis_host.c", "usbnet_tx_timeout"], ["drivers/net/usb/rtl8150.c", "rtl8150_tx_timeout"], ["drivers/net/usb/sierra_net.c", "usbnet_tx_timeout"], ["drivers/net/usb/smsc75xx.c", "usbnet_tx_timeout"], ["drivers/net/usb/smsc95xx.c", "usbnet_tx_timeout"], ["drivers/net/usb/sr9700.c", "usbnet_tx_timeout"], ["drivers/net/usb/sr9800.c", "usbnet_tx_timeout"], ["drivers/net/usb/usbnet.c", "usbnet_tx_timeout"], ["drivers/net/vmxnet3/vmxnet3_drv.c", "vmxnet3_tx_timeout"], ["drivers/net/wan/cosa.c", "cosa_net_timeout"], ["drivers/net/wan/farsync.c", "fst_tx_timeout"], ["drivers/net/wan/fsl_ucc_hdlc.c", "uhdlc_tx_timeout"], ["drivers/net/wan/lmc/lmc_main.c", "lmc_driver_timeout"], ["drivers/net/wan/x25_asy.c", "x25_asy_timeout"], ["drivers/net/wimax/i2400m/netdev.c", "i2400m_tx_timeout"], ["drivers/net/wireless/intel/ipw2x00/ipw2100.c", "ipw2100_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/main.c", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/orinoco_usb.c", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/orinoco/orinoco.h", "orinoco_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_dev.c", "islpci_eth_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_eth.c", "islpci_eth_tx_timeout"], ["drivers/net/wireless/intersil/prism54/islpci_eth.h", "islpci_eth_tx_timeout"], ["drivers/net/wireless/marvell/mwifiex/main.c", "mwifiex_tx_timeout"], ["drivers/net/wireless/quantenna/qtnfmac/core.c", "qtnf_netdev_tx_timeout"], ["drivers/net/wireless/quantenna/qtnfmac/core.h", "qtnf_netdev_tx_timeout"], ["drivers/net/wireless/rndis_wlan.c", "usbnet_tx_timeout"], ["drivers/net/wireless/wl3501_cs.c", "wl3501_tx_timeout"], ["drivers/net/wireless/zydas/zd1201.c", "zd1201_tx_timeout"], ["drivers/s390/net/qeth_core.h", "qeth_tx_timeout"], ["drivers/s390/net/qeth_core_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"], ["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"], ["drivers/staging/ks7010/ks_wlan_net.c", "ks_wlan_tx_timeout"], ["drivers/staging/qlge/qlge_main.c", "qlge_tx_timeout"], ["drivers/staging/rtl8192e/rtl8192e/rtl_core.c", "_rtl92e_tx_timeout"], ["drivers/staging/rtl8192u/r8192U_core.c", "tx_timeout"], ["drivers/staging/unisys/visornic/visornic_main.c", "visornic_xmit_timeout"], ["drivers/staging/wlan-ng/p80211netdev.c", "p80211knetdev_tx_timeout"], ["drivers/tty/n_gsm.c", "gsm_mux_net_tx_timeout"], ["drivers/tty/synclink.c", "hdlcdev_tx_timeout"], ["drivers/tty/synclink_gt.c", "hdlcdev_tx_timeout"], ["drivers/tty/synclinkmp.c", "hdlcdev_tx_timeout"], ["net/atm/lec.c", "lec_tx_timeout"], ["net/bluetooth/bnep/netdev.c", "bnep_net_timeout"] ); for my $p (@work) { my @pair = @$p; my $file = $pair[0]; my $func = $pair[1]; print STDERR $file , ": ", $func,"\n"; our @ARGV = ($file); while (<ARGV>) { if (m/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/) { print STDERR "found $1+$2 in $file\n"; } if (s/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/$1, unsigned int txqueue$2/) { print STDERR "$func found in $file\n"; } print; } } where the list of files and functions is simply from: git grep ndo_tx_timeout, with manual addition of headers in the rare cases where the function is from a header, then manually changing the few places which actually call ndo_tx_timeout. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Martin Habets <mhabets@solarflare.com> changes from v9: fixup a forward declaration changes from v9: more leftovers from v3 change changes from v8: fix up a missing direct call to timeout rebased on net-next changes from v7: fixup leftovers from v3 change changes from v6: fix typo in rtl driver changes from v5: add missing files (allow any net device argument name) changes from v4: add a missing driver header changes from v3: change queue # to unsigned Changes from v2: added headers Changes from v1: Fix errors found by kbuild: generalize the pattern a bit, to pick up a couple of instances missed by the previous version. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2446887e |
|
06-Nov-2019 |
Danit Goldberg <danitg@mellanox.com> |
IB/ipoib: Add ndo operation for getting VFs GUID attributes Add ndo operation to the network driver that enables configuring ipoib_get_vf_guid operation. The operation allows to get a VF port and node GUIDs. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
91b01061 |
|
30-Jun-2019 |
Valentine Fatiev <valentinef@mellanox.com> |
IB/ipoib: Add child to parent list only if device initialized Despite failure in ipoib_dev_init() we continue with initialization flow and creation of child device. It causes to the situation where this child device is added too early to parent device list. Change the logic, so in case of failure we properly return error from ipoib_dev_init() and add child only in success path. Fixes: eaeb39842508 ("IB/ipoib: Move init code to ndo_init") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
64d701c6 |
|
17-Jun-2019 |
Denis Kirjanov <kda@linux-powerpc.org> |
ipoib: correcly show a VF hardware address in the case of IPoIB with SRIOV enabled hardware ip link show command incorrecly prints 0 instead of a VF hardware address. Before: 11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP mode DEFAULT group default qlen 256 link/infiniband 80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state disable, trust off, query_rss off ... After: 11: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP mode DEFAULT group default qlen 256 link/infiniband 80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 80:00:00:66:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a4:3e:7c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off v1->v2: just copy an address without modifing ifla_vf_mac v2->v3: update the changelog Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b79656ed |
|
06-May-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA/ipoib: Allow user space differentiate between valid dev_port Systemd triggers the following warning during IPoIB device load: mlx5_core 0000:00:0c.0 ib0: "systemd-udevd" wants to know my dev_id. Should it look at dev_port instead? See Documentation/ABI/testing/sysfs-class-net for more info. This is caused due to user space attempt to differentiate old systems without dev_port and new systems with dev_port. In case dev_port will be zero, the systemd will try to read dev_id instead. There is no need to print a warning in such case, because it is valid situation and it is needed to ensure systemd compatibility with old kernels. Link: https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c#L358 Cc: <stable@vger.kernel.org> # 4.19 Fixes: f6350da41dc7 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ea1075ed |
|
12-Feb-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA: Add and use rdma_for_each_port We have many loops iterating over all of the end port numbers on a struct ib_device, simplify them with a for_each helper. Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
0dd9ce18 |
|
13-Feb-2019 |
Erez Alfasi <ereza@mellanox.com> |
IB/ipoib: Use __func__ instead of function's name Changed debug statements to use %s and __func__ instead of hard-coded function's name. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ed0bc265 |
|
29-Jan-2019 |
Kamal Heib <kamalheib1@gmail.com> |
IB/ipoib: Make ipoib_intercept_dev_id_attr() static The function ipoib_intercept_dev_id_attr() is only used in ipoib_main.c Fixes: f6350da41dc7 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
2e061c69 |
|
22-Jan-2019 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
infiniband: ipoib: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
3023a1e9 |
|
10-Dec-2018 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA: Start use ib_device_ops Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
567c5e13 |
|
06-Dec-2018 |
Petr Machata <petrm@mellanox.com> |
net: core: dev: Add extack argument to dev_change_flags() In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_change_flags(). Therefore extend dev_change_flags() with and extra extack argument and update all users. Most of the calls end up just encoding NULL, but several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available. Since the function declaration line is changed anyway, name the other function arguments to placate checkpatch. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5d6b0cb3 |
|
14-Aug-2018 |
Denis Drozdov <denisd@mellanox.com> |
RDMA/netdev: Fix netlink support in IPoIB IPoIB netlink support was broken by the below commit since integrating the rdma_netdev support relies on an allocation flow for netdevs that was controlled by the ipoib driver while netdev's rtnl_newlink implementation assumes that the netdev will be allocated by netlink. Such situation leads to crash in __ipoib_device_add, once trying to reuse netlink device. This patch fixes the kernel oops for both mlx4 and mlx5 devices triggered by the following command: Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f6a8a19b |
|
14-Aug-2018 |
Denis Drozdov <denisd@mellanox.com> |
RDMA/netdev: Hoist alloc_netdev_mqs out of the driver netdev has several interfaces that expect to call alloc_netdev_mqs from the core code, with the driver only providing the arguments. This is incompatible with the rdma_netdev interface that returns the netdev directly. Thus re-organize the API used by ipoib so that the verbs core code calls alloc_netdev_mqs for the driver. This is done by allowing the drivers to provide the allocation parameters via a 'get_params' callback and then initializing an allocated netdev as a second step. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f6350da4 |
|
06-Sep-2018 |
Arseny Maslennikov <ar@cs.msu.ru> |
IB/ipoib: Log sysfs 'dev_id' accesses from userspace Some tools may currently be using only the deprecated attribute; let's print an elaborate and clear deprecation notice to kmsg. To do that, we have to replace the whole sysfs file, since we inherit the original one from netdev. Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
9b8b2a32 |
|
06-Sep-2018 |
Arseny Maslennikov <ar@cs.msu.ru> |
IB/ipoib: Use dev_port to expose network interface port numbers Some InfiniBand network devices have multiple ports on the same PCI function. This initializes the `dev_port' sysfs field of those network interfaces with their port number. Prior to this the kernel erroneously used the `dev_id' sysfs field of those network interfaces to convey the port number to userspace. The use of `dev_id' was considered correct until Linux 3.15, when another field, `dev_port', was defined for this particular purpose and `dev_id' was reserved for distinguishing stacked ifaces (e.g: VLANs) with the same hardware address as their parent device. Similar fixes to net/mlx4_en and many other drivers, which started exporting this information through `dev_id' before 3.15, were accepted into the kernel 4 years ago. See 76a066f2a2a0 (`net/mlx4_en: Expose port number through sysfs'). Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
142a9c28 |
|
28-Aug-2018 |
Muhammad Sammar <muhammads@mellanox.com> |
IB/ipoib: Ensure that MTU isn't less than minimum permitted It is illegal to change MTU to a value lower than the minimum MTU stated in ethernet spec. In addition to that we need to add 4 bytes for encapsulation header (IPOIB_ENCAP_LEN). Before "ifconfig ib0 mtu 0" command, succeeds while it obviously shouldn't. Signed-off-by: Muhammad Sammar <muhammads@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
13476d35 |
|
29-Jul-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/ipoib: Maintain the child_intfs list from ndo_init/uninit This fixes a bug in the netlink path where the vlan_rwsem was not held around __ipoib_vlan_add causing the child_intfs to be manipulated unsafely. In the process this greatly simplifies the vlan_rwsem write side locking to only cover a single non-sleeping statement. This also further increases the safety of the removal ordering by holding the netdev of the parent while the child is active to ensure most bugs become either an oops on a NULL priv or a deadlock on the netdev refcount. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
25405d98 |
|
29-Jul-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/ipoib: Do not remove child devices from within the ndo_uninit Switching to priv_destructor and needs_free_netdev created a subtle ordering problem in ipoib_remove_one. Now that unregister_netdev frees the netdev and priv we must ensure that the children are unregistered before trying to unregister the parent, or child unregister will use after free. The solution is to unregister the children, then parent, in the same batch all while holding the rtnl_lock. This closes all the races where a new child could have been added and ensures proper ordering. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
ee190ab7 |
|
29-Jul-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/ipoib: Get rid of the sysfs_mutex This mutex was introduced to deal with the deadlock formed by calling unregister_netdev from within the sysfs callback of a netdev. Now that we have priv_destructor and needs_free_netdev we can switch to the more targeted solution of running the unregister from a work queue. This avoids the deadlock and gets rid of the mutex. The next patch in the series needs this mutex eliminated to create atomicity of unregisteration. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
9f49a5b5 |
|
29-Jul-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/netdev: Use priv_destructor for netdev cleanup Now that the unregister_netdev flow for IPoIB no longer relies on external code we can now introduce the use of priv_destructor and needs_free_netdev. The rdma_netdev flow is switched to use the netdev common priv_destructor instead of the special free_rdma_netdev and the IPOIB ULP adjusted: - priv_destructor needs to switch to point to the ULP's destructor which will then call the rdma_ndev's in the right order - We need to be careful around the error unwind of register_netdev as it sometimes calls priv_destructor on failure - ULPs need to use ndo_init/uninit to ensure proper ordering of failures around register_netdev Switching to priv_destructor is a necessary pre-requisite to using the rtnl new_link mechanism. The VNIC user for rdma_netdev should also be revised, but that is left for another patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
eaeb3984 |
|
29-Jul-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/ipoib: Move init code to ndo_init Now that we have a proper ndo_uninit, move code that naturally pairs with the ndo_uninit into ndo_init. This allows the netdev core to natually handle ordering. This fixes the situation where register_netdev can fail before calling ndo_init, in which case it wouldn't call ndo_uninit either. Also move a bunch of duplicated init code that is shared between child and parent for clarity. Now the child and parent register functions look very similar. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
7cbee87c |
|
29-Jul-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/ipoib: Move all uninit code into ndo_uninit Currently uninit is sometimes done twice in error flows, and is sprinkled a bit all over the place. Improve the clarity of the design by moving all uninit only into ndo_uinit. Some duplication is removed: - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but this duplicates the process in ipoib_neigh_hash_init - Flushing priv->wq was sometimes done before unregister, but that duplicates what has been done in ndo_uninit Uniniting the IB event queue must remain before unregister_netdev as it requires the RTNL lock to be dropped, this is moved to a helper to make that flow really clear and remove some duplication in error flows. If register_netdev fails (and ndo_init is NULL) then it almost always calls ndo_uninit, which lets us remove all the extra code from the error unwinds. The next patch in the series will close the 'almost always' hole by pairing a proper ndo_init with ndo_uninit. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
cda8daf1 |
|
29-Jul-2018 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task The neigh_reap_task is self restarting, but so long as we call cancel_delayed_work_sync() it will be guaranteed to not be running and never start again. Thus we don't need to have the racy IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes calling flush_workqueue after the cancel. This fixes a situation where the GC work could have been left running in some rare situations. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
577e07ff |
|
29-Jul-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN This essentially duplicates the netdev's reg_state, so just use that directly. The reg_state is updated under the rntl_lock, and all places using GOING_DOWN already acquire the rtnl_lock so checking is safe. Since the only place we use GOING_DOWN is for the parent device this does not fix any bugs, but it is a step to tidy up the unregister flow so that after later patches the flow is uniform and sane. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
99a7e2bf |
|
11-Jul-2018 |
Wei Yongjun <weiyongjun1@huawei.com> |
IB/ipoib: Fix error return code in ipoib_dev_init() Fix to return a negative error code from the ipoib_neigh_hash_init() error handling case instead of 0, as done elsewhere in this function. Fixes: 515ed4f3aab4 ("IB/IPoIB: Separate control and data related initializations") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
259e1914 |
|
09-Jul-2018 |
Jan Dakinevich <jan.dakinevich@virtuozzo.com> |
IPoIB: use kvzalloc to allocate an array of bucket pointers This table by default takes 32KiB which is 3rd memory order. Meanwhile, this memory is not aimed for DMA operation and could be safely allocated by vmalloc. Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b1b63970 |
|
04-Jul-2018 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/ipoib: Fix use of sizeof() Make sure to use sizeof(...) instead of sizeof ... which is more preferred. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
0578cdad |
|
04-Jul-2018 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/ipoib: Prefer unsigned int to bare use of unsigned This commit replaces all the unsigned definitions in favour of 'unsigned int' which is preferred. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
39839107 |
|
19-Jun-2018 |
Parav Pandit <parav@mellanox.com> |
IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *' While processing a path record entry in CM messages the associated GID attribute is now also supplied. Currently for RoCE a netdevice's net namespace pointer and ifindex are stored in path record entry. Both of these fields of the netdev can change anytime while processing CM messages. Additionally storing net namespace without holding reference will lead to use-after-free crash. Therefore it is removed. Netdevice information for RoCE is instead provided via referenced gid attribute in ib_cm requests. Such a design leads to a situation where the kernel can crash when the net pointer becomes invalid. However today it is always initialized to init_net, which cannot become invalid. In order to support processing packets in any arbitrary namespace of the received packet, it is necessary to avoid such conditions. This patch removes the dependency on the net pointer and ifindex; instead it will rely on SGID attribute which contains a pointer to netdev. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
aa74f487 |
|
19-Jun-2018 |
Parav Pandit <parav@mellanox.com> |
IB: Make init_ah_attr_grh_fields set sgid_attr Use the sgid and other information from the path record to figure out the sgid_attrs. Store the selected table entry in the sgid_attr for everything else to use. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
1dfce294 |
|
04-Jun-2018 |
Parav Pandit <parav@mellanox.com> |
IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gid If the gid_attr argument is NULL then the functions behave identically to rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything can be consolidated to one function. Now that all callers either use rdma_query_gid() or ib_get_cached_gid(), ib_query_gid() API is removed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
fad953ce |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: Use array_size() in vzalloc() The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(char) * COUNT + COUNT , ...) | vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) | vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) | vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
6396bb22 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
15517080 |
|
22-May-2018 |
Evgenii Smirnov <evgenii.smirnov@profitbricks.com> |
RDMA/ipoib: drop skb on path record lookup failure In unicast_arp_send function there is an inconsistency in error handling of path_rec_start call. If path_rec_start is called because of an absent ah field, skb will be dropped. But if it is called on a creation of a new path, or if the path is invalid, skb will be added to the tail of path queue. In case of a new path it will be dropped on path_free, but in case of invalid path it can stay in the queue forever. This patch unifies the behavior, dropping skb in all cases of path_rec_start failure. Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
fa9391db |
|
18-May-2018 |
Doug Ledford <dledford@redhat.com> |
RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events We do a light flush on CLIENT_REREG and SM_CHANGE events. This goes through and marks paths invalid. But we weren't always checking for this validity when we needed to, and so we could keep using a path marked invalid. What's more, once we establish a path with a valid ah, we put a pointer to the ah in the neigh struct directly, so even if we mark the path as invalid, as long as the neigh has a direct pointer to the ah, it keeps using the old, outdated ah. To fix this we do several things. 1) Put the valid flag in the ah instead of the path struct, so when we put the ah pointer directly in the neigh struct, we can easily check the validity of the ah on send events. 2) Check the neigh->ah and neigh->ah->valid elements in the needed places, and if we have an ah, but it's invalid, then invoke a refresh of the ah. 3) Fix the various places that check for path, but didn't check for path->valid (now path->ah && path->ah->valid). Reported-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
47a3968a |
|
24-Apr-2018 |
Luc Van Oostenryck <luc.vanoostenryck@gmail.com> |
IB/ipoib: fix ipoib_start_xmit()'s return type The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, but the implementation in this driver returns an 'int'. Fix this by returning 'netdev_tx_t' in this driver too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
10293610 |
|
01-Feb-2018 |
Alex Estrin <alex.estrin@intel.com> |
IB/ipoib: Fix for potential no-carrier state On reboot SM can program port pkey table before ipoib registered its event handler, which could result in missing pkey event and leave root interface with initial pkey value from index 0. Since OPA port starts with invalid pkey in index 0, root interface will fail to initialize and stay down with no-carrier flag. For IB ipoib interface may end up with pkey different from value opensm put in pkey table idx 0, resulting in connectivity issues (different mcast groups, for example). Close the window by calling event handler after registration to make sure ipoib pkey is in sync with port pkey table. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
16ba3def |
|
31-Dec-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Fix race condition in neigh creation When using enhanced mode for IPoIB, two threads may execute xmit in parallel to two different TX queues while the target is the same. In this case, both of them will add the same neighbor to the path's neigh link list and we might see the following message: list_add double add: new=ffff88024767a348, prev=ffff88024767a348... WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70 ipoib_start_xmit+0x477/0x680 [ib_ipoib] dev_hard_start_xmit+0xb9/0x3e0 sch_direct_xmit+0xf9/0x250 __qdisc_run+0x176/0x5d0 __dev_queue_xmit+0x1f5/0xb10 __dev_queue_xmit+0x55/0xb10 Analysis: Two SKB are scheduled to be transmitted from two cores. In ipoib_start_xmit, both gets NULL when calling ipoib_neigh_get. Two calls to neigh_add_path are made. One thread takes the spin-lock and calls ipoib_neigh_alloc which creates the neigh structure, then (after the __path_find) the neigh is added to the path's neigh link list. When the second thread enters the critical section it also calls ipoib_neigh_alloc but in this case it gets the already allocated ipoib_neigh structure, which is already linked to the path's neigh link list and adds it again to the list. Which beside of triggering the list, it creates a loop in the linked list. This loop leads to endless loop inside path_rec_completion. Solution: Check list_empty(&neigh->list) before adding to the list. Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send" Fixes: b63b70d87741 ('IPoIB: Use a private hash table for path lookup in xmit path') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
4ad6a024 |
|
14-Nov-2017 |
Parav Pandit <parav@mellanox.com> |
IB/{core, cm, cma, ipoib}: Rename ib_init_ah_from_path to ib_init_ah_attr_from_path Since ib_init_ah_from_path initializes the address handle attribute, it is renamed to reflect so. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
98aebc55 |
|
14-Nov-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Update pathrec field if not valid record In case that the PathRecord is not valid (SM changed its network prefix) ipoib will continue issue PathQuery requests with the same parameters that are in its database, which are no longer valid anymore. Now the driver in that case will re-initialize the record from a valid place (the priv structure keeps the updated values), and a valid request will be issued. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
43900089 |
|
14-Nov-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Avoid memory leak if the SA returns a different DGID The ipoib path database is organized around DGIDs from the LLADDR, but the SA is free to return a different GID when asked for path. This causes a bug because the SA's modified DGID is copied into the database key, even though it is no longer the correct lookup key, causing a memory leak and other malfunctions. Ensure the database key does not change after the SA query completes. Demonstration of the bug is as follows ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it creates new record in the DB with that gid as a key, and issues a new request to the SM. Now, the SM from some reason returns path-record with other SGID (for example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local subnet prefix) now ipoib will overwrite the current entry with the new one, and if new request to the original GID arrives ipoib will not find it in the DB (was overwritten) and will create new record that in its turn will also be overwritten by the response from the SM, and so on till the driver eats all the device memory. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ac6dbf7f |
|
28-Nov-2017 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB/ipoib: Warn when one port fails to initialize If one port fails to initialize an error message should indicate the reason and driver should continue serving the working port(s) and other HCA(s). Fixes: e4b2d06892c7 ("IB/ipoib: Remove device when one port fails to init"). Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c55359a2 |
|
29-Nov-2017 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB/ipoib: Replace printk with pr_warn pr_* is the preferred way to print messages, replace all printk(KERN_WARN, ...) with pr_warn. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
8966e28d |
|
18-Oct-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Use NAPI in UD/TX flows Instead of explicit call to poll_cq of the tx ring, use the NAPI mechanism to handle the completions of each packet that has been sent to the HW. The next major changes were taken: * The driver init completion function in the creation of the send CQ, that function triggers the napi scheduling. * The driver uses CQ for RX for both modes UD and CM, and CQ for TX for CM and UD. Cc: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
6d290d69 |
|
04-Oct-2017 |
Kees Cook <keescook@chromium.org> |
IB/ipoib: Convert timers to use timer_setup() In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Alex Vesker <valex@mellanox.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Zhu Yanjun <yanjun.zhu@oracle.com> Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Yuval Shaia <yuval.shaia@oracle.com> Cc: linux-rdma@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e4b2d068 |
|
25-Sep-2017 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB/ipoib: Remove device when one port fails to init Call ipoib_remove_one when one of the IPoIB ports fails to initialize in order not to leave the module in unstable state. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
931bc0d9 |
|
06-Sep-2017 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB: Move PCI dependency from root KConfig to HW's KConfigs No reason to have dependency on PCI for the entire infiniband stack so move it to KConfig of only the drivers that actually using PCI. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
7c9d9662 |
|
24-Sep-2017 |
Alex Vesker <valex@mellanox.com> |
IB/ipoib: Fix inconsistency with free_netdev and free_rdma_netdev Call free_rdma_netdev instead of free_netdev each time we want to release a netdevice. This call is also relevant for future freeing of offloaded child interfaces. This patch also adds a missing call for free netdevice when releasing a parent interface that has child interfaces using ipoib_remove_one. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
31a82362 |
|
22-Aug-2017 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Enable ioctl for to IPoIB rdma netdevs Adds support for ioctl callback in the RDMA netdevs to allow supporting functions not handled by the generic interface code. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
69956d83 |
|
17-Aug-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock In order to avoid deadlock between sysfs functions (like create/delete child) and remove_one (both of them are using the sysfs lock and rtnl_lock) the driver will use a state mutex for sync. That will fix traces as the following: schedule+0x3e/0x90 kernfs_drain+0x75/0xf0 ? wait_woken+0x90/0x90 __kernfs_remove+0x12e/0x1c0 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x57/0x90 kobject_del+0x22/0x60 device_del+0x195/0x230 pm_runtime_set_memalloc_noio+0xac/0xf0 netdev_unregister_kobject+0x71/0x80 rollback_registered_many+0x205/0x2f0 rollback_registered+0x31/0x40 unregister_netdevice_queue+0x58/0xb0 unregister_netdev+0x20/0x30 ipoib_remove_one+0xb7/0x240 [ib_ipoib] ib_unregister_device+0xbc/0x1b0 [ib_core] ib_unregister_mad_agent+0x29/0x30 [ib_core] mlx4_ib_remove+0x67/0x280 [mlx4_ib] INFO: task echo:24082 blocked for more than 120 seconds. Tainted: G OE 4.1.12-37.5.1.el6uek.x86_64 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Call Trace: schedule+0x3e/0x90 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0x95/0x110 ? _rcu_barrier+0x177/0x220 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x81/0x1f0 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib] delete_child+0x69/0x80 [ib_ipoib] dev_attr_store+0x20/0x30 sysfs_kf_write+0x41/0x50 Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
dcc9881e |
|
17-Aug-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA/(core, ulp): Convert register/unregister event handler to be void The functions ib_register_event_handler() and ib_unregister_event_handler() always returned success and they can't fail. Let's convert those functions to be void, remove redundant checks and cleanup tons of goto statements. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e1267b01 |
|
25-Jun-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Remove useless MODULE_VERSION All modules in drivers/infiniband defined and used MODULE_VERSION, which was pointless because the kernel version describes their state more accurate then those arbitrary numbers. Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Sagi Grimbrg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimbeg.me> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
dc892e17 |
|
13-Jul-2017 |
Leon Romanovsky <leon@kernel.org> |
IB/ipoib: Clean error paths in add port Refactor error paths in ipoib_add_port() function. The code flow ensures that the function terminates on every error flow and it makes redundant all "else" cases. The functions are called during the flow are returning "result < 0", in case of error, so there is no need to check it explicitly. Fixes: 58e9cc90cda7 ("IB/IPoIB: Fix bad error flow in ipoib_add_port()") Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
eb54714d |
|
02-Jul-2017 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Add get statistics support to SRIOV VF Add SRIOV VF support to get traffic statistics. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
d2e46fcc |
|
16-Jul-2017 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Set IPOIB_NEIGH_TBL_FLUSH after flushed completion initialization Set IPOIB_NEIGH_TBL_FLUSH bit after initializing the neighbor flushed completion, otherwise the garbage collector may signal a completion while it is not initialized yet. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
11f74b40 |
|
13-Jul-2017 |
Alex Vesker <valex@mellanox.com> |
IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp Don't allow negative values to max_nonsrq_conn_qp. There is no functional impact on a negative value but it is logicically incorrect. Fixes: 68e995a29572 ("IPoIB/cm: Add connected mode support for devices without SRQs") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
edf3f301 |
|
10-Jul-2017 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Fix race between light events and interface restart A potential race between light_event and interface restart may attach multicast group to an already attached QP. Scenario: light_event flow goes through ipoib_mcast_dev_flush function, if a context switch occurs before calling ipoib_mcast_remove_list, then we may face a situation where the broadcast of the priv is null and the corresponding QP is not detached yet. If an "interface restart" runs during the previous context switch, the following scenario occurs: When the device goes up, ipoib_ib_dev_up function will be called, it will send a new registration request to the broadcast group and then attach the group to the QP that was not detached before. IPOIB_FLUSH_LIGHT INTERFACE RESTART __ipoib_ib_dev_flush | | | | | | | ipoib_mcast_dev_flush | Move mcast list and broadcast to remove_list | | | | | Context Switch--> | | ipoib_ib_dev_down | | | | | ipoib_ib_dev_up | | | | | ipoib_mcast_join_task | allocate new broadcast | | | | | Attach QP to multicast group | | | | | <--Context Switch ipoib_mcast_leave Detach QP from multicast group Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
5c8857b6 |
|
13-Jul-2017 |
Dan Carpenter <dan.carpenter@oracle.com> |
IB/IPoIB: Fix error code in ipoib_add_port() We accidentally don't see the error code on some of these error paths. It means we return ERR_PTR(0) which is NULL and it results in a NULL dereference in the caller. This bug dates to pre-git days. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b6c871e5 |
|
12-Jun-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Let lower driver handle get_stats64 call The driver checks if the lower level driver supports get_stats, and if so calls it to get the updated statistics, otherwise takes from the current netdevice stats object. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ed7b521d |
|
23-May-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Forward MTU change to driver below This patch checks if there is a driver below that needs to be updated on the new MTU and calls it accordingly. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8e959601 |
|
30-Jun-2017 |
Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> |
IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev IPOIB is calling free_rdma_netdev even though alloc_rdma_netdev has returned -EOPNOTSUPP. Move free_rdma_netdev from ib_device structure to rdma_netdev structure thus ensuring proper cleanup function is called for the rdma net device. Fix the following trace: ib0: Failed to modify QP to ERROR state BUG: unable to handle kernel paging request at 0000000000001d20 IP: hfi1_vnic_free_rn+0x26/0xb0 [hfi1] Call Trace: ipoib_remove_one+0xbe/0x160 [ib_ipoib] ib_unregister_device+0xd0/0x170 [ib_core] rvt_unregister_device+0x29/0x90 [rdmavt] hfi1_unregister_ib_device+0x1a/0x100 [hfi1] remove_one+0x4b/0x220 [hfi1] pci_device_remove+0x39/0xc0 device_release_driver_internal+0x141/0x200 driver_detach+0x3f/0x80 bus_remove_driver+0x55/0xd0 driver_unregister+0x2c/0x50 pci_unregister_driver+0x2a/0xa0 hfi1_mod_cleanup+0x10/0xf65 [hfi1] SyS_delete_module+0x171/0x250 do_syscall_64+0x67/0x150 entry_SYSCALL64_slow_path+0x25/0x25 Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d58ff351 |
|
16-Jun-2017 |
Johannes Berg <johannes.berg@intel.com> |
networking: make skb_push & __skb_push return void pointers It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) @@ expression SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - fn(SKB, LEN)[0] + *(u8 *)fn(SKB, LEN) Note that the last part there converts from push(...)[0] to the more idiomatic *(u8 *)push(...). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b53d4566 |
|
14-Jun-2017 |
Alex Vesker <valex@mellanox.com> |
IB/ipoib: Delete napi in device uninit default This patch mekas init_default and uninit_default symmetric with a call to delete napi. Additionally, the uninit_default gained delete napi call in case of init_default fails. Fixes: 515ed4f3aab4 ('IB/IPoIB: Separate control and data related initializations') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
022d038a |
|
14-Jun-2017 |
Alex Vesker <valex@mellanox.com> |
IB/ipoib: Limit call to free rdma_netdev for capable devices Limit calls to free_rdma_netdev() for capable devices only. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ab156afd |
|
14-Jun-2017 |
Alex Vesker <valex@mellanox.com> |
IB/ipoib: Fix memory leaks for child interfaces priv There is a need to free priv explicitly and not just to release the device, child priv is freed explicitly on remove flow and this patch also includes priv free on error flow in P_key creation and also in add_port. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0a1a9726 |
|
14-May-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA/IPoIB: Limit the ipoib_dev_uninit_default scope ipoib_dev_uninit_default() call is used in ipoib_main.c file only and it generates the following warning from smatch tool: drivers/infiniband/ulp/ipoib/ipoib_main.c:1593:6: warning: symbol 'ipoib_dev_uninit_default' was not declared. Should it be static? so let's declare that function as static. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4c33bd19 |
|
27-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/SA: Add support to query OPA path records When the bit 26 of capmask2 field in OPA classport info query is set, SA will query for OPA path records instead of querying for IB path records. Note that OPA path records can only be queried by kernel ULPs. Userspace clients continue to query IB path records. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
57520751 |
|
27-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/SA: Add OPA path record type Add opa_sa_path_rec to sa_path_rec data structure. The 'type' field in sa_path_rec identifies the type of the path record. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
9fdca4da |
|
27-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields sa_path_rec now contains a union of sa_path_rec_ib and sa_path_rec_roce based on the type of the path record. Note that fields applicable to path record type ROCE v1 and ROCE v2 fall under sa_path_rec_roce. Accessor functions are added to these fields so the caller doesn't have to know the type. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c2f8fc4e |
|
27-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/SA: Rename ib_sa_path_rec to sa_path_rec Rename ib_sa_path_rec to a more generic sa_path_rec. This is part of extending ib_sa to also support OPA path records in addition to the IB defined path records. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
90898850 |
|
29-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/core: Rename struct ib_ah_attr to rdma_ah_attr This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ee1c60b1 |
|
20-Mar-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/SA: Modify SA to implicitly cache Class Port info SA will query and cache class port info as part of its initialization. SA will also invalidate and refresh the cache based on specific events. Callers such as IPoIB and CM can query the SA to get the classportinfo information. Apart from making the caller code much simpler, this change puts the onus on the SA to query and maintain classportinfo much like how it maitains the address handle to the SM. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
cd565b4b |
|
10-Apr-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Support acceleration options callbacks IPoIB driver now uses the new set of callback functions. If the hardware provider supports the new ipoib_options implementation, the driver uses the callbacks in its data path flows, otherwise it uses the driver default implementation for all data flows in its code. The default implementation wasn't change and it is exactly as it was before introduction of acceleration support. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c1048aff |
|
10-Apr-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Use defined function for netdev_priv function Make ipoib_priv point to netdev_priv where the code calls netdev_priv. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
7ce1a3ee |
|
10-Apr-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Separate control from HW operation on ipoib_open/stop ndo This patch is preparing the netdev part at the IPoIB driver to be able to use the ipoib_options. It deals with the two flows from the .ndo: ipoib_open and ipoib_stop. The code is rearranged as follows: * All operations which deal with the hardware resources, (for example change QP state, post-receive etc.) are performed in one place. * All operations that are control oriented (like restart multicast task, start the reap_ah etc.) are performed in separate place. The functions that deal with the hardware resources now located at __ipoib_ib_dev_open for the ipoib_open flow and __ipoib_ib_dev_stop for ipoib_stop. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
515ed4f3 |
|
10-Apr-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Separate control and data related initializations This patch prepares init and teardown flows so we can call them through ipoib_options function pointers. It arranges that area of code as the following: * All operations which deal with the resource allocation/deletion are performed in one place. * All operations that are control oriented, meaning that they are not connected to a specific hardware, are performed in a separate place. The operations for allocation of hardware resources are now in the function ipoib_dev_init_default, and the deletion of all the resources are in ipoib_dev_uninit_default The only exception is the creation of the PD object, which is used both for resource allocation (create QP etc.) and for control flows like creating AH. It also does: * Move creation of rx_ring and tx_ring to be in the resources allocation area. * Move the function ipoib_ib_dev_open that does the open device to the control area instead of the dev_init which creates resources. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
771a5258 |
|
29-Mar-2017 |
Shamir Rabinovitch <shamir.rabinovitch@oracle.com> |
IB/IPoIB: ibX: failed to create mcg debug file When udev renames the netdev devices, ipoib debugfs entries does not get renamed. As a result, if subsequent probe of ipoib device reuse the name then creating a debugfs entry for the new device would fail. Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part of ipoib event handling in order to avoid any race condition between these. Fixes: 1732b0ef3b3a ([IPoIB] add path record information in debugfs) Cc: stable@vger.kernel.org # 2.6.15+ Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
2b084176 |
|
01-Feb-2017 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Add destination address when re-queue packet When sending packet to destination that was not resolved yet via path query, the driver keeps the skb and tries to re-send it again when the path is resolved. But when re-sending via dev_queue_xmit the kernel doesn't call to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb and to put the destination address inside them. In that way the dev_start_xmit will have the correct destination, and the driver won't take the destination from the skb->data, while nothing exists there, which causes to packet be be dropped. The test flow is: 1. Run the SM on remote node, 2. Restart the driver. 4. Ping some destination, 3. Observe that first ICMP request will be dropped. Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5c37077f |
|
18-Jan-2017 |
Zhu Yanjun <yanjun.zhu@oracle.com> |
IB/ipoib: Remove the unnecessary error check The function ipoib_mcast_start_thread/ipoib_ib_dev_up always return zero. As such, in the function ipoib_open, err_stop will never be reached. So remove this err_stop and change the return type of the function ipoib_mcast_start_thread/ipoib_ib_dev_up to void. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
f7534f45 |
|
05-Jan-2017 |
Zhu Yanjun <yanjun.zhu@oracle.com> |
IB/ipoib: Remove unnecessary returned value check In the function ipoib_set_dev_features, the returned value is always 0. As such, it is not necessary to check the returned value. This is not a bug. It is a trivial problem. Reviewed-by: Guanglei Li <guanglei.li@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
db97ed0a |
|
20-Jan-2017 |
Bart Van Assche <bvanassche@acm.org> |
IB/IPoIB: Switch from dma_device to dev.parent Prepare for removal of ib_device.dma_device. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c586071d |
|
28-Dec-2016 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Replace list_del of the neigh->list with list_del_init In order to resolve a situation where a few process delete the same list element in sequence and cause panic, list_del is replaced with list_del_init. In this case if the first process that calls list_del releases the lock before acquiring it again, other processes who can acquire the lock will call list_del_init. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d32b9a81 |
|
28-Dec-2016 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Add detailed error message to dev_queue_xmit call Add a detailed return code to dev_queue_xmit function when calling to requeue packet via __skb_dequeue. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0a0007f2 |
|
28-Dec-2016 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Fix deadlock between rmmod and set_mode When calling set_mode from sys/fs, the call flow locks the sys/fs lock first and then tries to lock rtnl_lock (when calling ipoib_set_mod). On the other hand, the rmmod call flow takes the rtnl_lock first (when calling unregister_netdev) and then tries to take the sys/fs lock. Deadlock a->b, b->a. The problem starts when ipoib_set_mod frees it's rtnl_lck and tries to get it after that. set_mod: [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90 [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180 [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20 [<ffffffff814fed2b>] mutex_lock+0x2b/0x50 [<ffffffff81448675>] rtnl_lock+0x15/0x20 [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib] [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib] [<ffffffff8134b840>] dev_attr_store+0x20/0x30 [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170 [<ffffffff8117b068>] vfs_write+0xb8/0x1a0 [<ffffffff8117ba81>] sys_write+0x51/0x90 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b rmmod: [<ffffffff81279ffc>] ? put_dec+0x10c/0x110 [<ffffffff8127a2ee>] ? number+0x2ee/0x320 [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0 [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0 [<ffffffff8127b550>] ? string+0x40/0x100 [<ffffffff814fe323>] wait_for_common+0x123/0x180 [<ffffffff81060250>] ? default_wake_function+0x0/0x20 [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0 [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20 [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270 [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0 [<ffffffff81273f66>] kobject_del+0x16/0x40 [<ffffffff8134cd14>] device_del+0x184/0x1e0 [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0 [<ffffffff8143c05e>] rollback_registered+0xae/0x130 [<ffffffff8143c102>] unregister_netdevice+0x22/0x70 [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30 [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib] [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core] [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib] [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core] Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Cc: <stable@vger.kernel.org> # v3.6+ Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
80b5b35a |
|
28-Dec-2016 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: Set device connection mode only when needed When changing the connection mode, the ipoib_set_mode function did not check if the previous connection mode equals to the new one. This commit adds the required check and return 0 if the new mode equals to the previous one. Fixes: 839fcaba355a ("IPoIB: Connected mode experimental support") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
29da686d |
|
28-Dec-2016 |
Feras Daoud <ferasda@mellanox.com> |
IB/ipoib: When given an invalid UD MTU, give debug msg In datagram mode, the IB UD (Unreliable Datagram) transport is used so the MTU of the interface is equal to the IB L2 MTU minus the IPoIB encapsulation header. Any request to change the MTU value above the maximum range will change the MTU to the max allowed, but will not show any warning message. An ipoib_warn is issued in such cases, letting the user know that even though the value is legal, it can't be currently applied. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
74226649 |
|
03-Nov-2016 |
Leon Romanovsky <leon@kernel.org> |
IB/ipoib: Remove and fix debug prints after allocation failure The prints after [k|v][m|z|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b3e3893e |
|
20-Oct-2016 |
Jarod Wilson <jarod@redhat.com> |
net: use core MTU range checking in misc drivers firewire-net: - set min/max_mtu - remove fwnet_change_mtu nes: - set max_mtu - clean up nes_netdev_change_mtu xpnet: - set min/max_mtu - remove xpnet_dev_change_mtu hippi: - set min/max_mtu - remove hippi_change_mtu batman-adv: - set max_mtu - remove batadv_interface_change_mtu - initialization is a little async, not 100% certain that max_mtu is set in the optimal place, don't have hardware to test with rionet: - set min/max_mtu - remove rionet_change_mtu slip: - set min/max_mtu - streamline sl_change_mtu um/net_kern: - remove pointless ndo_change_mtu hsi/clients/ssi_protocol: - use core MTU range checking - remove now redundant ssip_pn_set_mtu ipoib: - set a default max MTU value - Note: ipoib's actual max MTU can vary, depending on if the device is in connected mode or not, so we'll just set the max_mtu value to the max possible, and let the ndo_change_mtu function continue to validate any new MTU change requests with checks for CM or not. Note that ipoib has no min_mtu set, and thus, the network core's mtu > 0 check is the only lower bounds here. mptlan: - use net core MTU range checking - remove now redundant mpt_lan_change_mtu fddi: - min_mtu = 21, max_mtu = 4470 - remove now redundant fddi_change_mtu (including export) fjes: - min_mtu = 8192, max_mtu = 65536 - The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to get past the core net MTU range checks so fjes_change_mtu can validate a new MTU against what it supports (see fjes_support_mtu in fjes_hw.c) hsr: - min_mtu = 0 (calls ether_setup, max_mtu is 1500) f_phonet: - min_mtu = 6, max_mtu = 65541 u_ether: - min_mtu = 14, max_mtu = 15412 phonet/pep-gprs: - min_mtu = 576, max_mtu = 65530 - remove redundant gprs_set_mtu CC: netdev@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: Stefan Richter <stefanr@s5r6.in-berlin.de> CC: Faisal Latif <faisal.latif@intel.com> CC: linux-rdma@vger.kernel.org CC: Cliff Whickman <cpw@sgi.com> CC: Robin Holt <robinmholt@gmail.com> CC: Jes Sorensen <jes@trained-monkey.org> CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sathya Prakash <sathya.prakash@broadcom.com> CC: Chaitra P B <chaitra.basappa@broadcom.com> CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> CC: MPT-FusionLinux.pdl@broadcom.com CC: Sebastian Reichel <sre@kernel.org> CC: Felipe Balbi <balbi@kernel.org> CC: Arvid Brodin <arvid.brodin@alten.se> CC: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0e79c8e |
|
17-Oct-2016 |
David Ahern <dsa@cumulusnetworks.com> |
IB/ipoib: Flip to new dev walk API Convert ipoib_get_net_dev_match_addr to the new upper device walk API. This is just a code conversion; no functional change is intended. v2 - removed typecast of data Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc791b63 |
|
13-Oct-2016 |
Paolo Abeni <pabeni@redhat.com> |
IB/ipoib: move back IB LL address into the hard header After the commit 9207f9d45b0a ("net: preserve IP control block during GSO segmentation"), the GSO CB and the IPoIB CB conflict. That destroy the IPoIB address information cached there, causing a severe performance regression, as better described here: http://marc.info/?l=linux-kernel&m=146787279825501&w=2 This change moves the data cached by the IPoIB driver from the skb control lock into the IPoIB hard header, as done before the commit 936d7de3d736 ("IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses"). In order to avoid GRO issue, on packet reception, the IPoIB driver stash into the skb a dummy pseudo header, so that the received packets have actually a hard header matching the declared length. To avoid changing the connected mode maximum mtu, the allocated head buffer size is increased by the pseudo header length. After this commit, IPoIB performances are back to pre-regression value. v2 -> v3: rebased v1 -> v2: avoid changing the max mtu, increasing the head buf size Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
855cda68 |
|
15-Aug-2016 |
Bhaktipriya Shridhar <bhaktipriya96@gmail.com> |
IB/ipoib: Remove deprecated create_singlethread_workqueue alloc_ordered_workqueue() replaces deprecated create_singlethread_workqueue(). The workqueue "ipoib_workqueue" that is used for all flush operations for the device. WQ_MEM_RECLAIM has been set since the flush operations may need to complete in order for other network functions to continue, and the memory reclaim operation might need the network functioning in order to make progress. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
546481c2 |
|
28-Aug-2016 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Fix memory corruption in ipoib cm mode connect flow When a new CM connection is being requested, ipoib driver copies data from the path pointer in the CM/tx object, the path object might be invalid at the point and memory corruption will happened later when now the CM driver will try using that data. The next scenario demonstrates it: neigh_add_path --> ipoib_cm_create_tx --> queue_work (pointer to path is in the cm/tx struct) #while the work is still in the queue, #the port goes down and causes the ipoib_flush_paths: ipoib_flush_paths --> path_free --> kfree(path) #at this point the work scheduled starts. ipoib_cm_tx_start --> copy from the (invalid)path pointer: (memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);) -> memory corruption. To fix that the driver now starts the CM/tx connection only if that specific path exists in the general paths database. This check is protected with the relevant locks, and uses the gid from the neigh member in the CM/tx object which is valid according to the ref count that was taken by the CM/tx. Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5faba546 |
|
20-Jul-2016 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB/ipoib: Report SG feature regardless of HW UD CSUM capability Decouple SG support from HW ability to do UD checksum. This coupling is for historical reasons and removed with 'commit ec5f06156423 ("net: Kill link between CSUM and SG features.")' During driver load it is assumed that device does not supports SG. The final decision is taken after creating UD QP based on device capability. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
61c78eea |
|
04-Jun-2016 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Don't update neigh validity for unresolved entries ipoib_neigh_get unconditionally updates the "alive" variable member on any packet send. This prevents the neighbor garbage collection from cleaning out a dead neighbor entry if we are still queueing packets for it. If the queue for this neighbor is full, then don't update the alive timestamp. That way the neighbor can time out even if packets are still being queued as long as none of them are being sent. Fixes: b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
9b29953b |
|
04-Jun-2016 |
Mark Bloch <markb@mellanox.com> |
IB/IPoIB: Disable bottom half when dealing with device address Align locking usage when touching device address with rest of the kernel. Lock the bottom half when doing so using netif_addr_lock_bh. This also solves the following case as reported by lockdep: CPU0 CPU1 ---- ---- lock(_xmit_INFINIBAND); local_irq_disable(); lock(&(&mc->mca_lock)->rlock); lock(_xmit_INFINIBAND); <Interrupt> lock(&(&mc->mca_lock)->rlock); *** DEADLOCK *** Fixes: 492a7e67ff83 ("IB/IPoIB: Allow setting the device address") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
198b12f7 |
|
04-Jun-2016 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions In ipoib_remove_one the driver holds the rtnl_lock and tries to do some operation like dev_change_flags or unregister_netdev, while sysfs callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to free its sysfs directory, and we will get a->b, b->a deadlock. Trace like the following: schedule+0x37/0x80 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0xb5/0x120 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x17c/0x320 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib] delete_child+0x54/0x80 [ib_ipoib] dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 mutex_lock+0x16/0x40 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x16/0x75 And schedule+0x37/0x80 __kernfs_remove+0x1a8/0x260 ? wake_atomic_t_function+0x60/0x60 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x50/0x80 kobject_del+0x18/0x50 device_del+0x19f/0x260 netdev_unregister_kobject+0x6a/0x80 rollback_registered_many+0x1fd/0x340 rollback_registered+0x3c/0x70 unregister_netdevice_queue+0x55/0xc0 unregister_netdev+0x20/0x30 ipoib_remove_one+0x114/0x1b0 [ib_ipoib] ib_unregister_client+0x4a/0x170 [ib_core] ? find_module_all+0x71/0xa0 ipoib_cleanup_module+0x10/0x94 [ib_ipoib] SyS_delete_module+0x1b5/0x210 entry_SYSCALL_64_fastpath+0x16/0x75 The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to get out from the sysfs function. Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
492a7e67 |
|
18-May-2016 |
Mark Bloch <markb@mellanox.com> |
IB/IPoIB: Allow setting the device address In IB networks, and specifically in IPoIB/rdmacm traffic, the device address of an IPoIB interface is used as a means to exchange information between nodes needed for communication. Currently an IPoIB interface will always be created with a device address based on its node GUID without a way to change that. This change adds the ability to set the device address of an IPoIB interface by value. We use the set mac address ndo to do that. The flow should be broken down to two: 1) The GID value is already in the GID table, in this case the interface will be able to set carrier up. 2) The GID value is not yet in the GID table, in this case the interface won't try to join the multicast group and will wait (listen on GID_CHANGE event) until the GID is inserted. In order to track those changes, we add a new flag: * IPOIB_FLAG_DEV_ADDR_SET. When set, it means the dev_addr is a based on a value in the gid table. this bit will be cleared upon a dev_addr change triggered by the user and set after validation. Per IB spec the port GUID can't change if the module is loaded. port GUID is the basis for GID at index 0 which is the basis for the default device address of a ipoib interface. The issue is that there are devices that don't follow the spec, they change the port GUID while HCA is powered on, so in order not to break userspace applications. We need to check if the user wanted to control the device address and we assume that if he sets the device address back to be based on GID index 0, he no longer wishs to control it. In order to track this, we add an additional flag: * IPOIB_FLAG_DEV_ADDR_CTRL When setting the device address, there is no validation of the upper twelve bytes of the device address (flags, qpn, subnet prefix) as those bytes are not under the control of the user. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3b561130 |
|
25-May-2016 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Support SendOnlyFullMember MCG for SendOnly join Check (via an SA query) if the SM supports the new option for SendOnly multicast joins. If the SM supports that option it will use the new join state to create such multicast group. If SendOnlyFullMember is supported, we wouldn't use faked FullMember state join for SendOnly MCG, use the correct state if supported. This check is performed at every invocation of mcast_restart task, to be sure that the driver stays in sync with the current state of the SM. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4d0e9657 |
|
03-May-2016 |
Florian Westphal <fw@strlen.de> |
drivers: replace dev->trans_start accesses with dev_trans_start a trans_start struct member exists twice: - in struct net_device (legacy) - in struct netdev_queue Instead of open-coding dev->trans_start usage to obtain the current trans_start value, use dev_trans_start() instead. This is not exactly the same, as dev_trans_start also considers the trans_start values of the netdev queues owned by the device and provides the most recent one. For legacy devices this doesn't matter as dev_trans_start can cope with netdev trans_start values of 0 (they are ignored). This is a prerequisite to eventual removal of dev->trans_start. Cc: linux-rdma@vger.kernel.org Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9c3c5f8e |
|
11-Mar-2016 |
Eli Cohen <eli@mellanox.com> |
IB/ipoib: Add ndo operations for configuring VFs Add ndo operations to the network driver that enables configuring the following operations: ipoib_set_vf_link_state - configure the VF link policy ipoib_get_vf_config - get link state configuration ipoib_set_vf_guid - set a VF port or node GUID ipoib_get_vf_stats - get statistics of a VF Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
50be28de |
|
07-Jan-2016 |
Erez Shitrit <erezsh@mellanox.com> |
IB/IPoIB: Fix kernel panic on multicast flow ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the parameter mcast->dev. That mcast is a temporary (used as an iterator) variable that may be uninitialized. There is no need to send the variable dev to the function, as each mcast has its dev as a member in the mcast struct. This causes the next panic: RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib] RSP: 0018: EFLAGS: 00010246 RAX: f0201 RBX: 24e00 RCX: 00000 .... .... Stack: Call Trace: ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib] ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib] process_one_work+0x164/0x470 worker_thread+0x11d/0x420 ... Fixes: 5a0e81f6f483 ('IB/IPoIB: factor out common multicast list removal code') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reported-by: Doron Tsur <doront@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
432c55ff |
|
21-Dec-2015 |
Christoph Lameter <cl@linux.com> |
IB/IPoIB: Move multicast specific code out of ipoib_main.c Code cleanup to move multicast specific code that checks for a sendonly join to ipoib_multicast.c. This allows the removal of the export of __ipoib_mcast_find(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5a0e81f6 |
|
21-Dec-2015 |
Christoph Lameter <cl@linux.com> |
IB/IPoIB: factor out common multicast list removal code Code cleanup to remove multicast specific code from ipoib_main.c The removal of a list of multicast groups occurs in three places. Create a new function ipoib_mcast_remove_list(). Use this new function in ipoib_main.c too. That in turn allows the dropping of two functions that were exported from ipoib_multicast.c for expiration of mc groups. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4a061b28 |
|
18-Dec-2015 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/ulps: Avoid calling ib_query_device Instead, use the cached copy of the attributes present on the device. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
55ee3ab2 |
|
15-Oct-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Add netdev and gid attributes paramteres to cache Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0b5c9279 |
|
11-Oct-2015 |
Christoph Lameter <cl@linux.com> |
IB/ipoib: For sendonly join free the multicast group on leave When we leave the multicast group on expiration of a neighbor we do not free the mcast structure. This results in a memory leak that causes ib_dealloc_pd to fail and print a WARN_ON message and backtrace. Fixes: bd99b2e05c4d (IB/ipoib: Expire sendonly multicast joins) Signed-off-by: Christoph Lameter <cl@linux.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e622f2f4 |
|
08-Oct-2015 |
Christoph Hellwig <hch@lst.de> |
IB: split struct ib_send_wr This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>
|
#
bd99b2e0 |
|
23-Sep-2015 |
Christoph Lameter <cl@linux.com> |
IB/ipoib: Expire sendonly multicast joins On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ddde896e |
|
30-Jul-2015 |
Guy Shapiro <guysh@mellanox.com> |
IB/ipoib: Return IPoIB devices matching connection parameters Implement the get_net_device_by_port_pkey_ip callback that returns network device to ib_core according to connection parameters. Check the ipoib device and iterate over all child devices to look for a match. For each IPoIB device we iterate through all upper devices when searching for a matching IP, in order to support bonding. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
7c1eb45a |
|
30-Jul-2015 |
Haggai Eran <haggaie@mellanox.com> |
IB/core: lock client data with lists_rwsem An ib_client callback that is called with the lists_rwsem locked only for read is protected from changes to the IB client lists, but not from ib_unregister_device() freeing its client data. This is because ib_unregister_device() will remove the device from the device list with lists_rwsem locked for write, but perform the rest of the cleanup, including the call to remove() without that lock. Mark client data that is undergoing de-registration with a new going_down flag in the client data context. Lock the client data list with lists_rwsem for write in addition to using the spinlock, so that functions calling the callback would be able to lock only lists_rwsem for read and let callbacks sleep. Since ib_unregister_client() now marks the client data context, no need for remove() to search the context again, so pass the client data directly to remove() callbacks. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
edcd2a74 |
|
07-Jun-2015 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Set MTU to max allowed by mode when mode changes When switching between modes (datagram / connected) change the MTU accordingly. datagram mode up to 4K, connected mode up to (64K - 0x10). Signed-off-by: ELi Cohen <eli@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c4268778 |
|
12-Jul-2015 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB/ipoib: Scatter-Gather support in connected mode By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better performance. This MTU plus overhead puts the memory allocation for IP based packets at 32 4k pages (order 5), which have to be contiguous. When the system memory under pressure, it was observed that allocating 128k contiguous physical memory is difficult and causes serious errors (such as system becomes unusable). This enhancement resolve the issue by removing the physically contiguous memory requirement using Scatter/Gather feature that exists in Linux stack. With this fix Scatter-Gather will be supported also in connected mode. This change reverts some of the change made in commit e112373fd6aa ("IPoIB/cm: Reduce connected mode TX object size"). The ability to use SG in IPoIB CM is possible because the coupling between NETIF_F_SG and NETIF_F_CSUM was removed in commit ec5f06156423 ("net: Kill link between CSUM and SG features.") Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Christian Marie <christian@ponies.io> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
58e9cc90 |
|
01-Jul-2015 |
Amir Vadai <amirv@mellanox.com> |
IB/IPoIB: Fix bad error flow in ipoib_add_port() Error values of ib_query_port() and ib_query_device() weren't propagated correctly. Because of that, ipoib_add_port() could return NULL value, which escaped the IS_ERR() check in ipoib_add_one() and we crashed. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4139032b |
|
29-Jun-2015 |
Hal Rosenstock <hal@dev.mellanox.co.il> |
IB: Add rdma_cap_ib_switch helper and use where appropriate Persuant to Liran's comments on node_type on linux-rdma mailing list: In an effort to reform the RDMA core and ULPs to minimize use of node_type in struct ib_device, an additional bit is added to struct ib_device for is_switch (IB switch). This is needed to be initialized by any IB switch device driver. This is a NEW requirement on such device drivers which are all "out of tree". In addition, an ib_switch helper was added to ib_verbs.h based on the is_switch device bit rather than node_type (although those should be consistent). The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs) as well as (IPoIB and SRP) ULPs are updated where appropriate to use this new helper. In some cases, the helper is now used under the covers of using rdma_[start end]_port rather than the open coding previously used. Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
52374967 |
|
26-May-2015 |
Bart Van Assche <bvanassche@acm.org> |
IB/ipoib: Fix RCU annotations in ipoib_neigh_hash_init() Avoid that sparse complains about ipoib_neigh_hash_init(). This patch does not change any functionality. See also patch "IPoIB: Fix memory leak in the neigh table deletion flow" (commit ID 66172c09938b). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8e37ab68 |
|
05-May-2015 |
Michael Wang <yun.wang@profitbricks.com> |
IB/Verbs: Reform IB-ulp ipoib Use raw management helpers to reform IB-ulp ipoib. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
2c153959 |
|
16-Apr-2015 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Fix ndo_get_iflink Currently, iflink of the parent interface was always accessed, even when interface didn't have a parent and hence we crashed there. Handle the interface types properly: for a child interface, return the ifindex of the parent, for parent interface, return its ifindex. For child devices, make sure to set the parent pointer prior to invoking register_netdevice(), this allows the new ndo to be called by the stack immediately after the child device is registered. Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink') Reported-by: Honggang Li <honli@redhat.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Honggang Li <honli@redhat.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>+ Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e85b806 |
|
02-Apr-2015 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's Whenever there is no path->ah to the destination, keep only defined number of skb's. Otherwise there are cases that the driver can keep infinite list of skb's. For example, when one device want to send unicast arp to the destination, and from some reason the SM doesn't respond, the driver currently keeps all the skb's. If that unicast arp traffic stopped, all these skb's are kept by the path object till the interface is down. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
efc82eee |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: No longer use flush as a parameter Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0b39578b |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: Use dedicated workqueues per interface During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes *everything* from that workqueue means that we would also have to prevent all races between different devices (for instance, ipoib_mcast_restart_task on interface ib0 can race with ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on the rtnl_lock). There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
be7aa663 |
|
21-Feb-2015 |
Doug Ledford <dledford@redhat.com> |
IB/ipoib: change init sequence ordering In preparation for using per device work queues, we need to move the start of the neighbor thread task to after ipoib_ib_dev_init and move the destruction of the neighbor task to before ipoib_ib_dev_cleanup. Otherwise we will end up freeing our workqueue with work possibly still on it. Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5aa7add8 |
|
02-Apr-2015 |
Nicolas Dichtel <nicolas.dichtel@6wind.com> |
infiniband/ipoib: implement ndo_get_iflink Don't use dev->iflink anymore. CC: Roland Dreier <roland@kernel.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bb759634 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: change init sequence ordering" This reverts commit 3bcce487fda8161597c20ed303d510e41ad7770e. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
0306eda2 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: Use dedicated workqueues per interface" This reverts commit 5141861cd5e17eac9676ff49c5abfafbea2b0e98. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
a84544a4 |
|
30-Jan-2015 |
Roland Dreier <roland@purestorage.com> |
Revert "IPoIB: No longer use flush as a parameter" This reverts commit ce347ab90eaabc69a6146d41943981d51e7a9b82. The series of IPoIB bug fixes that went into 3.19-rc1 introduce regressions, and after trying to sort things out, we decided to revert to 3.18's IPoIB driver and get things right for 3.20. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
ce347ab9 |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: No longer use flush as a parameter Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5141861c |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: Use dedicated workqueues per interface During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes *everything* from that workqueue, we can have one device in the middle of a down and holding the rtnl lock and another totally unrelated device needing to run mcast_restart_task, which wants the rtnl lock and will loop trying to take it unless is sees its own FLAG_ADMIN_UP flag go away. Because the unrelated interface will never see its own ADMIN_UP flag drop, the interface going down will deadlock trying to flush the queue. There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. I suppose if our own ADMIN_UP flag doesn't go away, then maybe after a few tries on the rtnl lock we can queue our own task back up as a delayed work and return and avoid dropping work on the floor that way. But I'm not 100% convinced that we won't cause other problems. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3bcce487 |
|
10-Dec-2014 |
Doug Ledford <dledford@redhat.com> |
IPoIB: change init sequence ordering In preparation for using per device work queues, we need to move the start of the neighbor thread task to after ipoib_ib_dev_init and move the destruction of the neighbor task to before ipoib_ib_dev_cleanup. Otherwise we will end up freeing our workqueue with work possibly still on it. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
02875878 |
|
05-Oct-2014 |
Eric Dumazet <edumazet@google.com> |
net: better IFF_XMIT_DST_RELEASE support Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b49fe362 |
|
18-Sep-2014 |
Eric Dumazet <edumazet@google.com> |
ipoib: validate struct ipoib_cb size To catch future errors sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd57c930 |
|
06-Aug-2014 |
Alex Estrin <alex.estrin@intel.com> |
IB/ipoib: Avoid multicast join attempts with invalid P_key Currently, the parent interface keeps sending broadcast group join requests even if p_key index 0 is invalid, which is possible/common in virtualized environments where a VF has been probed to VM but the actual P_key configuration has not yet been assigned by the management software. This creates unnecessary noise on the fabric and in the kernel logs: ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22 The original code run the multicast task regardless of the actual P_key value, which can be avoided. The fix is to re-init resources and bring interface up only if P_key index 0 is valid either when starting up or on PKEY_CHANGE event. Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
4eae3748 |
|
07-Jul-2014 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Avoid flushing the workqueue from worker context The error flow of ipoib_ib_dev_open() invokes ipoib_ib_dev_stop() with workqueue flushing enabled, which deadlocks if the open procedure itself was called by a worker thread. Fix this by adding a flush enabled flag to ipoib_ib_dev_open() and set it accordingly from the locations where such a call is made. The call trace was the following: [<ffffffff81095bc4>] ? flush_workqueue+0x54/0x80 [<ffffffffa056c657>] ? ipoib_ib_dev_stop+0x447/0x650 [ib_ipoib] [<ffffffffa056cc34>] ? ipoib_ib_dev_open+0x284/0x430 [ib_ipoib] [<ffffffffa05674a8>] ? ipoib_open+0x78/0x1d0 [ib_ipoib] [<ffffffffa05697b8>] ? ipoib_pkey_open+0x38/0x40 [ib_ipoib] [<ffffffffa056cf3c>] ? __ipoib_ib_dev_flush+0x15c/0x2c0 [ib_ipoib] [<ffffffffa056ce56>] ? __ipoib_ib_dev_flush+0x76/0x2c0 [ib_ipoib] [<ffffffffa056d0a0>] ? ipoib_ib_dev_flush_heavy+0x0/0x20 [ib_ipoib] [<ffffffffa056d0ba>] ? ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib] [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40 Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
db84f880 |
|
07-Jul-2014 |
Erez Shitrit <erezsh@mellanox.com> |
IB/ipoib: Use P_Key change event instead of P_Key polling mechanism The current code use a dedicated polling logic to determine when the P_Key assigned to the ipoib device is present in HCA port table and act accordingly. Move to use the code which acts upon getting PKEY_CHANGE event to handle this task and remove the P_Key polling logic/thread as they add extra complexity which isn't needed. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c835a677 |
|
14-Jul-2014 |
Tom Gundersen <teg@jklm.no> |
net: set name_assign_type in alloc_netdev() Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
437708c4 |
|
17-Jan-2014 |
Michal Schmidt <mschmidt@redhat.com> |
IPoIB: Report operstate consistently when brought up without a link After booting without a working link, "ip link" shows: 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state DOWN qlen 256 ... 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state DOWN qlen 256 ... Then after connecting and disconnecting the link, which should result in exactly the same state as before, it shows: 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state DOWN qlen 256 ... 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast state LOWERLAYERDOWN qlen 256 ... Notice the (now correct) LOWERLAYERDOWN operstate shown for the mlx4_ib1.8003 interface. Ideally the identical state would be shown right after boot. The problem is related to the calling of netif_carrier_off() in network drivers. For a long time it was known that doing netif_carrier_off() before registering the netdevice would result in the interface's operstate being shown as UNKNOWN if the device was brought up without a working link. This problem was fixed in commit 8f4cccbbd92 ('net: Set device operstate at registration time'), but still there remains the minor inconsistency demonstrated above. This patch fixes it by moving ipoib's call to netif_carrier_off() into the .ndo_open method, which is where network drivers ordinarily do it. With the patch when doing the same test as above, the operstate of mlx4_ib1.8003 is shown as LOWERLAYERDOWN right after boot. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
7f1a3867 |
|
21-Aug-2013 |
Michal Schmidt <mschmidt@redhat.com> |
IPoIB: lower NAPI weight Since commit 82dc3c63c692 ("net: introduce NAPI_POLL_WEIGHT") netif_napi_add() produces an error message if a NAPI poll weight greater than 64 is requested. Use the standard NAPI weight. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
f47944cc |
|
16-Oct-2013 |
Erez Shitrit <erezsh@mellanox.com> |
IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush() When ipoib interface is going down it takes all of its children with it, under mutex. For each child, dev_change_flags() is called. That function calls ipoib_stop() via the ndo, and causes flush of the workqueue. Sometimes in the workqueue an __ipoib_dev_flush work() is waiting and when invoked tries to get the same mutex, which leads to a deadlock, as seen below. The solution is to switch to rw-sem instead of mutex. The deadlock: [11028.165303] [<ffffffff812b0977>] ? vgacon_scroll+0x107/0x2e0 [11028.171844] [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0 [11028.178465] [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80 [11028.185962] [<ffffffff814ea743>] wait_for_common+0x123/0x180 [11028.192491] [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20 [11028.199504] [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20 [11028.206224] [<ffffffff8108b4f1>] flush_cpu_workqueue+0x61/0x90 [11028.212948] [<ffffffff8108b5a0>] ? wq_barrier_func+0x0/0x20 [11028.219375] [<ffffffff8108bfc4>] flush_workqueue+0x54/0x80 [11028.225712] [<ffffffffa05a0576>] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib] [11028.233988] [<ffffffffa059ccea>] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib] [11028.241678] [<ffffffffa059849a>] ipoib_stop+0x8a/0x140 [ib_ipoib] [11028.248692] [<ffffffff8142adf1>] dev_close+0x71/0xc0 [11028.254447] [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0 [11028.261062] [<ffffffffa059851b>] ipoib_stop+0x10b/0x140 [ib_ipoib] [11028.268172] [<ffffffff8142adf1>] dev_close+0x71/0xc0 [11028.273922] [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0 [11028.280452] [<ffffffff8148f20b>] devinet_ioctl+0x5eb/0x6a0 [11028.286786] [<ffffffff814903b8>] inet_ioctl+0x88/0xa0 [11028.292633] [<ffffffff8141591a>] sock_ioctl+0x7a/0x280 [11028.298576] [<ffffffff81189012>] vfs_ioctl+0x22/0xa0 [11028.304326] [<ffffffff81140540>] ? unmap_region+0x110/0x130 [11028.310756] [<ffffffff811891b4>] do_vfs_ioctl+0x84/0x580 [11028.316897] [<ffffffff81189731>] sys_ioctl+0x81/0xa0 and 11028.017533] [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80 [11028.025030] [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20 [11028.031945] [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180 [11028.039053] [<ffffffff814eb14b>] mutex_lock+0x2b/0x50 [11028.044910] [<ffffffffa059f7e7>] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib] [11028.052894] [<ffffffffa059fa00>] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib] [11028.061363] [<ffffffffa059fa17>] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib] [11028.069738] [<ffffffff8108b120>] worker_thread+0x170/0x2a0 [11028.076068] [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40 [11028.083374] [<ffffffff8108afb0>] ? worker_thread+0x0/0x2a0 [11028.089709] [<ffffffff81090626>] kthread+0x96/0xa0 [11028.095266] [<ffffffff8100c0ca>] child_rip+0xa/0x20 [11028.100921] [<ffffffff81090590>] ? kthread+0x0/0xa0 [11028.106573] [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 [11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
49b8e744 |
|
08-Aug-2013 |
Jim Foraker <foraker1@llnl.gov> |
IPoIB: Fix race in deleting ipoib_neigh entries In several places, this snippet is used when removing neigh entries: list_del(&neigh->list); ipoib_neigh_free(neigh); The list_del() removes neigh from the associated struct ipoib_path, while ipoib_neigh_free() removes neigh from the device's neigh entry lookup table. Both of these operations are protected by the priv->lock spinlock. The table however is also protected via RCU, and so naturally the lock is not held when doing reads. This leads to a race condition, in which a thread may successfully look up a neigh entry that has already been deleted from neigh->list. Since the previous deletion will have marked the entry with poison, a second list_del() on the object will cause a panic: #5 [ffff8802338c3c70] general_protection at ffffffff815108c5 [exception RIP: list_del+16] RIP: ffffffff81289020 RSP: ffff8802338c3d20 RFLAGS: 00010082 RAX: dead000000200200 RBX: ffff880433e60c88 RCX: 0000000000009e6c RDX: 0000000000000246 RSI: ffff8806012ca298 RDI: ffff880433e60c88 RBP: ffff8802338c3d30 R8: ffff8806012ca2e8 R9: 00000000ffffffff R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804346b2020 R13: ffff88032a3e7540 R14: ffff8804346b26e0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib] #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm] #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm] #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca We move the list_del() into ipoib_neigh_free(), so that deletion happens only once, after the entry has been successfully removed from the lookup table. This same behavior is already used in ipoib_del_neighs_by_gid() and __ipoib_reap_neigh(). Signed-off-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3d790a4c |
|
18-Jul-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IPoIB: Make sure child devices use valid/proper pkeys Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for child devices. Also, make sure to always set the full membership bit for the pkey of devices created by rtnl link ops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
dc850b0e |
|
17-Apr-2013 |
Patrick McHardy <kaber@trash.net> |
IPoIB: add support for TIPC protocol Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own neighbour entries and doesn't require the packet to have a dst_entry anymore, the only necessary changes are to: - not drop multicast TIPC packets because of the unknown ethernet type - handle unicast TIPC packets similar to IPv4/IPv6 unicast packets in ipoib_start_xmit(). An alternative would be to remove all ethertype limitations since they're not necessary anymore, all TIPC needs to know about is ARP and RARP since it wants to always perform "path find", even if a path is already known. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
83bdd3b9 |
|
01-Apr-2013 |
Doug Ledford <dledford@redhat.com> |
IPoIB: Fix ipoib_hard_header() return value If you have a patched up dhcp server (and dhclient), they will use AF_PACKET/SOCK_DGRAM pair to send dhcp packets over IPoIB. However, when testing an upstream kernel, this has been broken for a very long time (I tested 2.6.34, 2.6.38, 3.0, 3.1, 3.8, HEAD). It turns out that the hard_header routine in ipoib is not following the API and is returning 0 even when it pushed data onto the skb. This then causes af_packet.c to overwrite the header just pushed with data from user space. Fixing this gets DHCP working on IPoIB. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
f72dd566 |
|
25-Feb-2013 |
Roland Dreier <roland@purestorage.com> |
IPoIB: Free ipoib neigh on path record failure so path rec queries are retried If IPoIB fails to look up a path record (eg if it tries during an SM failover when one SM is dead but the new one hasn't taken over yet), the driver ends up with a neighbour structure but no address handle (AH). There's no mechanism to recover from this: any further packets sent to this destination will be silently dumped in ipoib_start_xmit(). Fix this by freeing the neighbour structures when a path rec query fails, so that the next packet queued to be sent will trigger a new path record query. Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5a2815f0 |
|
19-Feb-2013 |
Itai Garbi <igarbi@mellanox.com> |
IPoIB: Don't attempt to release resources on error flow If the ipoib client info isn't found on the _remove_one callback, we must not attempt to scan the returned null list. Found by Coverity. Signed-off-by: Itai Garbi <igarbi@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
4b48680b |
|
19-Feb-2013 |
Yan Burman <yanb@mellanox.com> |
IPoIB: Add version and firmware info to ethtool reporting Implement version info as well as report firmware version and bus info of the underlying IB HW device. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
9d1ad66e |
|
19-Feb-2013 |
Shlomo Pongratz <shlomop@mellanox.com> |
IPoIB: Fix ipoib_neigh hashing to use the correct daddr octets The hash function introduced in commit b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path") was designd to use the 3 octets of the IPoIB HW address that holds the remote QPN. However, this currently isn't the case on little-endian machines, because the the code there uses the flags part (octet[0]) and not the last octet of the QPN (octet[3]). Fix this. The fix caused a checkpatch warning on line over 80 characters, to solve that changed the name of the temp variable that holds the daddr. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
71d9c5f9 |
|
02-Oct-2012 |
Roland Dreier <roland@purestorage.com> |
IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n With the new netlink support in commit 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even if connected mode isn't built. Move the function from ipoib_cm.c to ipoib_main.c (and make a few CM-related macros available unconditonally). This fixes the build error drivers/built-in.o: In function 'ipoib_changelink': ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode' ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode' when CONFIG_INFINIBAND_IPOIB_CM isn't set. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Reported-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
862096a8 |
|
26-Sep-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/ipoib: Add more rtnl_link_ops callbacks Add the rtnl_link_ops changelink and fill_info callbacks, through which the admin can now set/get the driver mode, etc policies. Maintain the proprietary sysfs entries only for legacy childs. For child devices, set dev->iflink to point to the parent device ifindex, such that user space tools can now correctly show the uplink relation as done for vlan, macvlan, etc devices. Pointed out by Patrick McHardy <kaber@trash.net> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bea1e22d |
|
30-Aug-2012 |
Patrick McHardy <kaber@trash.net> |
IPoIB: Fix use-after-free of multicast object Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz) Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue, and hence the workqueue can't be flushed from the context of ipoib_stop(). In the current code, ipoib_stop() (which doesn't flush the workqueue) calls ipoib_mcast_dev_flush(), which goes and deletes all the multicast entries. This takes place without any synchronization with a possible running instance of ipoib_mcast_join_task() for the same ipoib device, leading to a crash due to NULL pointer dereference. Fix this by making sure that the workqueue is flushed before ipoib_mcast_dev_flush() is called. To make that possible, we move the RTNL-lock wrapped code to ipoib_mcast_join_finish(). Signed-off-by: Patrick McHardy <kaber@trash.net> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
9baa0b03 |
|
12-Sep-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/ipoib: Add rtnl_link_ops support Add rtnl_link_ops to IPoIB, with the first usage being child device create/delete through them. Childs devices are now either legacy ones, created/deleted through the ipoib sysfs entries, or RTNL ones. Adding support for RTNL childs involved refactoring of ipoib_vlan_add which is now used by both the sysfs and the link_ops code. Also, added ndo_uninit entry to support calling unregister_netdevice_queue from the rtnl dellink entry. This required removal of calls to ipoib_dev_cleanup from the driver in flows which use unregister_netdevice, since the networking core will invoke ipoib_uninit which does exactly that. Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5120a6e |
|
29-Aug-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IPoIB: Fix AB-BA deadlock when deleting neighbours Lockdep points out a circular locking dependency betwwen the ipoib device priv spinlock (priv->lock) and the neighbour table rwlock (ntbl->rwlock). In the normal path, ie neigbour garbage collection task, the neigh table rwlock is taken first and then if the neighbour needs to be deleted, priv->lock is taken. However in some error paths, such as in ipoib_cm_handle_tx_wc(), priv->lock is taken first and then ipoib_neigh_free routine is called which in turn takes the neighbour table ntbl->rwlock. The solution is to get rid the neigh table rwlock completely and use only priv->lock. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
66172c09 |
|
29-Aug-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IPoIB: Fix memory leak in the neigh table deletion flow If the neighbours hash table is empty when unloading the module, then ipoib_flush_neighs(), the cleanup routine, isn't called and the memory used for the hash table itself leaked. To fix this, ipoib_flush_neighs() is allways called, and another completion object is added to signal when the table is freed. Once invoked, ipoib_flush_neighs() flushes all the neighbours (if there are any), calls the the hash table RCU free routine, which now signals completion of the deletion process, and waits for the last neighbour to be freed. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6c723a68 |
|
13-Aug-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/ipoib: Fix RCU pointer dereference of wrong object Commit b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") introduced a bug where in ipoib_neigh_free() (which is called from a few errors flows in the driver), rcu_dereference() is invoked with the wrong pointer object, which results in a crash. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b63b70d8 |
|
24-Jul-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IPoIB: Use a private hash table for path lookup in xmit path Dave Miller <davem@davemloft.net> provided a detailed description of why the way IPoIB is using neighbours for its own ipoib_neigh struct is buggy: Any time an ipoib_neigh is changed, a sequence like the following is made: spin_lock_irqsave(&priv->lock, flags); /* * It's safe to call ipoib_put_ah() inside * priv->lock here, because we know that * path->ah will always hold one more reference, * so ipoib_put_ah() will never do more than * decrement the ref count. */ if (neigh->ah) ipoib_put_ah(neigh->ah); list_del(&neigh->list); ipoib_neigh_free(dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); ipoib_path_lookup(skb, n, dev); This doesn't work, because you're leaving a stale pointer to the freed up ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails with all the locking done to protect _changes_ to *ipoib_neigh(n), and with the code in ipoib_neigh_free() that NULLs out the pointer. The core issue is that read side calls to *to_ipoib_neigh(n) are not being synchronized at all, they are performed without any locking. So whether we hold the lock or not when making changes to *ipoib_neigh(n) you still can have threads see references to freed up ipoib_neigh objects. cpu 1 cpu 2 n = *ipoib_neigh() *ipoib_neigh() = NULL kfree(n) n->foo == OOPS [..] Perhaps the ipoib code can have a private path database it manages entirely itself, which holds all the necessary information and is looked up by some generic key which is available easily at transmit time and does not involve generic neighbour entries. See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b> for the full discussion. This patch aims to solve the race conditions found in the IPoIB driver. The patch removes the connection between the core networking neighbour structure and the ipoib_neigh structure. In addition to avoiding the race described above, it allows us to handle SKBs carrying IP packets that don't have any associated neighbour. We add an ipoib_neigh hash table with N buckets where the key is the destination hardware address. The ipoib_neigh is fetched from the hash table and instead of the stashed location in the neighbour structure. The hash table uses both RCU and reference counting to guarantee that no ipoib_neigh instance is ever deleted while in use. Fetching the ipoib_neigh structure instance from the hash also makes the special code in ipoib_start_xmit that handles remote and local bonding failover redundant. Aged ipoib_neigh instances are deleted by a garbage collection task that runs every M seconds and deletes every ipoib_neigh instance that was idle for at least 2*M seconds. The deletion is safe since the ipoib_neigh instances are protected using RCU and reference count mechanisms. The number of buckets (N) and frequency of running the GC thread (M), are taken from the exported arb_tbl. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
178709bb |
|
02-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipoib: Convert over to dev_lookup_neigh_skb(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
936d7de3 |
|
07-Feb-2012 |
Roland Dreier <roland@purestorage.com> |
IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") made it possible for a netdev driver to use skb->cb between its header_ops.create method and its .ndo_start_xmit method. Use this in ipoib_hard_header() to stash away the LL address (GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows IPoIB to stop lying about its hard_header_len, which will let us fix the L2 check for GRO. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
377cb4f9 |
|
07-Feb-2012 |
Roland Dreier <roland@purestorage.com> |
IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") made it possible for a netdev driver to use skb->cb between its header_ops.create method and its .ndo_start_xmit method. Use this in ipoib_hard_header() to stash away the LL address (GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows IPoIB to stop lying about its hard_header_len, which will let us fix the L2 check for GRO. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
17e6abee |
|
02-Dec-2011 |
David Miller <davem@davemloft.net> |
infiniband: ipoib: Sanitize neighbour handling in ipoib_main.c Reduce the number of dst_get_neighbour_noref() calls within a single call chain. Primarily by passing the neighbour pointer down to the helper functions. Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit() by incrementing the dropped counter and freeing the packet. We don't want it to fall through into the ARP/RARP/multicast handling, since that should only happen when skb_dst() is NULL. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
|
#
27217455 |
|
02-Dec-2011 |
David Miller <davem@davemloft.net> |
net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
|
#
596b9b68 |
|
24-Jul-2011 |
David Miller <davem@davemloft.net> |
neigh: Add infrastructure for allocating device neigh privates. netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
580da35a |
|
29-Nov-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
IB: Fix RCU lockdep splats Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()") forgot to take care of infiniband uses of dst neighbours. Many thanks to Marc Aurele who provided a nice bug report and feedback. Reported-by: Marc Aurele La France <tsi@ualberta.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3874397c |
|
21-Nov-2011 |
Mike Marciniszyn <mike.marciniszyn@qlogic.com> |
IB/ipoib: Prevent hung task or softlockup processing multicast response This following can occur with ipoib when processing a multicast reponse: BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982] Modules linked in: ... CPU 0: Modules linked in: ... Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5 RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20 RSP: 0018:ffff8802119ed860 EFLAGS: 00000246 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299 RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000 R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860 R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib] [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib] [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib] [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0 [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib] [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib] [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa] [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad] [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core] [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa] [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa] [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa] [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad] [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad] [<ffffffff81088830>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Coinciding with stack trace is the following message: ib0: ib_address_create failed The code below in ipoib_mcast_join_finish() will note the above failure in the address handle but otherwise continue: ah = ipoib_create_ah(dev, priv->pd, &av); if (!ah) { ipoib_warn(priv, "ib_address_create failed\n"); } else { The while loop at the bottom of ipoib_mcast_join_finish() will attempt to send queued multicast packets in mcast->pkt_queue and eventually end up in ipoib_mcast_send(): if (!mcast->ah) { if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) skb_queue_tail(&mcast->pkt_queue, skb); else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } My read is that the code will requeue the packet and return to the ipoib_mcast_join_finish() while loop and the stage is set for the "hung" task diagnostic as the while loop never sees a non-NULL ah, and will do nothing to resolve. There are GFP_ATOMIC allocates in the provider routines, so this is possible and should be dealt with. The test that induced the failure is associated with a host SM on the same server during a shutdown. This patch causes ipoib_mcast_join_finish() to exit with an error which will flush the queued mcast packets. Nothing is done to unwind the QP attached state so that subsequent sends from above will retry the join. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Reviewed-by: Gary Leshner <gary.leshner@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
9ca36f7d |
|
16-Nov-2011 |
David S. Miller <davem@davemloft.net> |
infiniband: Update net drivers for netdev_features_t changes. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afc4b13d |
|
16-Aug-2011 |
Jiri Pirko <jpirko@redhat.com> |
net: remove use of ndo_set_multicast_list in drivers replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
22cfb0bf |
|
16-Aug-2011 |
Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> |
IPoIB: Fix possible NULL dereference in ipoib_start_xmit() Fix a bug introduced in 69cce1d14049 ("net: Abstract dst->neighbour accesses behind helpers.") where we might dereference skb_dst(skb) even if it is NULL, which causes: [ 240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [ 240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib] [...] [ 240.948007] Call Trace: [ 240.948007] <IRQ> [ 240.948007] [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590 [ 240.948007] [<ffffffff8131f680>] ? arp_create+0x70/0x200 [ 240.948007] [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0 Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212 Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
69cce1d1 |
|
18-Jul-2011 |
David S. Miller <davem@davemloft.net> |
net: Abstract dst->neighbour accesses behind helpers. dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d96c74d |
|
18-Apr-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: infiniband/ulp/ipoib: convert to hw_features Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
948579cd |
|
04-Nov-2010 |
Joe Perches <joe@perches.com> |
RDMA: Use vzalloc() to replace vmalloc()+memset(0) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8ae31e5b |
|
10-Jan-2011 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Add GRO support Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
19e364f6 |
|
10-Jan-2011 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Remove LRO support As a first step in moving from LRO to GRO, revert commit af40da894e9 ("IPoIB: add LRO support"). Also eliminate the ethtool set_flags callback which isn't needed anymore. Finally, we need to include <linux/sched.h> directly to get the declaration of restart_syscall() (which used to be included implicitly through <linux/inet_lro.h>). Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
732eacc0 |
|
26-Oct-2010 |
Hagen Paul Pfeifer <hagen@jauu.net> |
replace nested max/min macros with {max,min}3 macro Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c3aa9b18 |
|
20-Sep-2010 |
Eli Cohen <eli@dev.mellanox.co.il> |
IPoIB: Set dev_id field of net_device Use the net device's dev_id field to encode the port number of the pci device. This can be used to to associate a net device with the pci device's port. The encoding is: dev_id = port - 1. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
7b4c8769 |
|
27-Sep-2010 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Skip IBoE ports IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link layer is Ethernet and IP can work directly over Ethernet, so disable IPoIB for non-IB_LINK_LAYER_INFINIBAND ports. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
7a52b34b |
|
05-Jun-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Fix world-writable child interface control sysfs attributes Sumeet Lahorani <sumeet.lahorani@oracle.com> reported that the IPoIB child entries are world-writable; however we don't want ordinary users to be able to create and destroy child interfaces, so fix them to be writable only by root. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
0cd4d0fd |
|
09-Dec-2009 |
David J. Wilder <dwilder@us.ibm.com> |
IPoIB: Clear ipoib_neigh.dgid in ipoib_neigh_alloc() IPoIB can miss a change in destination GID under some conditions. The problem is caused when ipoib_neigh->dgid contains a stale address. The fix is to set ipoib_neigh->dgid to zero in ipoib_neigh_alloc(). This can happen when a system using bonding on its IPoIB interfaces has switched its active interface from interface A to B and back to A. The system that fails over will not correctly processes the 2nd address change, as described below. When an address has changed neighbor->ha is updated with the new address. Each neighbor has an associated ipoib_neigh. ipoib_neigh->dgid also holds a copy of the remote node's hardware address. When an address changes neighbor->ha is updated by the network layer (arp code) with the new address. IPoIB detects this change in ipoib_start_xmit() by comparing neighbor->ha with ipoib_neigh->dgid. The bug is that ipoib_neigh->dgid may already contain the new address (A) thus the change from B to A is missed by ipoib. Here is the sequence of events: ipoib_neigh->dgid = A and neighbor->ha = A The address is switched to B (the first switch) neighbor->ha = B The change is seen in ipoib_start_xmit() -- neighbor->ha != ipoib_neigh->dgid so ipoib_neigh is released, and a new one is allocated. The allocator may return the same chunk of memory that was just released, therefore ipoib_neigh->dgid still contains A at this point. ipoib_neigh->dgid should be updated in neigh_add_path(), but if the following conditions are true dgid is not updated: 1) __path_find() returns a path 2) path->ah is NULL The remote system now switches from address B to A, neighbor->ha is updated to A. Now we have again : ipoib_neigh->dgid = A and neighbor->ha = A Since the addresses are the same ipoib won't process the change in address. Fix this by zeroing out the dgid field when allocating a new struct ipoib_neigh. Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
721d67cd |
|
05-Sep-2009 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Drop priv->lock before calling ipoib_send() IPoIB currently must use irqsave locking for priv->lock, since it is taken from interrupt context in one path. However, ipoib_send() does skb_orphan(), and the network stack locking is not IRQ-safe. Therefore we need to make sure we don't hold priv->lock when calling ipoib_send() to avoid lockdep warnings (the code was almost certainly safe in practice, since the only code path that takes priv->lock from interrupt context would never call into the network stack). Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757 Reported-by: Bart Van Assche <bart.vanassche@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
adf30907 |
|
01-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86d15cd8 |
|
31-May-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: unset IFF_XMIT_DST_RELEASE for qeth and ipoib Last two drivers that need skb->dst in their start_xmit() function Tell dev_hard_start_xmit() to no release it by unsetting IFF_XMIT_DST_RELEASE Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e028cc55 |
|
20-Apr-2009 |
Yossi Etigin <yosefe@voltaire.com> |
IPoIB: Disable NAPI while CQ is being drained If NAPI is enabled while IPoIB's CQ is being drained, it creates a race on priv->ibwc between ipoib_poll() and ipoib_drain_cq(), leading to memory corruption. The solution is to enable/disable NAPI in ipoib_ib_dev_{open/stop}() instead of in ipoib_{open/stop}(), and sync NAPI on the INITIALIZED flag instead on the ADMIN_UP flag. This way NAPI will be disabled when ipoib_drain_cq() is called. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1587>. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
fe8114e8 |
|
20-Mar-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
infiniband: convert ipoib to net_device_ops Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
71d98b46 |
|
17-Feb-2009 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IPoIB: In unicast_arp_send(), only free newly-created paths If path_rec_start() returns error, call path_free() only if the path was newly-created. If we free an existing path whose valid flag was zero, (but do not detach it from the list) we cause corruption of the path list (of which it is a member), and get a kernel crash. The simplest solution is to not free an existing path -- just leave it in the list as-is (i.e., with its valid flag cleared). Thanks to Yossi Etigin of Voltaire for identifying the problem flow which caused the kernel crash. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Moni Shua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b8a1b1ce |
|
14-Jan-2009 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Fix hang in napi_disable() if P_Key is never found After commit fe25c561 ("IPoIB: Don't enable NAPI when it's already enabled"), if an interface is brought up but the corresponding P_Key never appears, then ipoib_stop() will hang in napi_disable(), because ipoib_open() returns before it does napi_enable(). Fix this by changing ipoib_open() to call napi_enable() even if the P_Key isn't present. Reported-by: Yossi Etigin <yosefe@Voltaire.COM> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a50df398 |
|
09-Jan-2009 |
Yossi Etigin <yosefe@Voltaire.COM> |
IPoIB: Fix loss of connectivity after bonding failover on both sides Fix bonding failover in the case both peers failover and the gratuitous ARP is lost. In that case, the sender side will create an ipoib_neigh and issue a path request with the old GID first. When skb->dst->neighbour->ha changes due to ARP refresh, this ipoib_neigh will not be added to the path->list of the path of the new GID, because the ipoib_neigh already exists. It will not have an AH either, because of sender-side failover. Therefore, it will not get an AH when the path is resolved. The solution here is to compare GIDs in ipoib_start_xmit() even if neigh->ah is invalid. Comparing with an uninitialized value of neigh->dgid should be fine, since a spurious match is harmless (and astronomically unlikely too). Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
ff79ae80 |
|
12-Nov-2008 |
Yossi Etigin <yosefe@Voltaire.COM> |
IPoIB: Fix crash in path_rec_completion() Fix a crash in path_rec_completion() during an SM up/down loop. If more than one path record request is issued, the first completion releases path->done, allowing ipoib_flush_paths() to free the path, and thus corrupting it for the second completion. Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") added the field path->valid and changed the test "if (!path)" to "if (!path || !path->valid)". This change made it possible for a path with an outstanding query to pass the test and issue another query on the same path. Having two queries on the same path leads to a crash. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1325>. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
93a3ab93 |
|
12-Nov-2008 |
Yossi Etigin <yosefe@Voltaire.COM> |
IPoIB: Fix hang in ipoib_flush_paths() ipoib_flush_paths() can hang during an SM up/down loop: if path_rec_start() fails (for instance, because there is no sm_ah), the path is still added to the path list by neigh_add_path(). Then, ipoib_flush_paths() will wait for path->done, but it will never complete because the request was not issued at all. Fix this by completing path->done if issuing the query fails. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1329>. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
fe25c561 |
|
12-Nov-2008 |
Yossi Etigin <yosefe@Voltaire.COM> |
IPoIB: Don't enable NAPI when it's already enabled If a P_Key is not present when an interface is created, ipoib_open() will return after doing napi_enable(). ipoib_open() will be called again from ipoib_pkey_poll() when the P_Key appears, after NAPI has already been enabled, and try to enable it again. This triggers a BUG_ON() in napi_enable(). Fix this by moving the call to napi_enable() to after the test for P_Key presence. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
5b095d989 |
|
29-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace %p6 with %pI6 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fcace2fe |
|
28-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
infiniband: ipoib replace IPOIB_GID_FMT with %p6 Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG() Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
83bb63f6 |
|
22-Oct-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Set netdev offload features properly for child (VLAN) interfaces Child devices were created without any offload features set, fix this by moving the code that computes the features into generic function which is now called through non-child and child device creation. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> -- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
943c246e |
|
30-Sep-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling tx_lock. Not only do we want to get rid of LLTX, this actually causes problems because of the skb_orphan() done with this tx_lock held: some skb destructors expect to be run with interrupts enabled. The simplest fix for this is to get rid of the driver-private tx_lock and stop using LLTX. We kill off priv->tx_lock and use netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit tricky because we need to update places that take priv->lock inside the tx_lock to disable IRQs, rather than relying on tx_lock having already disabled IRQs. Also, there are a couple of places where we need to disable BHs to make sure we have a consistent context to call netif_tx_lock() (since we no longer can use _irqsave() variants), and we also have to change ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather than directly, because ipoib_send_comp_handler() runs in interrupt context and drain_tx_cq() must run in BH context so it can call netif_tx_lock(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
6ef190cc |
|
25-Sep-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Fix crash when path record fails after path flush Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") changed how paths are flushed on an SM event. This change introduces a problem if the path record query triggered by fails, causing path->ah to become NULL. A later successful path query will then trigger WARN_ON() in path_rec_completion(), and crash because path->ah has already been freed, so the ipoib_put_ah() inside the lock in path_rec_completion() may actually drop the last reference (contrary to the comment that claims this is safe). Fix this by updating path->ah and freeing old_ah only when the path record query is successful. This prevents the neighbour AH and that path AH from getting out of sync. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194> Reported-by: Rabah Salem <ravah@mellanox.com> Debugged-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c9da4bad |
|
25-Sep-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Fix crash when path record fails after path flush Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") changed how paths are flushed on an SM event. This change introduces a problem if the path record query triggered by fails, causing path->ah to become NULL. A later successful path query will then trigger WARN_ON() in path_rec_completion(), and crash because path->ah has already been freed, so the ipoib_put_ah() inside the lock in path_rec_completion() may actually drop the last reference (contrary to the comment that claims this is safe). Fix this by updating path->ah and freeing old_ah only when the path record query is successful. This prevents the neighbour AH and that path AH from getting out of sync. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194> Reported-by: Rabah Salem <ravah@mellanox.com> Debugged-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e8224e4b |
|
16-Sep-2008 |
Yossi Etigin <yossi.openib@gmail.com> |
IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a77a57a1 |
|
19-Aug-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Fix deadlock on RTNL in ipoib_stop() Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending. Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
01b3fc8b |
|
22-Jul-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures Print the return code of ib_sa_path_rec_get() if it fails to help debug errors. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
5892eff9 |
|
15-Jul-2008 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Remove priv->mcast_mutex No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast since these operations are synchronized at the HW driver layer. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
ee1e2c82 |
|
15-Jul-2008 |
Moni Shoua <monis@Voltaire.COM> |
IPoIB: Refresh paths instead of flushing them on SM change events The patch tries to solve the problem of device going down and paths being flushed on an SM change event. The method is to mark the paths as candidates for refresh (by setting the new valid flag to 0), and wait for an ARP probe a new path record query. The solution requires a different and less intrusive handling of SM change event. For that, the second argument of the flush function changes its meaning from a boolean flag to a level. In most cases, SM failover doesn't cause LID change so traffic won't stop. In the rare cases of LID change, the remote host (the one that hadn't changed its LID) will lose connectivity until paths are refreshed. This is no worse than the current state. In fact, preventing the device from going down saves packets that otherwise would be lost. Signed-off-by: Moni Levy <monil@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
af40da89 |
|
15-Jul-2008 |
Vladimir Sokolovsky <vlad@mellanox.co.il> |
IPoIB: add LRO support Add "ipoib_use_lro" module parameter to enable LRO and an "ipoib_lro_max_aggr" module parameter to set the max number of packets to be aggregated. Make LRO controllable and LRO statistics accessible through ethtool. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f89271da |
|
15-Jul-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IPoIB: Copy small received SKBs in connected mode The connected mode implementation in the IPoIB driver has a large overhead in the way SKBs are handled in the receive flow. It usually allocates an SKB with as big as was used in the currently received SKB and moves unused fragments from the old SKB to the new one. This involves a loop on all the remaining fragments and incurs overhead on the CPU. This patch, for small SKBs, allocates an SKB just large enough to contain the received data and copies to it the data from the received SKB. The newly allocated SKB is passed to the stack and the old SKB is reposted. When running netperf, UDP small messages, without this pach I get: UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 14.4.3.178 (14.4.3.178) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 114688 128 10.00 5142034 0 526.31 114688 10.00 1130489 115.71 With this patch I get both send and receive at ~315 mbps. The reason that send performance actually slows down is as follows: When using this patch, the overhead of the CPU for handling RX packets is dramatically reduced. As a result, we do not experience RNR NAK messages from the receiver which cause the connection to be closed and reopened again; when the patch is not used, the receiver cannot handle the packets fast enough so there is less time to post new buffers and hence the mentioned RNR NACKs. So what happens is that the application *thinks* it posted a certain number of packets for transmission but these packets are flushed and do not really get transmitted. Since the connection gets opened and closed many times, each time netperf gets the CPU time that otherwise would have been given to IPoIB to actually transmit the packets. This can be verified when looking at the port counters -- the output of ifconfig and the oputput of netperf (this is for the case without the patch): tx packets ========== port counter: 1,543,996 ifconfig: 1,581,426 netperf: 5,142,034 rx packets ========== netperf 1,1304,089 Signed-off-by: Eli Cohen <eli@mellanox.co.il>
|
#
f3781d2e |
|
15-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
RDMA: Remove subversion $Id tags They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f56bcd80 |
|
29-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IPoIB: Use separate CQ for UD send completions Use a dedicated CQ for UD send completions. Also, do not arm the UD send CQ, which reduces the number of interrupts generated. This patch farther reduces overhead by not calling poll CQ for every posted send WR -- it does polls only when there 16 or more outstanding work requests. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bc7b3a36 |
|
23-Apr-2008 |
Shirley Ma <mashirle@us.ibm.com> |
IPoIB: Handle 4K IB MTU for UD (datagram) mode This patch enables IPoIB to use 4K UD messages (when the underlying device and fabrics support a 4K MTU) by using two scatter buffers when PAGE_SIZE is less than or equal to thhe HCA IB MTU size. The first buffer is for IPoIB header + GRH header, and the second buffer is the IPoIB payload, which is 4K-4. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
82c24c18 |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IPoIB: Add basic ethtool support Just add the infrastructure so we can add functionality later. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
40ca1988 |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IPoIB: Add LSO support For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set NETIF_F_TSO and use HW LSO to offload TCP segmentation. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
157de229 |
|
16-Apr-2008 |
Robert P. J. Day <rpjday@crashcourse.ca> |
IB: Use shorter list_splice_init() for brevity Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init() Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
6046136c |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IPoIB: Use checksum offload support if available For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and use the HCA to generate and verify IP checksums. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
10313cbb |
|
12-Mar-2008 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Allocate priv->tx_ring with vmalloc() Commit 7143740d ("IPoIB: Add send gather support") made struct ipoib_tx_buf significantly larger, since the mapping member changed from a single u64 to an array with MAX_SKB_FRAGS + 1 entries. This means that allocating tx_rings with kzalloc() may fail because there is not enough contiguous memory for the new, much bigger size. Fix this regression by allocating the rings with vmalloc() instead. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
eb14032f |
|
30-Jan-2008 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Add high DMA feature flag All current InfiniBand devices can handle all DMA addresses, and it's hard to imagine anyone would be silly enough to build a new device that couldn't. Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB. This has no effect for no, but is needed when we enable gather/scatter support and checksum stateless offloads. Signed-off-by: Eli Cohen <eli@mellnaox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
7bc531dd |
|
28-Jan-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Remove a misleading debug print Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh structue") left a misleading debug print (n->dev would be a bond device only if boding is used). Clean it up. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bafff974 |
|
17-Jan-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Handle bonding failover race for connected neighbours too Move up the code that checks for a situation where the remote GID stored in the ipoib_neigh is different than the one present in the neighbour (handle gratuitous ARP) or that a bonding fail over has happened but the neighbour still has a pointer to an ipoib_neigh created by a different device than the current slave. This will cause the driver to apply the check also for connected mode neighbours. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
48fe5e59 |
|
14-Nov-2007 |
Krishna Kumar <krkumar2@in.ibm.com> |
IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler qdisc_run() now tests for queue_stopped() before calling __qdisc_run(), and the same check is done in every iteration of __qdisc_run(), so another check is not required in the driver xmit. This means that ipoib_start_xmit() no longer needs to test netif_queue_stopped(); the test was added to fix earlier kernels, where the networking stack did not guarantee that the xmit method of an LLTX driver would not be called after the queue was stopped, but current kernels do provide this guarantee. To validate, I put a debug in the TX_BUSY path which never hit with 64 threads running overnight exercising this code a few 100 million times. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
586a6934 |
|
21-Dec-2007 |
Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> |
IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG entries for SRQs. Currently IPoIB/CM implicitly assumes all HCAs will support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This patch removes that restriction by limiting the maximum MTU in connected mode to what the maximum number of SRQ SG entries allows. This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728> Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
68e995a2 |
|
25-Jan-2008 |
Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> |
IPoIB/cm: Add connected mode support for devices without SRQs Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared receive queues). The current IPoIB connected mode support only works on devices that support SRQs. Fix this by adding support for using the receive queue of each connected mode receive QP. The disadvantage of this compared to using an SRQ is that it means a full queue of receives must be posted for each remote connected mode peer, which means that total memory usage is potentially much higher than when using SRQs. To manage this, add a new module parameter "max_nonsrq_conn_qp" that limits the number of connections allowed per interface. The rest of the changes are fairly straightforward: we use a table of struct ipoib_cm_rx to hold all the active connections, and put the table index of the connection in the high bits of receive WR IDs. This is needed because we cannot rely on the struct ib_wc.qp field for non-SRQ receive completions. Most of the rest of the changes just test whether or not an SRQ is available, and post receives or find received packets in the right place depending on the answer. Cleaning up dead connections actually becomes simpler, because we do not have to do the "last WQE reached" dance that is required to destroy QPs attached to an SRQ. We just move the QP to the error state and wait for all pending receives to be flushed. Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> [ Completely rewritten and split up, based on Pradeep's work. Several bugs fixed and no doubt several bugs introduced. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2337f809 |
|
23-Oct-2007 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Trivial formatting cleanups Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1401b53a |
|
26-Nov-2007 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IPoIB: Fix oops if xmit is called when priv->broadcast is NULL If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes the mcasts (clearing priv->broadcast) and clearing the path record cache. If ipoib_start_xmit() is then invoked (before the broadcast group is rejoined), a kernel oops results from attempting to access priv->broadcast, which is still unset. Returning NULL from path_rec_create() if priv->broadcast is NULL is a harmless way of bypassing the problem -- the offending packet is simply discarded "without prejudice." Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1b524963 |
|
16-Aug-2007 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB/cm: Use common CQ for CM send completions Use the same CQ for CM send completions as for all other IPoIB completions. This means all completions are processed via the same NAPI polling routine. This should help reduce the number of interrupts for bi-directional traffic (such as TCP) and fixes "driver is hogging interrupts" errors reported for IPoIB send side, e.g. <https://bugs.openfabrics.org/show_bug.cgi?id=508> To do this, keep a per-interface counter of outstanding send WRs, and stop the interface when this counter reaches the send queue size to avoid CQ overruns. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
200d1713 |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
IB/ipoib: Verify address handle validity on send When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Roland Dreier <rdreier@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
732a2170 |
|
09-Oct-2007 |
Moni Shoua <monis@voltaire.com> |
IB/ipoib: Bound the net device to the ipoib_neigh structue IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n->dev->flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua <monis at voltaire.com> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com> Acked-by: Roland Dreier <rdreier@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
9153f66a |
|
09-Oct-2007 |
Roland Dreier <rdreier@cisco.com> |
IPoIB: Fix unused variable warning The conversion to use netdevice internal stats left an unused variable in ipoib_neigh_free(), since there's no longer any reason to get netdev_priv() in order to increment dropped packets. Delete the unused priv variable. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
#
de903512 |
|
28-Sep-2007 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB]: Convert to netdevice internal stats Use the stats member of struct netdevice in IPoIB, so we can save memory by deleting the stats member of struct ipoib_dev_priv, and save code by deleting ipoib_get_stats(). Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b04ddde |
|
09-Oct-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10d024c1 |
|
17-Sep-2007 |
Ralf Baechle <ralf@linux-mips.org> |
[NET]: Nuke SET_MODULE_OWNER macro. It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bea3348e |
|
03-Oct-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[NET]: Make NAPI polling independent of struct net_device objects. Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device *dev, int *budget) to int foo_poll(struct napi_struct *napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, *budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
335a64a5a |
|
08-Oct-2007 |
Or Gerlitz <ogerlitz@voltaire.com> |
IPoIB: Allow setting policy to ignore multicast groups The kernel IB stack allows (through the RDMA CM) userspace applications to join and use multicast groups from the IPoIB MGID range. This allows multicast traffic to be handled directly from userspace QPs, without going through the kernel stack, which gives better performance for some applications. However, to fully interoperate with IP multicast, such userspace applications need to participate in IGMP reports and queries, or else routers may not forward the multicast traffic to the system where the application is running. The simplest way to do this is to share the kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to join multicast groups that are being handled directly in userspace. However, in such cases, the actual multicast traffic should not also be handled by the IPoIB interface, because that would burn resources handling multicast packets that will just be discarded in the kernel. To handle this, this patch adds lookup on the database used for IB multicast group reference counting when IPoIB is joining multicast groups, and if a multicast group is already handled by user space, then the IPoIB kernel driver ignores the group. This is controlled by a per-interface policy flag. When the flag is set, IPoIB will not join and attach its QP to a multicast group which already has an entry in the database; when the flag is cleared, IPoIB will behave as before this change. For each IPoIB interface, the /sys/class/net/$intf/umcast attribute controls the policy flag. The default value is off/0. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
81668838 |
|
02-Aug-2007 |
Sean Hefty <sean.hefty@intel.com> |
IPoIB: Specify Traffic Class with path record queries for QoS support To support QoS within and between subnets, modify IPoIB to request specific Traffic Class values with path record queries, using the value associated with the IPoIB broadcast group. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ See some comments I made on this at v1 and v2 of the posts <http://lists.openfabrics.org/pipermail/general/2007-August/039275.html> <http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ] Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
ca6de177 |
|
21-Aug-2007 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Fix error path memory leak Clean up properly if ib_query_pkey() or ib_query_gid() fail. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
26bbf13c |
|
19-May-2007 |
Yosef Etigin <yosefe@voltaire.com> |
IPoIB: Handle P_Key table reordering SM reconfiguration or failover possibly causes a shuffling of the values in the P_Key table. Right now, IPoIB only queries for the P_Key index once when it creates the device QP, and hence there are problems if the index of a P_Key value changes. Fix this by using the PKEY_CHANGE event to trigger a recheck of the P_Key index. Signed-off-by: Yosef Etigin <yosefe@voltaire.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8d1cc86a |
|
06-May-2007 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Convert to NAPI Convert the IP-over-InfiniBand network device driver over to using NAPI to handle completions for the main CQ. This covers all receives as well as datagram mode sends; send completions for connected mode connections are still handled from interrupt context. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
46f1b3d7 |
|
05-Apr-2007 |
Sean Hefty <sean.hefty@intel.com> |
IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr To support destinations that are not on the local IB subnet, IPoIB should include the GRH information when constructing an address handle. Using the existing ib_init_ah_from_path() call will do this for us. Signed-off-by: Sean Hefty <sean.hefty@intel.com>
|
#
ecbb4169 |
|
24-Mar-2007 |
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> |
[NET]: Fix neighbour destructor handling. ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms etc. The device should guarantee that dead neighbor entries (neigh->dead != 0) do not get private part initialized, otherwise nobody will cleanup it. I think this is enough for ipoib which is the only user of this thing. Initialization private part of neighbor entries happens in ipib start_xmit routine, which is not reached when device is down. But it would be better to add explicit test for neigh->dead in any case. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d04d01b1 |
|
22-Mar-2007 |
Michael S. Tsirkin <mst@dev.mellanox.co.il> |
IPoIB: Fix use-after-free in path_rec_completion() The connected mode code added the possibility that an neigh struct gets freed in the list_for_each_entry() loop in path_rec_completion(), which causes a use-after-free. Fix this by changing to the _safe variant of the list walking macro. This was spotted by the Coverity checker (CID 1567). Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
843613b0 |
|
26-Feb-2007 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Correct debugging output when path record lookup fails If path_rec_completion() is passed a non-NULL path record pointer along with an unsuccessful status value, the tracing code incorrectly prints the (invalid) DLID from the path record rather than the more interesting status code. The actual logic of the function correctly uses the path record only if the status indicates a successful lookup. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
839fcaba |
|
05-Feb-2007 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Connected mode experimental support The following patch adds experimental support for IPoIB connected mode, as defined by the draft from the IETF ipoib working group. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system. Some notes on code: 1. SRQ is used for scalability to large cluster sizes 2. Only RC connections are used (UC does not support SRQ now) 3. Retry count is set to 0 since spec draft warns against retries 4. Each connection is used for data transfers in only 1 direction, so each connection is either active(TX) or passive (RX). 2 sides that want to communicate create 2 connections. 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks 6. To detect stale passive side connections (where the remote side is down), we keep an LRU list of passive connections (updated once per second per connection) and destroy a connection after it has been unused for several seconds. The LRU rule makes it possible to avoid scanning connections that have recently been active. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
43cb76d9 |
|
09-Apr-2002 |
Greg Kroah-Hartman <gregkh@suse.de> |
Network: convert network devices to use struct device instead of class_device This lets the network core have the ability to handle suspend/resume issues, if it wants to. Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm driver fixes. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
82b39913 |
|
12-Dec-2006 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Make sure struct ipoib_neigh.queue is always initialized Move the initialization of ipoib_neigh's skb_queue into ipoib_neigh_alloc(), since commit 2745b5b7 ("IPoIB: Fix skb leak when freeing neighbour") will make iterate over the skb_queue to free any packets left over when freeing the ipoib_neigh structure. This fixes a crash when freeing ipoib_neigh structures allocated in ipoib_mcast_send(), which otherwise don't have their skb_queue initialized. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2745b5b7 |
|
16-Nov-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Fix skb leak when freeing neighbour ipoib_neigh_free() is sometimes called while neighbour is still alive, so it might still have queued skbs. Fix skb leak in this case. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c4028958 |
|
22-Nov-2006 |
David Howells <dhowells@redhat.com> |
WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
|
#
073ae841 |
|
16-Nov-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Clear high octet in QP number IPoIB assumes that high (reserved) octet in the hardware address is 0, and copies it into the QPN. This violates RFC 4391 (which requires that the high 8 bits are ignored on receive), and will result in an invalid QPN being used when interoperating with IPoIB connected mode. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a8bfca02 |
|
22-Sep-2006 |
Eli Cohen <eli@dev.mellanox.co.il> |
IPoIB: Add some likely/unlikely annotations in hot path Signed-off-by: Eli Cohen <eli@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
507c3350 |
|
21-Sep-2006 |
Dotan Barak <dotanb@dev.mellanox.co.il> |
IPoIB: Remove unused include of vmalloc.h IPoIB doesn't use anything from <linux/vmalloc.h>, so don't include it. Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c1a0b23b |
|
21-Aug-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IB/sa: Require SA registration Require users to register with SA module, to prevent the sa_query module text from going away while an SA query callback is still running. Update all in-tree users for the new interface. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
07ebafba |
|
03-Aug-2006 |
Tom Tucker <tom@opengridcomputing.com> |
RDMA: iWARP Core Changes. Modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP, including: - Hook iWARP CM into the build system and use it in rdma_cm. - Convert enum ib_node_type to enum rdma_node_type, which includes the possibility of RDMA_NODE_RNIC, and update everything for this. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
9217b27b |
|
03-Aug-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IB/ipoib: Fix flush/start xmit race (from code review) Prevent flush task from freeing the ipoib_neigh pointer, while ipoib_start_xmit() is accessing the ipoib_neigh through the pointer it has loaded from the skb's hardware address. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8a7f7521 |
|
19-Jul-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IB/ipoib: Fix packet loss after hardware address update The neighbour ha field may get updated without destroying the neighbour. In this case, the ha field gets out of sync with the address handle stored in ipoib_neigh->ah, with the result that the ah field would point to an incorrect path, resulting in all packets being lost. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
37c22a77 |
|
29-May-2006 |
Jack Morgenstein <jackm@mellanox.co.il> |
IPoIB: Fix kernel unaligned access on ia64 Fix misaligned access faults on ia64: never cast a misaligned neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer instead. The memcpy was being optimized to use full word accesses because the compiler thought that union ib_gid is always aligned. The cast in IPOIB_GID_ARG is safe, since it is fixed to access each byte separately. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f697f74a |
|
10-Apr-2006 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Use spin_lock_irq() instead of spin_lock_irqsave() We know ipoib_flush_paths() is called from plain process context with interrupts enabled, since it does wait_for_completion(). So there's no need to use spin_lock_irqsave() -- spin_lock_irq() is fine. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a30bb96c |
|
05-Apr-2006 |
Eli Cohen <eli@mellanox.co.il> |
IPoIB: Close race in ipoib_flush_paths() ib_sa_cancel_query() must be called with priv->lock held since a completion might arrive and set path->query to NULL. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
0f485251 |
|
10-Apr-2006 |
Shirley Ma <xma@us.ibm.com> |
IPoIB: Make send and receive queue sizes tunable Make IPoIB's send and receive queue sizes tunable via module parameters ("send_queue_size" and "recv_queue_size"). This allows the queue sizes to be enlarged to fix disastrously bad performance on some platforms and workloads, without bloating memory usage when large queues aren't needed. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bf6a9e31 |
|
10-Apr-2006 |
Jack Morgenstein <jackm@mellanox.co.il> |
IB: simplify static rate encoding Push translation of static rate to HCA format into low-level drivers, where it belongs. For static rate encoding, use encoding of rate field from IB standard PathRecord, with addition of value 0, for backwards compatibility with current usage. The changes are: - Add enum ib_rate to midlayer includes. - Get rid of static rate translation in IPoIB; just use static rate directly from Path and MulticastGroup records. - Update mthca driver to translate absolute static rate into the format used by hardware. This also fixes mthca's static rate handling for HCAs that are capable of 4X DDR. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
d2e0655e |
|
04-Apr-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Consolidate private neighbour data handling Consolidate IPoIB's private neighbour data handling into ipoib_neigh_alloc() and ipoib_neigh_free(). This will make it easier to keep track of the neighbour structures that IPoIB is handling, and is a nice cleanup of the code: add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78) function old new delta ipoib_neigh_alloc - 61 +61 ipoib_neigh_free - 36 +36 ipoib_mcast_join_finish 1288 1291 +3 path_rec_completion 575 573 -2 ipoib_mcast_join_task 664 660 -4 ipoib_neigh_destructor 101 92 -9 ipoib_neigh_setup_dev 14 3 -11 ipoib_neigh_setup 17 - -17 path_free 238 215 -23 ipoib_mcast_free 329 306 -23 ipoib_mcast_send 718 684 -34 neigh_add_path 705 650 -55 Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
ef12d456 |
|
29-Mar-2006 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Fix oops with raw sockets ipoib_hard_header() needs to handle the case that daddr is NULL. This can happen when packets are injected via a raw socket, and IPoIB shouldn't oops in this case. Reported by Anton Blanchard <anton@samba.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
7a343d4c |
|
23-Mar-2006 |
Leonid Arsh <leonida@voltaire.com> |
IPoIB: P_Key change event handling This patch causes the network interface to respond to P_Key change events correctly. As a result, you'll see a child interface in the "RUNNING" state (netif_carrier_on()) only when the corresponding P_Key is configured by the SM. When SM removes a P_Key, the "RUNNING" state will be disabled for the corresponding network interface. To implement this, I added IB_EVENT_PKEY_CHANGE event handling. To prevent flushing the device before the device is open by the "delay open" mechanism, I added an additional device flag called IPOIB_FLAG_INITIALIZED. This also prevents the child network interface from trying to join to multicast groups until the PKEY is configured. We used to get error messages like: ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff in this case. To fix this, I just check IPOIB_FLAG_OPER_UP flag in ipoib_set_mcast_list(). Signed-off-by: Leonid Arsh <leonida@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
c5ecd62c |
|
20-Mar-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
[NET]: Move destructor from neigh->ops to neigh_params struct neigh_ops currently has a destructor field, which no in-kernel drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree driver stashes some info in the neighbour structure (the results of the second-stage lookup from ARP results to real link-level path), and it uses neigh->ops->destructor to get a callback so it can clean up this extra info when a neighbour is freed. We've run into problems with this: since the destructor is in an ops field that is shared between neighbours that may belong to different net devices, there's no way to set/clear it safely. The following patch moves this field to neigh_parms where it can be safely set, together with its twin neigh_setup. Two additional patches in the patch series update ipoib to use this new interface. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bfef73fa |
|
20-Mar-2006 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: Get rid of useless test of queue length In neigh_add_path(), the queue of delayed packets can never be full, because the queue is always freshly created and cannot be found by any other code path. In fact, the test of the queue length is worse than useless: if somehow the test ever triggered and path_rec_start() also failed, then dev_kfree_skb_any() will be called twice on the same skb. Fix this by deleting the useless test. Pointed out by Michael S. Tsirkin <mst@mellanox.co.il>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
0b3ea082 |
|
20-Mar-2006 |
Jack Morgenstein <jackm@mellanox.co.il> |
IPoIB: Move ipoib_ib_dev_flush() to ipoib workqueue Move ipoib_ib_dev_flush() to ipoib's workqueue. This keeps it ordered with respect to other work scheduled by the ipoib driver. This fixes problems with races, for example: - ipoib_ib_dev_flush() has started running because of an IB event - user does ifconfig ib0 down - ipoib_mcast_stop_thread() gets called twice and waits for the same completion twice Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
47f7a071 |
|
17-Jan-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
IPoIB: Make sure path is fully initialized before using it The SA path record query completion can initialize path->pathrec.dlid before IPoIB's callback runs and initializes path->ah, so we must test ah rather than dlid. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
95ed644f |
|
13-Jan-2006 |
Ingo Molnar <mingo@elte.hu> |
IB: convert from semaphores to mutexes semaphore to mutex conversion by Ingo and Arjan's script. Signed-off-by: Ingo Molnar <mingo@elte.hu> [ Sanity-checked on real IB hardware ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
14c85021 |
|
26-Dec-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
267ee88e |
|
29-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: fix error handling in ipoib_open If ipoib_ib_dev_up() fails after ipoib_ib_dev_open() is called, then ipoib_ib_dev_stop() needs to be called to clean up. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
5872a9fc |
|
29-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: always set path->query to NULL when query finishes Always set path->query to NULL when the SA path record query completes, rather than only when we don't have an address handle. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
65c7edda |
|
28-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
IPoIB: reinitialize path struct's completion for every query It's possible that IPoIB will issue multiple SA queries for the same path struct. Therefore the struct's completion needs to be initialized for each query rather than only once when the struct is allocated, or else we might not wait long enough for later queries to finish and free the path struct too soon. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1732b0ef |
|
07-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] add path record information in debugfs Add ibX_path files to debugfs that contain information about the IPoIB path cache. IPoIB ARP only gives GIDs, which the IPoIB driver must resolve to real IB paths through the ib_sa module. For debugging, when the ARP table looks OK but traffic isn't flowing, it's useful to be able to see if the resolution from GID to path worked. Also clean up the formatting of the existing _mcg debugfs files. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
21a38489 |
|
02-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] remove unneeded initializations to 0 Shrink our source and .text a little by removing a few assignments of NULL and 0 to memory that is already cleared as part of the allocation. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
de6eb66b |
|
02-Nov-2005 |
Roland Dreier <rolandd@cisco.com> |
[IB] kzalloc() conversions Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35 source lines and about 500 bytes of text. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a20583a7 |
|
29-Oct-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] use spin_trylock_irqsave() Use spin_trylock_irqsave() in ipoib_start_xmit() instead of reinventing it out of local_irq_save(), spin_trylock() and local_irq_restore(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1993d683 |
|
28-Oct-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] Drop RX packets when out of memory Change the way IPoIB handles RX packets when it can't allocate a new receive skbuff. If the allocation of a new receive skb fails, we now drop the packet we just received and repost the original receive skb. This means that the receive ring always stays full and we don't have to monkey around with trying to schedule a refill task for later. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
4b2d319b |
|
18-Oct-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] Improve ipoib_timeout() output Use jiffies_to_msecs() so we print a human-readable time so we don't have to worry about what HZ is configured to, and print out a few values to make post-mortem analysis easier. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
d70ed607 |
|
28-Sep-2005 |
Roland Dreier <rolandd@cisco.com> |
[IPoIB] Rename IPoIB's path_lookup() to avoid name clashes Rename IPoIB driver's path_lookup() to ipoib_path_lookup() to avoid a clashes with the kernel global path_lookup(). We don't hit this with the current kernel source, but some external patches seem to trigger this, and it's cleaner to avoid clashing with global names anyway. Signed-off-by: Roland Dreier <rolandd@cisco.com> refs/heads/for-linus
|
#
51574e03 |
|
12-Sep-2005 |
Michael S. Tsirkin <mst@mellanox.co.il> |
[PATCH] IPoIB: fix module removal race Since ipoib uses queue_delayed_work to run flush task on port state events, it must flush scheduled work after unregistering the event handler. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
06c56e44 |
|
01-Sep-2005 |
Michael S. Tsirkin <mst@mellanox.co.il> |
[PATCH] IPoIB: fix memory leak Fix IPoIB memory leak on device removal. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1ad62a19 |
|
24-Aug-2005 |
Michael S. Tsirkin <mst@mellanox.co.il> |
[PATCH] IPoIB: Fix device removal race Currently we may have work scheduled in default kernel workqueue when the device is going down. The device could get freed before this workqueue gets serviced. I am actually seeing this causing system hangs. The following patch fixes this by using ipoib_workqueue which gets flushed when the device is going down. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
4ce05937 |
|
19-Aug-2005 |
Roland Dreier <roland@eddore.topspincom.com> |
[PATCH] IPoIB: Set full membership bit in P_Keys Always make sure that the full membership bit is set in the P_Keys that IPoIB uses. This makes sure that all hosts join the correct multicast groups so that hosts that are partial partition members can talk to the rest of the network. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2aeba9a0 |
|
15-Aug-2005 |
Olaf Hering <olh@suse.de> |
[PATCH] IB: Remove unnecessary includes of <linux/version.h> changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason. Remove unneeded includes of <linux/version.h>. Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
97f52eb4 |
|
13-Aug-2005 |
Sean Hefty <sean.hefty@intel.com> |
[PATCH] IB: sparse endianness cleanup Fix sparse warnings. Use __be* where appropriate. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
92a6b34b |
|
13-Aug-2005 |
Hal Rosenstock <halr@voltaire.com> |
[PATCH] IB: Eliminate redundant NULL checks IPoIB: Eliminate NULL checks prior to calling kfree Signed-off-by: Hal Rosenstock <halr@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2a1d9b7f |
|
11-Aug-2005 |
Roland Dreier <roland@eddore.topspincom.com> |
[PATCH] IB: Add copyright notices Make some lawyers happy and add copyright notices for people who forgot to include them when they actually touched the code. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
0dca0f7b |
|
28-Jul-2005 |
Hal Rosenstock <halr@voltaire.com> |
[PATCH] [IPoIB] Handle sending of unicast RARP responses RARP replies are another valid case where IPoIB may need to send a unicast packet with no neighbour structure. Signed-off-by: Hal Rosenstock <halr@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
9adec1a8 |
|
16-Apr-2005 |
Roland Dreier <roland@topspin.com> |
[PATCH] IPoIB: convert to debugfs Convert IPoIB to use debugfs instead of its own custom debugging filesystem. Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e6ded99c |
|
16-Apr-2005 |
Roland Dreier <roland@topspin.com> |
[PATCH] IPoIB: fix static rate calculation Correct and simplify calculation of static rate. We need to round up the quotient of (local_rate - path_rate) / path_rate. To round up we add (path_rate - 1) to the numerator, so the quotient simplifies to (local_rate - 1) / path_rate. No idea how I came up with the old formula. Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|