#
465d6b42 |
|
09-Oct-2023 |
Patrisious Haddad <phaddad@nvidia.com> |
RDMA/core: Add support to set privileged QKEY parameter Add netlink command that enables/disables privileged QKEY by default. It is disabled by default, since according to IB spec only privileged users are allowed to use privileged QKEY. According to the IB specification rel-1.6, section 3.5.3: "QKEYs with the most significant bit set are considered controlled QKEYs, and a HCA does not allow a consumer to arbitrarily specify a controlled QKEY." Using rdma tool, $rdma system set privileged-qkey on When enabled non-privileged users would be able to use controlled QKEYs which are considered privileged. Using rdma tool, $rdma system set privileged-qkey off When disabled only privileged users would be able to use controlled QKEYs. You can also use the command below to check the parameter state: $rdma system show netns shared privileged-qkey off copy-on-fork on Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/90398be70a9d23d2aa9d0f9fd11d2c264c1be534.1696848201.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
d2b10794 |
|
03-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Create clean QP creations interface for uverbs Unify create QP creation interface to make clean approach to create XRC_TGT and regular QPs. Link: https://lore.kernel.org/r/5cd50e7d8ad9112545a1a61dea62799a5cb3224a.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
5507f67d |
|
03-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Properly increment and decrement QP usecnts The QP usecnts were incremented through QP attributes structure while decreased through QP itself. Rely on the ib_creat_qp_user() code that initialized all QP parameters prior returning to the user and increment exactly like destroy does. Link: https://lore.kernel.org/r/25d256a3bb1fc480b77d7fe439817b993de48610.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8da9fe4e |
|
03-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Reorganize create QP low-level functions The low-level create QP function grew to be larger than any sensible inline function should be. The inline attribute is not really needed for that function and can be implemented as exported symbol. Link: https://lore.kernel.org/r/2c08709d86f876c3dfb77684357b2a939e570ca4.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8fc3beeb |
|
03-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Delete duplicated and unreachable code The ib_create_named_qp() is kernel verb and no kernel users exist that use XRC_INI QP. Hence such QP path is not reachable. In addition, delete duplicated assignments of QP attributes from the initialization structure. Link: https://lore.kernel.org/r/1b4c0d1def5f8f6d26839e14d19da950cc4a0b05.1628014762.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
514aee66 |
|
23-Jul-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Globally allocate and release QP memory Convert QP object to follow IB/core general allocation scheme. That change allows us to make sure that restrack properly kref the memory. Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com Reviewed-by: Gal Pressman <galpress@amazon.com> #efa Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
c5f8f2c5 |
|
16-Jun-2021 |
Anand Khoje <anand.a.khoje@oracle.com> |
IB/core: Removed port validity check from ib_get_cached_subnet_prefix Removed port validity check from ib_get_cached_subnet_prefix() as this check is not needed because "port_num" is valid. Link: https://lore.kernel.org/r/20210616154509.1047-2-anand.a.khoje@oracle.com Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com> Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
526a12c8 |
|
11-Jun-2021 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj's This code is trying to attach a list of counters grouped into 4 groups to the ib_port sysfs. Instead of creating a bunch of kobjects simply express everything naturally as an ib_port_attribute and add a single attribute_groups list. Remove all the naked kobject manipulations. Link: https://lore.kernel.org/r/0d5a7241ee0fe66622de04fcbaafaf6a791d5c7c.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
b7066b32 |
|
11-Jun-2021 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/core: Create the device hw_counters through the normal groups mechanism Instead of calling device_add_groups() add the group to the existing groups array which is managed through device_add(). This requires setting up the hw_counters before device_add(), so it gets split up from the already split port sysfs flow. Move all the memory freeing to the release function. Link: https://lore.kernel.org/r/666250d937b64f6fdf45da9e2dc0b6e5e4f7abd8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d8a58838 |
|
11-Jun-2021 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer It is much saner to store a pointer to the kobject structure that contains the cannonical stats pointer than to copy the stats pointers into a public structure. Future patches will require the sysfs pointer for other purposes. Link: https://lore.kernel.org/r/f90551dfd296cde1cb507bbef27cca9891d19871.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1fb7f897 |
|
01-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
RDMA: Support more than 255 rdma ports Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic. This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed. With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue. When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported. The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that relies in verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device CM/MAD/IPoIB and other ULPs aren't effected by this change and their sysfs/interfaces that are exposes to userspace can remain unchanged. While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt), Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
286e1d3f |
|
08-Dec-2020 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
RDMA/core: Clean up cq pool mechanism The CQ pool mechanism had two problems: 1. The CQ pool lists were uninitialized in the device registration error flow. As a result, all the list pointers remained NULL. This caused the kernel to crash (in procedure ib_cq_pool_destroy) when that error flow was taken (and unregister called). The stack trace snippet: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0×0000) ? not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI . . . RIP: 0010:ib_cq_pool_destroy+0x1b/0×70 [ib_core] . . . Call Trace: disable_device+0x9f/0×130 [ib_core] __ib_unregister_device+0x35/0×90 [ib_core] ib_register_device+0x529/0×610 [ib_core] __mlx5_ib_add+0x3a/0×70 [mlx5_ib] mlx5_add_device+0x87/0×1c0 [mlx5_core] mlx5_register_interface+0x74/0xc0 [mlx5_core] do_one_initcall+0x4b/0×1f4 do_init_module+0x5a/0×223 load_module+0x1938/0×1d40 2. At device unregister, when cleaning up the cq pool, the cq's in the pool lists were freed, but the cq entries were left in the list. The fix for the first issue is to initialize the cq pool lists when the ib_device structure is allocated for a new device (in procedure _ib_alloc_device). The fix for the second problem is to delete cq entries from the pool lists when cleaning up the cq pool. In addition, procedure ib_cq_pool_destroy() is renamed to the more appropriate name ib_cq_pool_cleanup(). Fixes: 4aa1615268a8 ("RDMA/core: Fix ordering of CQ pool destruction") Link: https://lore.kernel.org/r/20201208073545.9723-2-leon@kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
66f57b87 |
|
17-Nov-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/restrack: Support all QP types The latest changes in restrack name handling allowed to simplify the QP creation code to support all types of QPs. For example XRC QP are presented with rdmatool. $ ibv_xsrq_pingpong & $ rdma res show qp link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib] link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs] link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong Link: https://lore.kernel.org/r/20201117070148.1974114-4-leon@kernel.org Reviewed-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
0413755c |
|
04-Nov-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/restrack: Store all special QPs in restrack DB Special QPs (SMI and GSI) have different rules in regards of their QP numbers. While all other QP numbers are unique per-device, the QP0 and QP1 are created per-port as requested by IBTA. In multiple port devices, the number of SMI and GSI QPs with be equal to the number ports. $ rdma dev 0: ibp0s9: node_type ca fw 4.4.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455 $ rdma link 0/1: ibp0s9/1: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state ACTIVE physical_state LINK_UP 0/2: ibp0s9/2: subnet_prefix fe80:0000:0000:0000 lid 13397 sm_lid 49151 lmc 0 state UNKNOWN physical_state UNKNOWN Before: $ rdma res show qp type SMI,GSI link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] After: $ rdma res show qp type SMI,GSI link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] link ibp0s9/2 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/2 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] Link: https://lore.kernel.org/r/20201104144008.3808124-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
b09c4d70 |
|
21-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/restrack: Improve readability in task name management Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd(). This uniformly makes all restracks add'd using rdma_restrack_add(). Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
c34a23c2 |
|
21-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/restrack: Simplify restrack tracking in kernel flows Have a single rdma_restrack_add() that adds an entry, there is no reason to split the user/kernel here, the rdma_restrack_set_task() is responsible for this difference. This patch prepares the code to the future requirement of making restrack is mandatory for managing ib objects. Link: https://lore.kernel.org/r/20200922091106.2152715-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
13ef5539 |
|
21-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/restrack: Count references to the verbs objects Refactor the restrack code to make sure the kref inside the restrack entry properly kref's the object in which it is embedded. This slight change is needed for future conversions of MR and QP which are refcounted before the release and kfree. The ideal flow from ib_core perspective as follows: * Allocate ib_* structure with rdma_zalloc_*. * Set everything that is known to ib_core to that newly created object. * Initialize kref with restrack help * Call to driver specific allocation functions. * Insert into restrack DB .... * Return and release restrack with restrack_put. Largely this means a rdma_restrack_new() should be called near allocating the containing structure. Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
c7ff819a |
|
27-May-2020 |
Yamin Friedman <yaminf@mellanox.com> |
RDMA/core: Introduce shared CQ pool API Allow a ULP to ask the core to provide a completion queue based on a least-used search on a per-device CQ pools. The device CQ pools grow in a lazy fashion when more CQs are requested. This feature reduces the amount of interrupts when using many QPs. Using shared CQs allows for more effcient completion handling. It also reduces the amount of overhead needed for CQ contexts. Test setup: Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz servers. Running NVMeoF 4KB read IOs over ConnectX-5EX across Spectrum switch. TX-depth = 32. The patch was applied in the nvme driver on both the target and initiator. Four controllers are accessed from each core. In the current test case we have exposed sixteen NVMe namespaces using four different subsystems (four namespaces per subsystem) from one NVM port. Each controller allocated X queues (RDMA QPs) and attached to Y CQs. Before this series we had X == Y, i.e for four controllers we've created total of 4X QPs and 4X CQs. In the shared case, we've created 4X QPs and only X CQs which means that we have four controllers that share a completion queue per core. Until fourteen cores there is no significant change in performance and the number of interrupts per second is less than a million in the current case. ================================================== |Cores|Current KIOPs |Shared KIOPs |improvement| |-----|---------------|--------------|-----------| |14 |2332 |2723 |16.7% | |-----|---------------|--------------|-----------| |20 |2086 |2712 |30% | |-----|---------------|--------------|-----------| |28 |1971 |2669 |35.4% | |================================================= |Cores|Current avg lat|Shared avg lat|improvement| |-----|---------------|--------------|-----------| |14 |767us |657us |14.3% | |-----|---------------|--------------|-----------| |20 |1225us |943us |23% | |-----|---------------|--------------|-----------| |28 |1816us |1341us |26.1% | ======================================================== |Cores|Current interrupts|Shared interrupts|improvement| |-----|------------------|-----------------|-----------| |14 |1.6M/sec |0.4M/sec |72% | |-----|------------------|-----------------|-----------| |20 |2.8M/sec |0.6M/sec |72.4% | |-----|------------------|-----------------|-----------| |28 |2.9M/sec |0.8M/sec |63.4% | ==================================================================== |Cores|Current 99.99th PCTL lat|Shared 99.99th PCTL lat|improvement| |-----|------------------------|-----------------------|-----------| |14 |67ms |6ms |90.9% | |-----|------------------------|-----------------------|-----------| |20 |5ms |6ms |-10% | |-----|------------------------|-----------------------|-----------| |28 |8.7ms |6ms |25.9% | |=================================================================== Performance improvement with sixteen disks (sixteen CQs per core) is comparable. Link: https://lore.kernel.org/r/1590568495-101621-3-git-send-email-yaminf@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
e38b55ea |
|
27-Feb-2020 |
Maor Gottlieb <maorg@mellanox.com> |
RDMA/core: Fix protection fault in ib_mr_pool_destroy Fix NULL pointer dereference in the error flow of ib_create_qp_user when accessing to uninitialized list pointers - rdma_mrs and sig_mrs. The following crash from syzkaller revealed it. kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 23167 Comm: syz-executor.1 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:ib_mr_pool_destroy+0x81/0x1f0 Code: 00 00 fc ff df 49 c1 ec 03 4d 01 fc e8 a8 ea 72 fe 41 80 3c 24 00 0f 85 62 01 00 00 48 8b 13 48 89 d6 4c 8d 6a c8 48 c1 ee 03 <42> 80 3c 3e 00 0f 85 34 01 00 00 48 8d 7a 08 4c 8b 02 48 89 fe 48 RSP: 0018:ffffc9000951f8b0 EFLAGS: 00010046 RAX: 0000000000040000 RBX: ffff88810f268038 RCX: ffffffff82c41628 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9000951f850 RBP: ffff88810f268020 R08: 0000000000000004 R09: fffff520012a3f0a R10: 0000000000000001 R11: fffff520012a3f0a R12: ffffed1021e4d007 R13: ffffffffffffffc8 R14: 0000000000000246 R15: dffffc0000000000 FS: 00007f54bc788700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000116920002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rdma_rw_cleanup_mrs+0x15/0x30 ib_destroy_qp_user+0x674/0x7d0 ib_create_qp_user+0xb01/0x11c0 create_qp+0x1517/0x2130 ib_uverbs_create_qp+0x13e/0x190 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f54bc787c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49 RDX: 0000000000000040 RSI: 0000000020000540 RDI: 0000000000000003 RBP: 00007f54bc787c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54bc7886bc R13: 00000000004ca2ec R14: 000000000070ded0 R15: 0000000000000005 Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API") Link: https://lore.kernel.org/r/20200227112708.93023-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
620d3f81 |
|
08-Jan-2020 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/core: Do not erase the type of ib_qp.uobject This is a struct ib_uqp_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
6b57cea9 |
|
12-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
IB/core: Let IB core distribute cache update events Currently when the low level driver notifies Pkey, GID, and port change events they are notified to the registered handlers in the order they are registered. IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey change events. Since all GID queries done by ULPs are serviced by IB core, and the IB core deferes cache updates to a work queue, it is possible for other clients to see stale cache data when they handle their own events. For example, the below call tree shows how ipoib will call rdma_query_gid() concurrently with the update to the cache sitting in the WQ. mlx5_ib_handle_event() ib_dispatch_event() ib_cache_event() queue_work() -> slow cache update [..] ipoib_event() queue_work() [..] work handler ipoib_ib_dev_flush_light() __ipoib_ib_dev_flush() ipoib_dev_addr_changed_valid() rdma_query_gid() <- Returns old GID, cache not updated. Move all the event dispatch to a work queue so that the cache update is always done before any clients are notified. Fixes: f35faa4ba956 ("IB/core: Simplify ib_query_gid to always refer to cache") Link: https://lore.kernel.org/r/20191212113024.336702-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c043ff2c |
|
30-Oct-2019 |
Michal Kalderon <michal.kalderon@marvell.com> |
RDMA: Connect between the mmap entry and the umap_priv structure The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b86deba9 |
|
30-Oct-2019 |
Michal Kalderon <michal.kalderon@marvell.com> |
RDMA/core: Move core content from ib_uverbs to ib_core Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step in later enabling ib_uverbs to be optional. vm_ops is now initialized in ib_uverbs_mmap instead of priv_init to avoid having to move all the rdma_umap functions as well. Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
549af008 |
|
15-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
IB/core: Avoid deadlock during netlink message handling When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive locking of rdma_nl_mutex with the below call sequence. [..] rdma_nl_rcv() mutex_lock() [..] rdma_nl_rcv_msg() ib_get_client_nl_info() request_module() iw_cm_init() rdma_nl_register() mutex_lock(); <- Deadlock, acquiring mutex again Due to above call sequence, following call trace and deadlock is observed. kernel: __mutex_lock+0x35e/0x860 kernel: ? __mutex_lock+0x129/0x860 kernel: ? rdma_nl_register+0x1a/0x90 [ib_core] kernel: rdma_nl_register+0x1a/0x90 [ib_core] kernel: ? 0xffffffffc029b000 kernel: iw_cm_init+0x34/0x1000 [iw_cm] kernel: do_one_initcall+0x67/0x2d4 kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0 kernel: do_init_module+0x5a/0x223 kernel: load_module+0x1998/0x1e10 kernel: ? __symbol_put+0x60/0x60 kernel: __do_sys_finit_module+0x94/0xe0 kernel: do_syscall_64+0x5a/0x270 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe process stack trace: [<0>] __request_module+0x1c9/0x460 [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core] [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core] [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core] [<0>] netlink_unicast+0x179/0x220 [<0>] netlink_sendmsg+0x2f6/0x3f0 [<0>] sock_sendmsg+0x30/0x40 [<0>] ___sys_sendmsg+0x27a/0x290 [<0>] __sys_sendmsg+0x58/0xa0 [<0>] do_syscall_64+0x5a/0x270 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe To overcome this deadlock and to allow multiple netlink messages to progress in parallel, following scheme is implemented. 1. Split the lock protecting the cb_table into a per-index lock, and make it a rwlock. This lock is used to ensure no callbacks are running after unregistration returns. Since a module will not be registered once it is already running callbacks, this avoids the deadlock. 2. Use smp_store_release() to update the cb_table during registration so that no lock is required. This avoids lockdep problems with thinking all the rwsems are the same lock class. Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload") Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
52e0a118 |
|
01-Aug-2019 |
Gal Pressman <galpress@amazon.com> |
RDMA/restrack: Track driver QP types in resource tracker The check for QP type different than XRC has excluded driver QP types from the resource tracker. As a result, "rdma resource show" user command would not show opened driver QPs which does not reflect the real state of the system. Check QP type explicitly instead of assuming enum values/ordering. Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190801104354.11417-1-galpress@amazon.com Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
1d2fedd8 |
|
23-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Support netlink commands in non init_net net namespaces Now that IB core supports RDMA device binding with specific net namespace, enable IB core to accept netlink commands in non init_net namespaces. This is done by having per net namespace netlink socket. At present only netlink device handling client RDMA_NL_NLDEV supports device handling in multiple net namespaces. Hence do not accept netlink messages for other clients in non init_net net namespaces. Link: https://lore.kernel.org/r/20190723070205.6247-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
f8fc8cd9 |
|
08-Jul-2019 |
Yamin Friedman <yaminf@mellanox.com> |
RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink Added parameter in ib_device for enabling dynamic interrupt moderation so that it can be configured in userspace using rdma tool. In order to set adaptive-moderation for an ib device the command is: rdma dev set [DEV] adaptive-moderation [on|off] Please set on/off. rdma dev show 0: mlx5_0: node_type ca fw 16.26.0055 node_guid 248a:0703:00a5:29d0 sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on rdma resource show cq dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE adaptive-moderation off comm [ib_core] Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
0e2d00eb |
|
13-Jun-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload Allow userspace to issue a netlink query against the ib_device for something like "uverbs" and get back the char dev name, inode major/minor, and interface ABI information for "uverbs0". Since we are now in netlink this can also trigger a module autoload to make the uverbs device come into existence. Largely this will let us replace searching and reading inside sysfs to setup devices, and provides an alternative (using driver_id) to device name based provider binding for things like rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
eb15c78b |
|
30-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Do not invoke init_port on compat devices The driver interface cannot manipulate the sysfs of the compat device, only of the full device so we must avoid calling the driver sysfs APIs on compat devices. This prevents an oops: Call Trace: dump_stack+0x5a/0x73 kobject_init+0x74/0x80 kobject_init_and_add+0x35/0xb0 hfi1_create_port_files+0x6e/0x3c0 [hfi1] ib_setup_port_attrs+0x43b/0x560 [ib_core] add_one_compat_dev+0x16a/0x230 [ib_core] rdma_dev_init_net+0x110/0x160 [ib_core] ops_init+0x38/0xf0 setup_net+0xcf/0x1e0 copy_net_ns+0xb7/0x130 create_new_namespaces+0x11a/0x1b0 unshare_nsproxy_namespaces+0x55/0xa0 ksys_unshare+0x1a7/0x340 __x64_sys_unshare+0xe/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 5417783eabb2 ("RDMA/core: Support core port attributes in non init_net") Reported-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
1a418f77 |
|
30-Apr-2019 |
Artemy Kovalyov <artemyko@mellanox.com> |
IB/core: Set qp->real_qp before it may be accessed real_qp should be initialized before ib_destroy_qp() is called. ib_destroy_qp() may be called in the error flow if ib_create_qp_security() failed. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
2e5b8a01 |
|
15-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Add a netlink command to change net namespace of rdma device Provide an option to change the net namespace of a rdma device through a netlink command. When multiple rdma devices exists in a system, and when containers are used, this will limit rdma device visibility to a specified net namespace. An example command to change net namespace of mlx5_1 device to the previously created net namespace 'foo' is: $ ip netns add foo $ rdma dev set mlx5_1 netns foo Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c87e65cf |
|
11-Mar-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA/cm: Move debug counters to be under relevant IB device The sysfs layout is created by CM incorrectly presented RDMA devices with InfiniBand link layer. Layout of such devices represents device tree of connections. By moving CM statistics to be under relevant port of IB device, we will fix the following issues: * Symlink name - It used device name instead of specific identifier. * Target location - It was supposed to point to PCI-ID/infiniband_cm/ instead of PCI-ID/infiniband/ * Target name - It created extra device file under already existing device folder, e.g. mlx5_0/mlx5_0 * Crash during boot with RDMA persistent naming patches. sysfs: cannot create duplicate filename '/class/infiniband_cm/mlx5_0' CPU: 29 PID: 433 Comm: modprobe Not tainted 5.0.0-rc5+ #178 Call Trace: dump_stack+0xcc/0x180 sysfs_warn_dup.cold.3+0x17/0x2d sysfs_do_create_link_sd.isra.2+0xd0/0xf0 device_add+0x7cb/0x1450 device_create_groups_vargs+0x1ae/0x220 device_create+0x93/0xc0 cm_add_one+0x38f/0xf60 [ib_cm] add_client_context+0x167/0x210 [ib_core] enable_device_and_get+0x230/0x3f0 [ib_core] ib_register_device+0x823/0xbf0 [ib_core] __mlx5_ib_add+0x45/0x150 [mlx5_ib] mlx5_ib_add+0x1b3/0x5e0 [mlx5_ib] mlx5_add_device+0x130/0x3a0 [mlx5_core] mlx5_register_interface+0x1a9/0x270 [mlx5_core] do_one_initcall+0x14f/0x5de do_init_module+0x247/0x7c0 load_module+0x4c2f/0x60d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe After this change: [leonro@server ~]$ ls -al /sys/class/infiniband/ibp0s12f0/ports/1/ drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_duplicates drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_msgs drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_msgs drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_retries Fixes: 110cf374a809 ("infiniband: make cm_device use a struct device and not a kobject.") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
2b34c558 |
|
26-Feb-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Add command to set ib_core device net namspace sharing mode Add netlink command that enables/disables sharing rdma device among multiple net namespaces. Using rdma tool, $rdma sys set netns shared (default mode) When rdma subsystem netns mode is set to shared mode, rdma devices will be accessible in all net namespaces. Using rdma tool, $rdma sys set netns exclusive When rdma subsystem netns mode is set to exclusive mode, devices will be accessible in only one net namespace at any given point of time. If there are any net namespaces other than default init_net exists, while executing this command, it will fail and mode cannot be changed. To change this mode, netlink command is used instead of sysctl, because netlink command allows to auto load a module. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
cb7e0e13 |
|
26-Feb-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Add interface to read device namespace sharing mode Add an interface via netlink command to query whether rdma devices are shared among multiple net namespaces or not. When using RDMAtool, it can be queried as, $rdma system show netns netns shared Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
37eeab55 |
|
26-Feb-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Extend ib_device_get_by_index for net namespace Extend ib_device_get_by_index() API to check device access for net namespace for serving netlink commands. Also enforce net ns check on dumpit commands which iterate over all registered rdma devices and which don't call ib_device_get_by_index(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
5417783e |
|
26-Feb-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Support core port attributes in non init_net Now that sysfs compatibility layer for non init_net exists, add core port attributes such as pkey and gid table to non init_net ns. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c2261dd7 |
|
12-Feb-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev The associated netdev should not actually be very dynamic, so for most drivers there is no reason for a callback like this. Provide an API to inform the core code about the net dev affiliation and use a core maintained data structure instead. This allows the core code to be more aware of the ndev relationship which will allow some new APIs based around this. This also uses locking that makes some kind of sense, many drivers had a confusing RCU lock, or missing locking which isn't right. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
5f8f5499 |
|
13-Feb-2019 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Move device addition deletion to device.c Move core device addition and removal from sysfs.c to device.c as device.c is more appropriate place for device management. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b34b269a |
|
06-Feb-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/device: Ensure that security memory is always freed Since this only frees memory it should be done during the release callback. Otherwise there are possible error flows where it might not get called if registration aborts. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c66f6741 |
|
02-Feb-2019 |
Daniel Jurgens <danielj@mellanox.com> |
IB/core: Don't register each MAD agent for LSM notifier When creating many MAD agents in a short period of time, receive packet processing can be delayed long enough to cause timeouts while new agents are being added to the atomic notifier chain with IRQs disabled. Notifier chain registration and unregstration is an O(n) operation. With large numbers of MAD agents being created and destroyed simultaneously the CPUs spend too much time with interrupts disabled. Instead of each MAD agent registering for it's own LSM notification, maintain a list of agents internally and register once, this registration already existed for handling the PKeys. This list is write mostly, so a normal spin lock is used vs a read/write lock. All MAD agents must be checked, so a single list is used instead of breaking them down per device. Notifier calls are done under rcu_read_lock, so there isn't a risk of similar packet timeouts while checking the MAD agents security settings when notified. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
d79af724 |
|
10-Jan-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/device: Expose ib_device_try_get(() It turns out future patches need this capability quite widely now, not just for netlink, so provide two global functions to manage the registration lock refcount. This also moves the point the lock becomes 1 to within ib_register_device() so that the semantics of the public API are very sane and clear. Calling ib_device_try_get() will fail on devices that are only allocated but not yet registered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
|
#
7527a7b1 |
|
17-Jan-2019 |
Parav Pandit <parav@mellanox.com> |
IB/core: Simplify rdma cgroup registration RDMA cgroup registration routine always returns success, so simplify function to be void and run clang formatter over whole CONFIG_CGROUP_RDMA art of core_priv.h. This reduces unwinding error path for regular registration and future net namespace change functionality for rdma device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ea4baf7f |
|
18-Dec-2018 |
Parav Pandit <parav@mellanox.com> |
RDMA: Rename port_callback to init_port Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
af8d7037 |
|
17-Dec-2018 |
Shamir Rabinovitch <shamir.rabinovitch@oracle.com> |
RDMA/restrack: Resource-tracker should not use uobject pointers Having uobject pointer embedded in ib core objects is not aligned with a future shared ib_x model. The resource tracker only does this to keep track of user/kernel objects - track this directly instead. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
3023a1e9 |
|
10-Dec-2018 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA: Start use ib_device_ops Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
01b67117 |
|
15-Nov-2018 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Sync unregistration with netlink commands When the rdma device is getting removed, get resource info can race with device removal, as below: CPU-0 CPU-1 -------- -------- rdma_nl_rcv_msg() nldev_res_get_cq_dumpit() mutex_lock(device_lock); get device reference mutex_unlock(device_lock); [..] ib_unregister_device() /* Valid reference to * device->dev exists. */ ib_dealloc_device() [..] provider->fill_res_entry(); Even though device object is not freed, fill_res_entry() can get called on device which doesn't have a driver anymore. Kernel core device reference count is not sufficient, as this only keeps the structure valid, and doesn't guarantee the driver is still loaded. Similar race can occur with device renaming and device removal, where device_rename() tries to rename a unregistered device. While this is fine for devices of a class which are not net namespace aware, but it is incorrect for net namespace aware class coming in subsequent series. If a class is net namespace aware, then the below [1] call trace is observed in above situation. Therefore, to avoid the race, keep a reference count and let device unregistration wait until all netlink users drop the reference. [1] Call trace: kernfs: ns required in 'infiniband' for 'mlx5_0' WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last unloaded: devlink] RIP: 0010:kernfs_find_ns+0x104/0x120 Call Trace: kernfs_find_and_get_ns+0x2e/0x50 sysfs_rename_link_ns+0x40/0xb0 device_rename+0xb2/0xf0 ib_device_rename+0xb3/0x100 [ib_core] nldev_set_doit+0x165/0x190 [ib_core] rdma_nl_rcv_msg+0x249/0x250 [ib_core] ? netlink_deliver_tap+0x8f/0x3e0 rdma_nl_rcv+0xd6/0x120 [ib_core] netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2f0/0x3e0 sock_sendmsg+0x30/0x40 __sys_sendto+0xdc/0x160 Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource tracking") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
eeb8df87 |
|
12-Nov-2018 |
Parav Pandit <parav@mellanox.com> |
RDMA/cma: Move cma module specific functions to cma_priv.h Currently several rdma_cm module specific functions are declared in core_priv.h file. Now that we have cma_priv.h file specific to rdma_cm kernel module, move them from core_priv.h to cma_priv.h Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
7d65cbf0 |
|
02-Oct-2018 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Increase total number of RDMA ports across all devices IDA adds overhead to store IDs bitmap with maximal value of IDA can be upto 2099202 (IDA_MAX = 0x80000000U / IDA_BITMAP_BITS - 1). However, there is no need to add such enormous number of devices and it is enough for now to limit it to be 8192. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d21943dd |
|
10-Oct-2018 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Implement IB device rename function Generic implementation of IB device rename function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0e9d2c19 |
|
04-Sep-2018 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Consider net ns of gid attribute for RoCE When resolving destination address or route, when net namespace is unavailable, refer to the net namespace of the netdevice of the SGID attribute. This is typically the case for requests arriving from the network for RoCE ports. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
d6b1764a |
|
04-Sep-2018 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Introduce rdma_read_gid_attr_ndev_rcu() to check GID attribute Introduce an API rdma_read_gid_attr_ndev_rcu() to return GID attribute netdevice which is in UP state for accessing netdevice's fields such as net namespace and ifindex. This is useful for users who intent to access netdevice fields under rcu lock. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
6aaecd38 |
|
04-Sep-2018 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Simplify roce_resolve_route_from_path() Currently RoCE route resolve functionality is split between two functions. (a) roce_resolve_route_from_path() and its helper function rdma_resolve_ip_route(). Due to this multiple sockaddr src structures are created in both functions with rdma_dev_addr is an interface between the two for checks. Since there is only one user of rdma_resolve_ip_route() as RoCE, combine the functionality of both functions to roce_resolve_route_from_path() and further reduce the scope of rdma_dev_addr to core/addr.c This also allow to extend addr_resolve() in subsequent patch to consider netdev properties of GID in safer way under rcu lock. Additionally src and dst addresses were always provided, so skip the src addr NULL pointer check as they are present on the stack now. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
77addc52 |
|
04-Sep-2018 |
Parav Pandit <parav@mellanox.com> |
RDMA/core: Rename rdma_copy_addr to rdma_copy_src_l2_addr Now that rdma_copy_addr() only copies the source addresses and all callers are interested in copying only source addresses, simplify it to drop the destination address argument. Given that it only copies source layer2 addresses, rename it to rdma_copy_src_l2_addr for better code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
dd81b2c8 |
|
14-Aug-2018 |
Parav Pandit <parav@mellanox.com> |
IB/core: Change filter function return type from int to bool Filter functions returns either 0 or 1, therefore better change their return type from int to bool to reflect the same. Additionally some filter functions have suffix of _filter some doesn't. Make all filter function consistent to have __filter suffix to improve code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
5ef8c0c1 |
|
28-May-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/core: Remove indirection through ib_cache_setup() This once might have made sense when cache.c was in a different module from device.c, but today it just obfuscation. Get rid of the wrappers and call roge_gid_mgmt_init()/cleanup() directly. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
|
#
e41a7c41 |
|
13-Mar-2018 |
Parav Pandit <parav@mellanox.com> |
IB/core: Move rdma_addr_find_l2_eth_by_grh to core_priv.h Before commit [1], rdma_addr_find_l2_eth_by_grh() was an exported function and therefore declaration in include/rdma/ib_addr.h was fine. But now that its scope is limited to ib_core module, its better to have it in core_priv.h. [1] commit 1060f8653414 ("IB/{core/cm}: Fix generating a return AH for RoCEE") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
a9c06aeba |
|
13-Mar-2018 |
Parav Pandit <parav@mellanox.com> |
IB/core: Remove rdma_resolve_ip_route() as exported symbol rdma_resolve_ip_route() is used only by ib_core module. Therefore it is removed as an exported symbol. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
2f08ee36 |
|
14-Feb-2018 |
Steve Wise <larrystevenwise@gmail.com> |
RDMA/restrack: don't use uaccess_kernel() uaccess_kernel() isn't sufficient to determine if an rdma resource is user-mode or not. For example, resources allocated in the add_one() function of an ib_client get falsely labeled as user mode, when they are kernel mode allocations. EG: mad qps. The result is that these qps are skipped over during a nldev query because of an erroneous namespace mismatch. So now we determine if the resource is user-mode by looking at the object struct's uobject or similar pointer to know if it was allocated for user mode applications. Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
21885586 |
|
14-Feb-2018 |
Leon Romanovsky <leon@kernel.org> |
RDMA/verbs: Check existence of function prior to accessing it Update all the flows to ensure that function pointer exists prior to accessing it. This is much safer than checking the uverbs_ex_mask variable, especially since we know that test isn't working properly and will be removed in -next. This prevents a user triggereable oops. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
78a0cd64 |
|
28-Jan-2018 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Add resource tracking for create and destroy QPs Track create and destroy operations of QP objects. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
02d8883f |
|
28-Jan-2018 |
Leon Romanovsky <leon@kernel.org> |
RDMA/restrack: Add general infrastructure to track RDMA resources The RDMA subsystem has very strict set of objects to work with, but it completely lacks tracking facilities and has no visibility of resource utilization. The following patch adds such infrastructure to keep track of RDMA resources to help with debugging of user space applications. The primary user of this infrastructure is RDMA nldev netlink (following patches), to be exposed to userspace via rdmatool, but it is not limited too that. At this stage, the main three objects (PD, CQ and QP) are added, and more will be added later. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
8cf12d77 |
|
07-Jan-2018 |
Huy Nguyen <huyn@mellanox.com> |
IB/core: Increase number of char device minors There is a need to increase number of possible char devices to support large number of SR-IOV instances. The current limit is in the range of 64-128 devices/ports. Increase it to support up to 1024. The patch performs the following steps to refactor the code: 1. Removes the split bitmap for fixed and overflow dev numbers. 2. Pre-allocates the non-legacy major number range during driver initialization, choosen for simplicity. 3. Add new define (RDMA_MAX_PORTS) that is shared between all drivers. This is the maximum total number of ports on all struct ib_devices. 4. Set RDMA_MAX_PORTS to 1024. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
32f69e4b |
|
04-Jan-2018 |
Daniel Jurgens <danielj@mellanox.com> |
{net, IB}/mlx5: Manage port association for multiport RoCE When mlx5_ib_add is called determine if the mlx5 core device being added is capable of dual port RoCE operation. If it is, determine whether it is a master device or a slave device using the num_vhca_ports and affiliate_nic_vport_criteria capabilities. If the device is a slave, attempt to find a master device to affiliate it with. Devices that can be affiliated will share a system image guid. If none are found place it on a list of unaffiliated ports. If a master is found bind the port to it by configuring the port affiliation in the NIC vport context. Similarly when mlx5_ib_remove is called determine the port type. If it's a slave port, unaffiliate it from the master device, otherwise just remove it from the unaffiliated port list. The IB device is registered as a multiport device, even if a 2nd port is not available for affiliation. When the 2nd port is affiliated later the GID cache must be refreshed in order to get the default GIDs for the 2nd port in the cache. Export roce_rescan_device to provide a mechanism to refresh the cache after a new port is bound. In a multiport configuration all IB object (QP, MR, PD, etc) related commands should flow through the master mlx5_core_dev, other commands must be sent to the slave port mlx5_core_mdev, an interface is provide to get the correct mdev for non IB object commands. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
908d6460 |
|
04-Jan-2018 |
Daniel Jurgens <danielj@mellanox.com> |
IB/core: Change roce_rescan_device to return void It always returns 0. Change return type to void. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
f8978bd9 |
|
01-Jan-2018 |
Leon Romanovsky <leon@kernel.org> |
RDMA/netlink: Fix locking around __ib_get_device_by_index Holding locks is mandatory when calling __ib_device_get_by_index, otherwise there are races during the list iteration with device removal. Since the locks are static to device.c, __ib_device_get_by_index can never be called correctly by any user out side the file. Make the function static and provide a safe function that gets the correct locks and returns a kref'd pointer. Fix all callers. Fixes: e5c9469efcb1 ("RDMA/netlink: Add nldev device doit implementation") Fixes: c3f66f7b0052 ("RDMA/netlink: Implement nldev port doit callback") Fixes: 7d02f605f0dc ("RDMA/netlink: Add nldev port dumpit implementation") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
efac5ac0 |
|
27-Dec-2017 |
Randy Dunlap <rdunlap@infradead.org> |
infiniband: drop unknown function from core_priv.h Delete ibnl_chk_listeners() and its kernel-doc comments from the core_priv.h header file. There is no such function. Fixes: 233c1955835b ("RDMA/netlink: Reduce exposure of RDMA netlink functions") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
df8441c6 |
|
14-Nov-2017 |
Parav Pandit <parav@mellanox.com> |
IB/core: Avoid exporting module internal function ib_security_modify_qp and ib_security_pkey_access are core internal function. So avoid exporting them. ib_security_pkey_access is used only when secuirty hooks are enabled so avoid defining it otherwise. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
647c75ac |
|
15-Jun-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA/netlink: Convert LS to doit callback RDMA_NL_LS protocol is actually does not dump anything, but sets data and it should be handled by doit callback. This patch actually converts RDMA_NL_LS to doit callback, while preserving IWCM and RDMA_CM flows through netlink_dump_start(). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
|
#
ecc82c53 |
|
18-Jun-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Add and expose static device index This patch adds static device index in similar fashion to already available in netdev world (struct net->ifindex). In downstream patches, the RDMA nelink will use this idx-to-ib_device conversion, so as part of this commit, we are exposing the translation function to be visible for IB/core users. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
8030c835 |
|
19-Jun-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA/core: Add iterator over ib_devices The coming nldev needs to iterate over all IB devices in the system and in order to not expose the ib_devices list outside the devices.c, it is necessary to provide function iterator. Current version is written explicitly for nldev callback to avoid over-engineering at this stage, but it can be easily extended for other types. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
|
#
c9901724 |
|
05-Jun-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA/netlink: Remove netlink clients infrastructure RDMA netlink has a complicated infrastructure for dynamically registering and de-registering netlink clients to the NETLINK_RDMA group. The complicated portion of this code is not widely used because 2 of the 3 current clients are statically compiled together with netlink.c. The infrastructure, therefore, is deemed overkill. Refactor the code to eliminate the dynamically added clients. Now all clients are pre-registered in a client array at compile time, and at run time they merely check-in with the infrastructure to pass their callback table for inclusion in the pre-sized client array. This also allows for future cleanups and removal of unneeded code in the iwcm* netlink handler. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
|
#
582faf31 |
|
08-Jun-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/core: Change port_attr.lid size from 16 to 32 bits lid field in struct ib_port_attr is increased to 32 bits. This enables core components to use larger LIDs if needed. The user ABI is unchanged and return 16 bit values when queried. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
233c1955 |
|
14-May-2017 |
Leon Romanovsky <leon@kernel.org> |
RDMA/netlink: Reduce exposure of RDMA netlink functions RDMA netlink is part of ib_core, hence ibnl_chk_listeners(), ibnl_init() and ibnl_cleanup() don't need to be published in public header file. Let's remove EXPORT_SYMBOL from ibnl_chk_listeners() and move all these functions to private header file. CC: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
47a2b338 |
|
19-May-2017 |
Daniel Jurgens <danielj@mellanox.com> |
IB/core: Enforce security on management datagrams Allocate and free a security context when creating and destroying a MAD agent. This context is used for controlling access to PKeys and sending and receiving SMPs. When sending or receiving a MAD check that the agent has permission to access the PKey for the Subnet Prefix of the port. During MAD and snoop agent registration for SMI QPs check that the calling process has permission to access the manage the subnet and register a callback with the LSM to be notified of policy changes. When notificaiton of a policy change occurs recheck permission and set a flag indicating sending and receiving SMPs is allowed. When sending and receiving MADs check that the agent has access to the SMI if it's on an SMI QP. Because security policy can change it's possible permission was allowed when creating the agent, but no longer is. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> [PM: remove the LSM hook init code] Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
d291f1a6 |
|
19-May-2017 |
Daniel Jurgens <danielj@mellanox.com> |
IB/core: Enforce PKey security on QPs Add new LSM hooks to allocate and free security contexts and check for permission to access a PKey. Allocate and free a security context when creating and destroying a QP. This context is used for controlling access to PKeys. When a request is made to modify a QP that changes the port, PKey index, or alternate path, check that the QP has permission for the PKey in the PKey table index on the subnet prefix of the port. If the QP is shared make sure all handles to the QP also have access. Store which port and PKey index a QP is using. After the reset to init transition the user can modify the port, PKey index and alternate path independently. So port and PKey settings changes can be a merge of the previous settings and the new ones. In order to maintain access control if there are PKey table or subnet prefix change keep a list of all QPs are using each PKey index on each port. If a change occurs all QPs using that device and port must have access enforced for the new cache settings. These changes add a transaction to the QP modify process. Association with the old port and PKey index must be maintained if the modify fails, and must be removed if it succeeds. Association with the new port and PKey index must be established prior to the modify and removed if the modify fails. 1. When a QP is modified to a particular Port, PKey index or alternate path insert that QP into the appropriate lists. 2. Check permission to access the new settings. 3. If step 2 grants access attempt to modify the QP. 4a. If steps 2 and 3 succeed remove any prior associations. 4b. If ether fails remove the new setting associations. If a PKey table or subnet prefix changes walk the list of QPs and check that they have permission. If not send the QP to the error state and raise a fatal error event. If it's a shared QP make sure all the QPs that share the real_qp have permission as well. If the QP that owns a security structure is denied access the security structure is marked as such and the QP is added to an error_list. Once the moving the QP to error is complete the security structure mark is cleared. Maintaining the lists correctly turns QP destroy into a transaction. The hardware driver for the device frees the ib_qp structure, so while the destroy is in progress the ib_qp pointer in the ib_qp_security struct is undefined. When the destroy process begins the ib_qp_security structure is marked as destroying. This prevents any action from being taken on the QP pointer. After the QP is destroyed successfully it could still listed on an error_list wait for it to be processed by that flow before cleaning up the structure. If the destroy fails the QPs port and PKey settings are reinserted into the appropriate lists, the destroying flag is cleared, and access control is enforced, in case there were any cache changes during the destroy flow. To keep the security changes isolated a new file is used to hold security related functionality. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> [PM: merge fixup in ib_verbs.h and uverbs_cmd.c] Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
883c71fe |
|
19-May-2017 |
Daniel Jurgens <danielj@mellanox.com> |
IB/core: IB cache enhancements to support Infiniband security Cache the subnet prefix and add a function to access it. Enforcing security requires frequent queries of the subnet prefix and the pkeys in the pkey table. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
89052d78 |
|
13-Feb-2017 |
Majd Dibbiny <majd@mellanox.com> |
IB/cma: Add default RoCE TOS to CMA configfs Add new entry to the RDMA-CM configfs that allows users to select default TOS for RDMA-CM QPs. This is useful for users that want to control the TOS for legacy applications without changing their code. Application that sets the TOS explicitly using the rdma_set_option API will continue to work as expected, meaning overriding the configfs value. CC: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
43579b5f |
|
09-Jan-2017 |
Parav Pandit <pandit.parav@gmail.com> |
IB/core: added support to use rdma cgroup controller Added support APIs for IB core to register/unregister every IB/RDMA device with rdma cgroup for tracking rdma resources. IB core registers with rdma cgroup controller. Added support APIs for uverbs layer to make use of rdma controller. Added uverbs layer to perform resource charge/uncharge functionality. Added support during query_device uverb operation to ensure it returns resource limits by honoring rdma cgroup configured limits. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
#
c90ea9d8 |
|
22-Nov-2016 |
Moni Shoua <monis@mellanox.com> |
IB/core: Change ib_resolve_eth_dmac to use it in create AH The function ib_resolve_eth_dmac() requires struct qp_attr * and qp_attr_mask as parameters while the function might be useful to resolve dmac for address handles. This patch changes the signature of the function so it can be used in the flow of creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
453d3932 |
|
17-Oct-2016 |
David Ahern <dsa@cumulusnetworks.com> |
IB/core: Flip to the new dev walk API Convert rdma_is_upper_dev_rcu, handle_netdev_upper and ipoib_get_net_dev_match_addr to the new upper device walk API. This is just a code conversion; no functional change is intended. v2 - removed typecast of data Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae43f828 |
|
19-May-2016 |
Mark Bloch <markb@mellanox.com> |
IB/core: Add IP to GID netlink offload There is an assumption that rdmacm is used only between nodes in the same IB subnet, this why ARP resolution can be used to turn IP to GID in rdmacm. When dealing with IB communication between subnets this assumption is no longer valid. ARP resolution will get us the next hop device address and not the peer node's device address. To solve this issue, we will check user space if it can provide the GID of the peer node, and fail if not. We add a sequence number to identify each request and fill in the GID upon answer from userspace. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
735c631a |
|
19-May-2016 |
Mark Bloch <markb@mellanox.com> |
IB/core: Register SA ibnl client during ib_core initialization Move SA ibnl client registration to ib_core module init. This will allow us to register a single client to handle all RDMA_NL_LS operations and make it SA independent. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c2e49c92 |
|
19-May-2016 |
Mark Bloch <markb@mellanox.com> |
IB/SA: Integrate ib_sa module into ib_core module Consolidate ib_sa into ib_core, this commit eliminates ib_sa.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4c2cb422 |
|
19-May-2016 |
Mark Bloch <markb@mellanox.com> |
IB/MAD: Integrate ib_mad module into ib_core module Consolidate ib_mad into ib_core, this commit eliminates ib_mad.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e3f20f02 |
|
19-May-2016 |
Leon Romanovsky <leon@kernel.org> |
IB/core: Integrate IB address resolution module into core IB address resolution is declared as a module (ib_addr.ko) which loads itself before IB core module (ib_core.ko). It causes to the scenario where IB netlink which is initialized by IB core can't be used by ib_addr.ko. In order to solve it, we are converting ib_addr.ko to be part of IB core module. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
045959db |
|
23-Dec-2015 |
Matan Barak <matanb@mellanox.com> |
IB/cma: Add configfs for rdma_cm Users would like to control the behaviour of rdma_cm. For example, old applications which don't set the required RoCE gid type could be executed on RoCE V2 network types. In order to support this configuration, we implement a configfs for rdma_cm. In order to use the configfs, one needs to mount it and mkdir <IB device name> inside rdma_cm directory. The patch adds support for a single configuration file, default_roce_mode. The mode can either be "IB/RoCE v1" or "RoCE v2". Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
218a773f |
|
23-Dec-2015 |
Matan Barak <matanb@mellanox.com> |
IB/rdma_cm: Add wrapper for cma reference count Currently, cma users can't increase or decrease the cma reference count. This is necassary when setting cma attributes (like the default GID type) in order to avoid use-after-free errors. Adding cma_ref_dev and cma_deref_dev APIs. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
6020d7e5 |
|
23-Dec-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Move rdma_is_upper_dev_rcu to header file In order to validate the route, we need an easy way to check if a net-device belongs to our RDMA device. Move this helper function to a header file in order to make this check easier. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b39ffa1d |
|
23-Dec-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Add gid_type to gid attribute In order to support multiple GID types, we need to store the gid_type with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT GID table entries shall have a "GID type" attribute that denotes the L3 Address type". The currently supported GID is IB_GID_TYPE_IB which is also RoCE v1 GID type. This implies that gid_type should be added to roce_gid_table meta-data. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
dbf727de |
|
15-Oct-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Use GID table in AH creation and dmac resolution Previously, vlan id and source MAC were used from QP attributes. Since the net device is now stored in the GID attributes, they could be used instead of getting this information from the QP attributes. IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed because there is no known libibverbs that uses them. This commit also modifies the vendors (mlx4, ocrdma) drivers in order to use the new approach. ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d300ec52 |
|
15-Oct-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Expose and rename ib_find_cached_gid_by_port cache API Sometime consumers might want to search for a GID in a specific port. For example, when a WC arrives and we want to search the GID that matches that port - it's better to search only the relevant port. Exposing and renaming ib_cache_gid_find_by_port in order to match the naming convention of the module. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
03db3a2d |
|
30-Jul-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Add RoCE GID table management RoCE GIDs are based on IP addresses configured on Ethernet net-devices which relate to the RDMA (RoCE) device port. Currently, each of the low-level drivers that support RoCE (ocrdma, mlx4) manages its own RoCE port GID table. As there's nothing which is essentially vendor specific, we generalize that, and enhance the RDMA core GID cache to do this job. In order to populate the GID table, we listen for events: (a) netdev up/down/change_addr events - if a netdev is built onto our RoCE device, we need to add/delete its IPs. This involves adding all GIDs related to this ndev, add default GIDs, etc. (b) inet events - add new GIDs (according to the IP addresses) to the table. For programming the port RoCE GID table, providers must implement the add_gid and del_gid callbacks. RoCE GID management requires us to state the associated net_device alongside the GID. This information is necessary in order to manage the GID table. For example, when a net_device is removed, its associated GIDs need to be removed as well. RoCE mandates generating a default GID for each port, based on the related net-device's IPv6 link local. In contrast to the GID based on the regular IPv6 link-local (as we generate GID per IP address), the default GID is also available when the net device is down (in order to support loopback). Locking is done as follows: The patch modify the GID table code both for new RoCE drivers implementing the add_gid/del_gid callbacks and for current RoCE and IB drivers that do not. The flows for updating the table are different, so the locking requirements are too. While updating RoCE GID table, protection against multiple writers is achieved via mutex_lock(&table->lock). Since writing to a table requires us to find an entry (possible a free entry) in the table and then modify it, this mutex protects both the find_gid and write_gid ensuring the atomicity of the action. Each entry in the GID cache is protected by rwlock. In RoCE, writing (usually results from netdev notifier) involves invoking the vendor's add_gid and del_gid callbacks, which could sleep. Therefore, an invalid flag is added for each entry. Updates for RoCE are done via a workqueue, thus sleeping is permitted. In IB, updates are done in write_lock_irq(&device->cache.lock), thus write_gid isn't allowed to sleep and add_gid/del_gid are not called. When passing net-device into/out-of the GID cache, the device is always passed held (dev_hold). The code uses a single work item for updating all RDMA devices, following a netdev or inet notifier. The patch moves the cache from being a client (which was incorrect, as the cache is part of the IB infrastructure) to being explicitly initialized/freed when a device is registered/removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
55aeed06 |
|
04-Aug-2015 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/core: Make ib_alloc_device init the kobject This gets rid of the weird in-between state where struct ib_device was allocated but the kobject didn't work. Consequently ib_device_release is now guaranteed to be called in all situations and we needn't duplicate its kfrees on error paths. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ed4c54e5 |
|
12-Dec-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/core: Resolve Ethernet L2 addresses when modifying QP Existing user space applications provide only IBoE L3 address attributes to the kernel when they issue a modify QP modify. To work with them and let such apps (plus kernel consumers which don't use the RDMA-CM) keep working transparently under the IBoE GID IP addressing changes, add an Eth L2 address resolution helper. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
9a6edb60 |
|
06-May-2010 |
Ralph Campbell <ralph.campbell@qlogic.com> |
IB/core: Allow device-specific per-port sysfs files Add a new parameter to ib_register_device() so that low-level device drivers can pass in a pointer to a callback function that will be called for each port that is registered in sysfs. This allows low-level device drivers to create files in /sys/class/infiniband/<hca>/ports/<N>/ without having to poke through the internals of the RDMA sysfs handling. There is no need for an unregister function since the kobject reference will go to zero when ib_unregister_device() is called. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f3781d2e |
|
15-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
RDMA: Remove subversion $Id tags They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
a4d61e84 |
|
25-Aug-2005 |
Roland Dreier <roland@eddore.topspincom.com> |
[PATCH] IB: move include files to include/rdma Move the InfiniBand headers from drivers/infiniband/include to include/rdma. This allows InfiniBand-using code to live elsewhere, and lets us remove the ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|