#
aafe4cc5 |
|
02-Feb-2024 |
Guoqing Jiang <guoqing.jiang@linux.dev> |
RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user This one is not needed since commit 954afc5a8fd8 ("RDMA/rxe: Use members of generic struct in rxe_mr"). Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20240202124144.16033-1-guoqing.jiang@linux.dev Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
64827180 |
|
09-Jan-2024 |
Li Zhijian <lizhijian@fujitsu.com> |
RDMA/rxe: Improve newline in printing messages Previously rxe_{dbg,info,err}() macros are appended built-in newline, but some users will add redundant newline sometimes. So remove the built-in newline for these macros. In terms of rxe_{dbg,info,err}_xxx() macros, because they don't have built-in newline, append newline when using them. CC: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240109083253.3629967-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
86a3fb55 |
|
30-May-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Let rkey == lkey for local access In order to conform to other drivers stop using rkey == 0 as an indication that there are no remote access flags set. Set rkey == lkey by default for all MRs. Link: https://lore.kernel.org/r/20230530221334.89432-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d11442c6 |
|
30-May-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Rename IB_ACCESS_REMOTE Rename IB_ACCESS_REMOTE to RXE_ACCESS_REMOTE and move to rxe_verbs.h as an enum instead of a #define. Shouldn't use IB_xxx for rxe symbols. Link: https://lore.kernel.org/r/20230530221334.89432-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8d7c7c0e |
|
14-Apr-2023 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA: Add ib_virt_dma_to_page() Make it clearer what is going on by adding a function to go back from the "virtual" dma_addr to a kva and another to a struct page. This is used in the ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib). Call them instead of a naked casting and virt_to_page() when working with dma_addr values encoded by the various ib_map functions. This also fixes the virt_to_page() casting problem Linus Walleij has been chasing. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
5bf944f2 |
|
03-Mar-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Add error messages This patch adds error and debug messages so that every interaction with rdma-core through a verbs API call or a completion error return will generate at least one error message backed up by debug messages with more detail. With dynamic debugging one can follow up after seeing an error message by turning on the appropriate debug messages. Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
72a03627 |
|
13-Feb-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Remove rxe_alloc() Currently all the object types in the rxe driver are allocated in rdma-core except for MRs. By moving tha kzalloc() call outside of the pool code the rxe_alloc() subroutine can be eliminated and code checking for MR as a special case can be removed. This patch moves the kzalloc() and kfree_rcu() calls into the mr registration and destruction verbs. It removes that code from rxe_pool.c including the rxe_alloc() subroutine which is no longer used. Link: https://lore.kernel.org/r/20230213225551.12437-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
5ff31dfc |
|
01-Feb-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
Subject: RDMA/rxe: Handle zero length rdma Currently the rxe driver does not handle all cases of zero length rdma operations correctly. The client does not have to provide an rkey for zero length RDMA read or write operations so the rkey provided may be invalid and should not be used to lookup an mr. This patch corrects the driver to ignore the provided rkey if the reth length is zero for read or write operations and make sure to set the mr to NULL. In read_reply() if length is zero rxe_recheck_mr() is not called. Warnings are added in the routines in rxe_mr.c to catch NULL MRs when the length is non-zero. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230202044240.6304-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
592627cc |
|
19-Jan-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray Replace struct rxe-phys_buf and struct rxe_map by struct xarray in rxe_verbs.h. This allows using rcu locking on reads for the memory maps stored in each mr. This is based off of a sketch of a patch from Jason Gunthorpe in the link below. Some changes were needed to make this work. It applies cleanly to the current for-next and passes the pyverbs, perftest and the same blktests test cases which run today. Link: https://lore.kernel.org/r/20230119235936.19728-7-rpearsonhpe@gmail.com Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/ Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
325a7eb8 |
|
19-Jan-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Cleanup page variables in rxe_mr.c Cleanup usage of mr->page_shift and mr->page_mask and introduce an extractor for mr->ibmr.page_size. Normal usage in the kernel has page_mask masking out offset in page rather than masking out the page number. The rxe driver had reversed that which was confusing. Implicitly there can be a per mr page_size which was not uniformly supported. Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d8bdb0eb |
|
19-Jan-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA-rxe: Isolate mr code from atomic_write_reply() Isolate mr specific code from atomic_write_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. Check length for atomic write operation. Make iova_to_vaddr() static. Link: https://lore.kernel.org/r/20230119235936.19728-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
f04d5b3d |
|
19-Jan-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA-rxe: Isolate mr code from atomic_reply() Isolate mr specific code from atomic_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_op() in rxe_mr.c. Minor cleanups to rxe_check_range() and iova_to_vaddr(). Move enum resp_state to rxe.h Link: https://lore.kernel.org/r/20230119235936.19728-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
db4729a5 |
|
19-Jan-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c Move rxe_map_mr_sg() to rxe_mr.c where it makes a little more sense. Link: https://lore.kernel.org/r/20230119235936.19728-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
ade58da2 |
|
19-Jan-2023 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Cleanup mr_check_range Remove blank lines and replace EFAULT by EINVAL when an invalid mr type is used. Link: https://lore.kernel.org/r/20230119235936.19728-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
ea1bb00e |
|
06-Dec-2022 |
Li Zhijian <lizhijian@fujitsu.com> |
RDMA/rxe: Implement flush execution in responder side Only the requested placement types that also registered in the destination memory region are acceptable. Otherwise, responder will also reply NAK "Remote Access Error" if it found a placement type violation. We will persist data via arch_wb_cache_pmem(), which could be architecture specific. This commit also adds 2 helpers to update qp.resp from the incoming packet. Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
02ea0a51 |
|
06-Dec-2022 |
Li Zhijian <lizhijian@fujitsu.com> |
RDMA/rxe: Allow registering persistent flag for pmem MR only Memory region could support at most 2 flush access flags: IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL But we only allow user to register persistent flush flags to the pmem MR where it has the ability of persisting data across power cycles. So registering a persistent access flag to a non-pmem MR will be rejected. Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com CC: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cb6562c3 |
|
22-Nov-2022 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/rxe: Do not NULL deref on debugging failure path Correct the mistake, mr is obviously NULL in this code path. Fixes: 2778b72b1df0 ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c") Link: https://lore.kernel.org/r/Y3eeJW0AdyJYhYyQ@kili Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
7d984dac |
|
29-Oct-2022 |
Li Zhijian <lizhijian@fujitsu.com> |
RDMA/rxe: Fix mr->map double free rxe_mr_cleanup() which tries to free mr->map again will be called when rxe_mr_init_user() fails: CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x5d panic+0x19e/0x349 end_report.part.0+0x54/0x7c kasan_report.cold+0xa/0xf rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe] __rxe_cleanup+0x10a/0x1e0 [rdma_rxe] rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe] ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs] ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs] This issue was firstly exposed since commit b18c7da63fcb ("RDMA/rxe: Fix memory leak in error path code") and then we fixed it in commit 8ff5f5d9d8cf ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this fix was reverted together at last by commit 1e75550648da (Revert "RDMA/rxe: Create duplicate mapping tables for FMRs") Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is successfully allocated. Fixes: 1e75550648da ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
2778b72b |
|
02-Nov-2022 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c Replace calls to pr_xxx() in rxe_mr.c by rxe_dbg_mr(). Link: https://lore.kernel.org/r/20221103171013.20659-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
b071850e |
|
27-Oct-2022 |
Xiao Yang <yangx.jy@fujitsu.com> |
RDMA/rxe: Remove the duplicate assignment of mr->map_shift mr->map_shift is set to ilog2(RXE_BUF_PER_MAP) in both rxe_mr_init() and rxe_mr_alloc() so remove the duplicate one in rxe_mr_init(). Link: https://lore.kernel.org/r/1666855893-145-1-git-send-email-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
875ab4a8 |
|
26-Sep-2022 |
Li Zhijian <lizhijian@fujitsu.com> |
RDMA/rxe: Make sure requested access is a subset of {mr,mw}->access We should reject the requests with access flags that is not registered by MR/MW. For example, lookup_mr() should return NULL when requested access is 0x03 and mr->access is 0x01. Link: https://lore.kernel.org/r/20220927055337.22630-2-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
71d23639 |
|
21-Oct-2022 |
yangx.jy@fujitsu.com <yangx.jy@fujitsu.com> |
RDMA/rxe: Remove the member 'type' of struct rxe_mr The member 'type' is included in both struct rxe_mr and struct ib_mr so remove the duplicate one of struct rxe_mr. Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Link: https://lore.kernel.org/r/20221021134513.17730-1-yangx.jy@fujitsu.com Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
58651bbb |
|
05-Aug-2022 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Set pd early in mr alloc routines Move setting of pd in mr objects ahead of any possible errors so that it will always be set in rxe_mr_cleanup() to avoid seg faults when rxe_put(mr_pd(mr)) is called. Fixes: cf40367961d8 ("RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()") Link: https://lore.kernel.org/r/20220805183153.32007-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
954afc5a |
|
21-Sep-2022 |
Daisuke Matsuda <matsuda-daisuke@fujitsu.com> |
RDMA/rxe: Use members of generic struct in rxe_mr rxe_mr and ib_mr have interchangeable members. Remove device specific members and use ones in the generic struct. Both 'iova' and 'length' are filled in ib_uverbs or ib_core layer after MR registration. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20220921080844.1616883-2-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
d4ecb56e |
|
28-Aug-2022 |
Daisuke Matsuda <matsuda-daisuke@fujitsu.com> |
RDMA/rxe: Remove an unused member from struct rxe_mr Commit 1e75550648da ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") brought back the member 'va' to struct rxe_mr. However, it is actually used by nobody and thus can be removed. Fixes: 1e75550648da ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"") Link: https://lore.kernel.org/r/20220829012335.1212697-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
1e755506 |
|
25-Jul-2022 |
Li Zhijian <lizhijian@fujitsu.com> |
Revert "RDMA/rxe: Create duplicate mapping tables for FMRs" Below 2 commits will be reverted: commit 8ff5f5d9d8cf ("RDMA/rxe: Prevent double freeing rxe_map_set()") commit 647bf13ce944 ("RDMA/rxe: Create duplicate mapping tables for FMRs") The community has a few bug reports which pointed this commit at last. Some proposals are raised up in the meantime but all of them have no follow-up operation. The previous commit led the map_set of FMR to be not available any more if the MR is registered again after invalidating. Although the mentioned patch try to fix a potential race in building/accessing the same table for fast memory regions, it broke rtrs etc ULPs. Since the latter could be worse, revert this patch. With previous commit, it's observed that a same MR in rnbd server will trigger below code path: -> rxe_mr_init_fast() |-> alloc map_set() # map_set is uninitialized |...-> rxe_map_mr_sg() # build the map_set |-> rxe_mr_set_page() |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means # we can access host memory(such rxe_mr_copy) |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE, # but map_set was not built again |...-> rxe_mr_copy() # kernel crash due to access wild addresses # that lookup from the map_set The backtraces are not always identical. [1st]---------- RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe] Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75 RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000 RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004 RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04 R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038 R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000 FS: 0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe] rxe_responder+0x12ee/0x1b60 [rdma_rxe] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe] ? rxe_rcv+0x1d0/0x780 [rdma_rxe] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe] rxe_do_task+0x67/0xb0 [rdma_rxe] rxe_xmit_packet+0xc7/0x210 [rdma_rxe] rxe_requester+0x680/0xee0 [rdma_rxe] ? update_load_avg+0x5f/0x690 ? update_load_avg+0x5f/0x690 ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client] [2nd]---------- RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe] Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7 RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202 RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008 RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600 RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04 R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038 R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144 FS: 0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rxe_responder+0x12ee/0x1b60 [rdma_rxe] ? rxe_icrc_check+0x7e/0x100 [rdma_rxe] ? rxe_rcv+0x1d0/0x780 [rdma_rxe] ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe] rxe_do_task+0x67/0xb0 [rdma_rxe] rxe_xmit_packet+0xc7/0x210 [rdma_rxe] rxe_requester+0x680/0xee0 [rdma_rxe] ? update_load_avg+0x5f/0x690 ? update_load_avg+0x5f/0x690 ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client] rxe_do_task+0x67/0xb0 [rdma_rxe] tasklet_action_common.constprop.0+0x92/0xc0 __do_softirq+0xe1/0x2d8 run_ksoftirqd+0x21/0x30 smpboot_thread_fn+0x183/0x220 ? sort_range+0x20/0x20 kthread+0xe2/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Link: https://lore.kernel.org/r/1658805386-2-1-git-send-email-lizhijian@fujitsu.com Link: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/ Link: https://www.spinics.net/lists/linux-rdma/msg110836.html Link: https://lore.kernel.org/lkml/94a5ea93-b8bb-3a01-9497-e2021f29598a@linux.dev/t/ Tested-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
174e7b13 |
|
07-Jul-2022 |
Md Haris Iqbal <haris.phnx@gmail.com> |
RDMA/rxe: For invalidate compare according to set keys in mr The 'rkey' input can be an lkey or rkey, and in rxe the lkey or rkey have the same value, including the variant bits. So, if mr->rkey is set, compare the invalidate key with it, otherwise compare with the mr->lkey. Since we already did a lookup on the non-varient bits to get this far, the check's only purpose is to confirm that the wqe has the correct variant bits. Fixes: 001345339f4c ("RDMA/rxe: Separate HW and SW l/rkeys") Link: https://lore.kernel.org/r/20220707073006.328737-1-haris.phnx@gmail.com Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
215d0a75 |
|
12-Jun-2022 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Stop lookup of partially built objects Currently the rdma_rxe driver has a security weakness due to giving objects which are partially initialized indices allowing external actors to gain access to them by sending packets which refer to their index (e.g. qpn, rkey, etc) causing unpredictable results. This patch adds a new API rxe_finalize(obj) which enables looking up pool objects from indices using rxe_pool_get_index() for AH, QP, MR, and MW. They are added in create verbs only after the objects are fully initialized. It also adds wait for completion to destroy/dealloc verbs to assure that all references have been dropped before returning to rdma_core by implementing a new rxe_pool API rxe_cleanup() which drops a reference to the object and then waits for all other references to be dropped. When the last reference is dropped the object is completed by kref. After that it cleans up the object and if locally allocated frees the memory. In the special case of address handle objects the delay is implemented separately if the destroy_ah call is not sleepable. Combined with deferring cleanup code to type specific cleanup routines this allows all pending activity referring to objects to complete before returning to rdma_core. Link: https://lore.kernel.org/r/20220612223434.31462-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cf403679 |
|
20-Apr-2022 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup() Move the code which tears down an mr to rxe_mr_cleanup to allow operations holding a reference to the mr to complete. Link: https://lore.kernel.org/r/20220421014042.26985-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
3197706a |
|
03-Mar-2022 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Use standard names for ref counting Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put(). Significantly improves readability for new readers. Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
3225717f |
|
03-Mar-2022 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Replace red-black trees by xarrays Currently the rxe driver uses red-black trees to add indices to the rxe object pools. Linux xarrays provide a better way to implement the same functionality for indices. This patch replaces red-black trees by xarrays for pool objects. Since xarrays already have a spinlock use that in place of the pool rwlock. Make sure that all changes in the xarray(index) and kref(ref counnt) occur atomically. Link: https://lore.kernel.org/r/20220304000808.225811-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
02827b67 |
|
02-Nov-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Cleanup rxe_pool_entry Currently three different names are used to describe rxe pool elements. They are referred to as entries, elems or pelems. This patch chooses one 'elem' and changes the other ones. Link: https://lore.kernel.org/r/20211103050241.61293-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
8ff5f5d9 |
|
27-Dec-2021 |
Li Zhijian <lizhijian@cn.fujitsu.com> |
RDMA/rxe: Prevent double freeing rxe_map_set() The same rxe_map_set could be freed twice: rxe_reg_user_mr() -> rxe_mr_init_user() -> rxe_mr_free_map_set() # 1st -> rxe_drop_ref() ... -> rxe_mr_cleanup() -> rxe_mr_free_map_set() # 2nd Follow normal convection and put resource cleanup either in the error unwind of the allocator, or the overall free function. Leave the object unchanged with a NULL cur_map_set on failure and remove the unncessary free in rxe_mr_init_user(). Link: https://lore.kernel.org/r/20211228014406.1033444-1-lizhijian@cn.fujitsu.com Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
450f4f6a |
|
14-Sep-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Only allow invalidate for appropriate MRs Local and remote invalidate operations are not allowed by IBA for MRs created by (re)register memory verbs. This patch checks the MR type in rxe_invalidate_mr(). Link: https://lore.kernel.org/r/20210914164206.19768-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
647bf13c |
|
14-Sep-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Create duplicate mapping tables for FMRs For fast memory regions create duplicate mapping tables so ib_map_mr_sg() can build a new mapping table which is then swapped into place synchronously with the execution of an IB_WR_REG_MR work request. Currently the rxe driver uses the same table for receiving RDMA operations and for building new tables in preparation for reusing the MR. This exposes users to potentially incorrect results. Link: https://lore.kernel.org/r/20210914164206.19768-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
00134533 |
|
14-Sep-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Separate HW and SW l/rkeys Separate software and simulated hardware lkeys and rkeys for MRs and MWs. This makes struct ib_mr and struct ib_mw isolated from hardware changes triggered by executing work requests. This change fixes a bug seen in blktest. Link: https://lore.kernel.org/r/20210914164206.19768-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
47b7f706 |
|
14-Sep-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Cleanup MR status and type enums Eliminate RXE_MR_STATE_ZOMBIE which is not compatible with IBA. RXE_MR_STATE_INVALID is better. Replace RXE_MR_TYPE_XXX by IB_MR_TYPE_XXX which covers all the needed types. Link: https://lore.kernel.org/r/20210914164206.19768-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
79fbd3e1 |
|
24-Aug-2021 |
Maor Gottlieb <maorg@nvidia.com> |
RDMA: Use the sg_table directly and remove the opencoded version from umem This allows using the normal sg_table APIs and makes all the code cleaner. Remove sgt, nents and nmapd from ib_umem. Link: https://lore.kernel.org/r/20210824142531.3877007-4-maorg@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1117f26e |
|
06-Jul-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Move ICRC generation to a subroutine Isolate ICRC generation into a single subroutine named rxe_generate_icrc() in rxe_icrc.c. Remove scattered crc generation code from elsewhere. Link: https://lore.kernel.org/r/20210707040040.15434-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cdbdb772 |
|
02-Jul-2021 |
Xiao Yang <yangx.jy@fujitsu.com> |
RDMA/rxe: Remove the repeated 'mr->umem = umem' Drop duplicated code Link: https://lore.kernel.org/r/20210702123024.37025-1-ice_yangxiao@163.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
b18c7da6 |
|
05-Jul-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Fix memory leak in error path code In rxe_mr_init_user() at the third error the driver fails to free the memory at mr->map. This patch adds code to do that. This error only occurs if page_address() fails to return a non zero address which should never happen for 64 bit architectures. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20210705164153.17652-1-rpearsonhpe@gmail.com Reported by: Haakon Bugge <haakon.bugge@oracle.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
20ec0a6d |
|
21-Jun-2021 |
Xiao Yang <yangx.jy@fujitsu.com> |
RDMA/rxe: Don't overwrite errno from ib_umem_get() rxe_mr_init_user() always returns the fixed -EINVAL when ib_umem_get() fails so it's hard for user to know which actual error happens in ib_umem_get(). For example, ib_umem_get() will return -EOPNOTSUPP when trying to pin pages on a DAX file. Return actual error as mlx4/mlx5 does. Link: https://lore.kernel.org/r/20210621071456.4259-1-ice_yangxiao@163.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
570d2b99 |
|
07-Jun-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Disallow MR dereg and invalidate when bound Check that an MR has no bound MWs before allowing a dereg or invalidate operation. Link: https://lore.kernel.org/r/20210608042552.33275-11-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
3902b429 |
|
07-Jun-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Implement invalidate MW operations Implement invalidate MW and cleaned up invalidate MR operations. Added code to perform remote invalidate for send with invalidate. Added code to perform local invalidation. Deleted some blank lines in rxe_loc.h. Link: https://lore.kernel.org/r/20210608042552.33275-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
beec0239 |
|
07-Jun-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs Add ib_alloc_mw and ib_dealloc_mw verbs APIs. Added new file rxe_mw.c focused on MWs. Changed the 8 bit random key generator. Added a cleanup routine for MWs. Added verbs routines to ib_device_ops. Link: https://lore.kernel.org/r/20210608042552.33275-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
cd5b010f |
|
12-May-2021 |
Lang Cheng <chenglang@huawei.com> |
RDMA/rxe: Remove unused parameter udata The old version of ib_umem_get() need these udata as a parameter but now they are unnecessary. Fixes: c320e527e154 ("IB: Allow calls to ib_umem_get from kernel ULPs") Link: https://lore.kernel.org/r/1620807142-39157-5-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
364e282c |
|
25-Mar-2021 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Split MEM into MR and MW In the original rxe implementation it was intended to use a common object to represent MRs and MWs but they are different enough to separate these into two objects. This allows replacing the mem name with mr for MRs which is more consistent with the style for the other objects and less likely to be confusing. This is a long patch that mostly changes mem to mr where it makes sense and adds a new rxe_mw struct. Link: https://lore.kernel.org/r/20210325212425.2792-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
eeed6965 |
|
09-Oct-2020 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Remove unused RXE_MR_TYPE_FMR This is a left over from the past. It is no longer used. Link: https://lore.kernel.org/r/20201009165112.271143-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1858d98b |
|
08-Oct-2020 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Remove duplicate entries in struct rxe_mr Struct rxe_mem had pd, lkey and rkey values both in itself and in the struct ib_mr which is also included in rxe_mem. Delete these entries and replace references with the ones in ibmr.Add mr_pd, mr_lkey and mr_rkey macros which extract these values from mr. Link: https://lore.kernel.org/r/20201008212818.265303-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
63fa15db |
|
27-Aug-2020 |
Bob Pearson <rpearsonhpe@gmail.com> |
RDMA/rxe: Add SPDX hdrs to rxe source files Add SPDX headers to all rxe .c and .h files. Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
e3ddd606 |
|
19-Aug-2020 |
Dinghao Liu <dinghao.liu@zju.edu.cn> |
RDMA/rxe: Fix memleak in rxe_mem_init_user When page_address() fails, umem should be freed just like when rxe_mem_alloc() fails. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200819075632.22285-1-dinghao.liu@zju.edu.cn Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
293d8440 |
|
05-Jul-2020 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/rxe: Return void from rxe_mem_init_dma() The return value from rxe_mem_init_dma() is always 0 - change it to be void and fix the callers accordingly. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200705104313.283034-4-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
f6b4c11f |
|
22-Jun-2020 |
Kamal Heib <kamalheib1@gmail.com> |
RDMA/rxe: Remove unused rxe_mem_map_pages This function is not in use - delete it. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200622100731.27359-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c320e527 |
|
15-Jan-2020 |
Moni Shoua <monis@mellanox.com> |
IB: Allow calls to ib_umem_get from kernel ULPs So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
72b894b0 |
|
13-Nov-2019 |
Christoph Hellwig <hch@lst.de> |
IB/umem: remove the dmasync argument to ib_umem_get The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
836a0fbb |
|
16-Jun-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Check umem pointer validity prior to release Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
93923d30 |
|
28-Mar-2019 |
Shiraz Saleem <shiraz.saleem@intel.com> |
RDMA/rxe: Use correct sizing on buffers holding page DMA addresses The buffer that holds the page DMA addresses is sized off umem->nmap. This can potentially cause out of bound accesses on the PBL array when iterating the umem DMA-mapped SGL. This is because if umem pages are combined, umem->nmap can be much lower than the number of system pages in umem. Use ib_umem_num_pages() to size this buffer. Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
a4b7013d |
|
12-Mar-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA/rxe: Fix slab-out-bounds access which lead to kernel crash later BUG: KASAN: slab-out-of-bounds in rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] Read of size 8 at addr ffff88805c01a608 by task ib_send_bw/573 CPU: 24 PID: 573 Comm: ib_send_bw Not tainted 5.0.0-rc5+ #189 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs] ib_uverbs_ioctl+0x202/0x310 [ib_uverbs] do_vfs_ioctl+0x193/0x1440 ksys_ioctl+0x3a/0x70 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x13f/0x570 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 573: __kasan_kmalloc.constprop.5+0xc1/0xd0 __kmalloc+0x161/0x310 rxe_mem_alloc+0x52/0x470 [rdma_rxe] rxe_mem_init_user+0x113/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs] ib_uverbs_ioctl+0x202/0x310 [ib_uverbs] do_vfs_ioctl+0x193/0x1440 ksys_ioctl+0x3a/0x70 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x13f/0x570 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 0: __kasan_slab_free+0x12e/0x180 kfree+0x10a/0x2c0 rcu_process_callbacks+0xa77/0x1260 __do_softirq+0x2ad/0xacb Test scenario: ib_send_bw -x 1 -d rxe0 -a & ib_send_bw -x 1 -d rxe0 -a localhost Fixes: 8700e3e7c485 ("Soft RoCE driver") Reported-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
8317d6cd |
|
11-Feb-2019 |
Shiraz, Saleem <shiraz.saleem@intel.com> |
RDMA/rxe: Use for_each_sg_page iterator on umem SGL The driver walks the umem SGL assuming a 1:1 mapping between SGE and system page. Update to use the for_each_sg_page iterator to get individual pages contained in the SGEs. This is a pre-requisite before adding page combining into SGEs while building the scatter table in IB core. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b0ea0fa5 |
|
09-Jan-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
|
#
1703129e |
|
26-Aug-2018 |
Parav Pandit <parav@mellanox.com> |
IB/rxe: Refactor lookup memory function Consolidate all error checks under single if() condition and use helper unlikely() macro for them, in addition drop unneeded goto labels. rxe_pool_get_index() already provides RB tree based efficient lookup. Avoid doing extra checks for error cases which are rare and already covered by rxe_pool_get_index(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e12ee8ce |
|
23-Apr-2018 |
Zhu Yanjun <yanjun.zhu@oracle.com> |
IB/rxe: remove unused function variable In the functions rxe_mem_init_dma, rxe_mem_init_user, rxe_mem_init_fast and copy_data, the function variable rxe is not used. So this function variable rxe is removed. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
13eb1e21 |
|
28-Aug-2017 |
Andrew Boyer <andrew.boyer@dell.com> |
IB/rxe: Avoid ICRC errors by copying into the skb first The current process is to first calculate the CRC and then copy the client data into the packet. This leaves a window in which the packet contents and CRC can get out of sync, if the client changes the data after the CRC is calculated but before the data is copied. By copying the data into the packet and then calculating the CRC directly from the packet contents we eliminate the window. This can be seen with qperf's ud_bi_bw test. This seems like very strange/reckless client behavior, but whether the client has mangled its data or not RXE should be able to transfer it reliably. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
af5df5fb |
|
04-May-2017 |
Leon Romanovsky <leon@kernel.org> |
IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type Callers of rxe_mem_copy() provide pointer to store updated CRC value. That pointer was supposed to be updated, but the commit cee2688e3cd6 ("IB/rxe: Offload CRC calculation when possible") mistakenly removed that assignment for RXE_MEM_TYPE_DMA memory type. The code worked because there are no actual callers with RXE_MEM_TYPE_DMA, who are interested in returned value of crcp. The one caller in read_reply(), who uses the returned crcp didn't set RXE_MEM_TYPE_DMA as mem->type. Fixes: cee2688e3cd6 ("IB/rxe: Offload CRC calculation when possible") Reported-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3e7e1193 |
|
05-Apr-2017 |
Artemy Kovalyov <artemyko@mellanox.com> |
IB: Replace ib_umem page_size by page_shift Size of pages are held by struct ib_umem in page_size field. It is better to store it as an exponent, because page size by nature is always power-of-two and used as a factor, divisor or ilog2's argument. The conversion of page_size to be page_shift allows to have portable code and avoid following error while compiling on ARM: ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined! CC: Selvin Xavier <selvin.xavier@broadcom.com> CC: Steve Wise <swise@chelsio.com> CC: Lijun Ou <oulijun@huawei.com> CC: Shiraz Saleem <shiraz.saleem@intel.com> CC: Adit Ranadive <aditr@vmware.com> CC: Dennis Dalessandro <dennis.dalessandro@intel.com> CC: Ram Amrani <Ram.Amrani@Cavium.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
cee2688e |
|
20-Apr-2017 |
yonatanc <yonatanc@mellanox.com> |
IB/rxe: Offload CRC calculation when possible Use CPU ability to perform CRC calculations, by replacing direct calls to crc32_le() with crypto_shash_updata(). The overall performance gain measured with ib_send_bw tool is 10% and it was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU. ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100 --------------------------------------------------------------------------------------------- | | bytes | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] | --------------------------------------------------------------------------------------------- | crc32_le | 1048576 | 9000 | inf | 497.60 | 0.000498 | | CRC offload | 1048576 | 9000 | inf | 546.70 | 0.000547 | --------------------------------------------------------------------------------------------- Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
647bf3d8 |
|
07-Feb-2017 |
Eyal Itkin <eyal.itkin@gmail.com> |
IB/rxe: Fix mem_check_range integer overflow Update the range check to avoid integer-overflow in edge case. Resolves CVE 2016-8636. Signed-off-by: Eyal Itkin <eyal.itkin@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
43553b47 |
|
10-Jan-2017 |
Bart Van Assche <bvanassche@acm.org> |
IB/rxe: Issue warnings once It is strongly recommended to report kernel warnings once instead of every time a condition is hit. Hence change WARN_ON() into WARN_ON_ONCE() / BUILD_BUG_ON() as appropriate. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
32404fb7 |
|
10-Jan-2017 |
Bart Van Assche <bvanassche@acm.org> |
IB/rxe: Let the compiler check the type of the cleanup functions Change the argument type of these functions from void * into struct rxe_pool_entry *. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d4fb5925 |
|
22-Nov-2016 |
Andrew Boyer <andrew.boyer@dell.com> |
IB/rxe: Add support for zero-byte operations The last_psn algorithm fails in the zero-byte case: it calculates first_psn = N, last_psn = N-1. This makes the operation unretryable since the res structure will fail the (first_psn <= psn <= last_psn) test in find_resource(). While here, use BTH_PSN_MASK to mask the calculated last_psn. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e404f945 |
|
28-Sep-2016 |
Parav Pandit <pandit.parav@gmail.com> |
IB/rxe: improved debug prints & code cleanup 1. Debugging qp state transitions and qp errors in loopback and multiple QP tests is difficult without qp numbers in debug logs. This patch adds qp number to important debug logs. 2. Instead of having rxe: prefix in few logs and not having in few logs, using uniform module name prefix using pr_fmt macro. 3. Code cleanup for various warnings reported by checkpatch for incomplete unsigned data type, line over 80 characters, return statements. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8700e3e7 |
|
16-Jun-2016 |
Moni Shoua <monis@mellanox.com> |
Soft RoCE driver Soft RoCE (RXE) - The software RoCE driver ib_rxe implements the RDMA transport and registers to the RDMA core device as a kernel verbs provider. It also implements the packet IO layer. On the other hand ib_rxe registers to the Linux netdev stack as a udp encapsulating protocol, in that case RDMA, for sending and receiving packets over any Ethernet device. This yields a RDMA transport over the UDP/Ethernet network layer forming a RoCEv2 compatible device. The configuration procedure of the Soft RoCE drivers requires binding to any existing Ethernet network device. This is done with /sys interface. A userspace Soft RoCE library (librxe) provides user applications the ability to run with Soft RoCE devices. The use of rxe verbs ins user space requires the inclusion of librxe as a device specifics plug-in to libibverbs. librxe is packaged separately. Architecture: +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ +---------------------------------------------------------------+ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Soft RoCE resources: [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in Github [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE Wiki page [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|