Cross Reference: /linux-master/drivers/infiniband/hw/mlx5/cq.c

History log of /linux-master/drivers/infiniband/hw/mlx5/cq.c
Revision	Date	Author	Comments
# f14c1a14	12-Jun-2023	Maher Sanalla <msanalla@nvidia.com>	net/mlx5: Allocate completion EQs dynamically This commit enables the dynamic allocation of EQs at runtime, allowing for more flexibility in managing completion EQs and reducing the memory overhead of driver load. Whenever a CQ is created for a given vector index, the driver will lookup to see if there is an already mapped completion EQ for that vector, if so, utilize it. Otherwise, allocate a new EQ on demand and then utilize it for the CQ completion events. Add a protection lock to the EQ table to protect from concurrent EQ creation attempts. While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with mlx5_comp_eqn_get() and mlx5_comp_irqn_get() which will allocate an EQ on demand if no EQ is found for the given vector. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# abef378c	01-Nov-2022	Arumugam Kolappan <aru.kolappan@oracle.com>	RDMA/mlx5: Change debug log level for remote access error syndromes The mlx5 driver dumps the entire CQE buffer by default for few syndromes. Some syndromes are expected due to the application behavior [ex: MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR, MLX5_CQE_SYNDROME_REMOTE_OP_ERR and MLX5_CQE_SYNDROME_LOCAL_PROT_ERR]. Hence, for these syndromes, the patch converts the log level from KERN_WARNING to KERN_DEBUG. This enables the application to get the CQE buffer dump by changing to KERN_DEBUG level as and when needed. Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Arumugam Kolappan <aru.kolappan@oracle.com> Link: https://lore.kernel.org/r/1667287664-19377-1-git-send-email-aru.kolappan@oracle.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
# 158e71bb	14-May-2022	Aharon Landau <aharonl@nvidia.com>	RDMA/mlx5: Add a umr recovery flow When a UMR fails, the UMR QP state changes to an error state. Therefore, all the further UMR operations will fail too. Add a recovery flow to the UMR QP, and repost the flushed WQEs. Link: https://lore.kernel.org/r/6cc24816cca049bd8541317f5e41d3ac659445d3.1652588303.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
# a7ad9dde	27-Dec-2021	Dust Li <dust.li@linux.alibaba.com>	RDMA/mlx5: Print wc status on CQE error and dump needed mlx5_handle_error_cqe() only dump the content of the CQE which is raw hex data, and not straighforward for debug. Print WC status message when we got CQE error and dump is need. Here is an example of how the dmesg log looks like with this: infiniband mlx5_0: mlx5_handle_error_cqe:333:(pid 0): WC error: 10, message: remote access error infiniband mlx5_0: dump_cqe:272:(pid 0): dump error cqe 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 00 00 88 13 08 03 61 b3 1e a1 42 d3 Link: https://lore.kernel.org/r/20211227123806.47530-1-dust.li@linux.alibaba.com Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 616d5769	18-Jul-2021	Tal Gilboa <talgi@nvidia.com>	IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq is_apu_thread_cq() used to detect CQs which are attached to APU threads. This was extended to support other elements as well, so the function was renamed to is_apu_cq(). c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't reflected when the APU support was first introduced. Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa Signed-off-by: Tal Gilboa <talgi@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
# 563476ae	11-Apr-2021	Shay Drory <shayd@nvidia.com>	net/mlx5: Synchronize correct IRQ when destroying CQ The CQ destroy is performed based on the IRQ number that is stored in cq->irqn. That number wasn't set explicitly during CQ creation and as expected some of the API users of mlx5_core_create_cq() forgot to update it. This caused to wrong synchronization call of the wrong IRQ with a number 0 instead of the real one. As a fix, set the IRQ number directly in the mlx5_core_create_cq() and update all users accordingly. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Fixes: ef1659ade359 ("IB/mlx5: Add DEVX support for CQ events") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 33652951	16-Jun-2021	Aharon Landau <aharonl@nvidia.com>	RDMA/mlx5: Support real-time timestamp directly from the device Currently, if the user asks for a real-time timestamp, the device will return a free-running one, and the timestamp will be translated to real-time in the user-space. When the device supports only real-time timestamp and not free-running, the creation of the QP will fail even though the user needs supported the real-time one. To prevent this, we will return the real-time timestamp directly from the device. Link: https://lore.kernel.org/r/c6cfc8e6f038575c5c2de6505830f7e74e4de80d.1623829775.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 9ecf6ac1	19-May-2021	Maor Gottlieb <maorg@nvidia.com>	RDMA/mlx5: Take qp type from mlx5_ib_qp Change all the places in the mlx5_ib driver to take the qp type from the mlx5_ib_qp struct, except the QP initialization flow. It will ensure that we check the right QP type also for vendor specific QPs. Link: https://lore.kernel.org/r/b2e16cd65b59cd24fa81c01c7989248da44e58ea.1621413899.git.leonro@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 0bedd3d0	12-May-2021	Lang Cheng <chenglang@huawei.com>	RDMA/mlx5: Remove unused parameter udata The old version of ib_umem_get() need these udata as a parameter but now they are unnecessary. Fixes: c320e527e154 ("IB: Allow calls to ib_umem_get from kernel ULPs") Link: https://lore.kernel.org/r/1620807142-39157-4-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 2ba0aa2f	10-Jun-2021	Alaa Hleihel <alaa@nvidia.com>	IB/mlx5: Fix initializing CQ fragments buffer The function init_cq_frag_buf() can be called to initialize the current CQ fragments buffer cq->buf, or the temporary cq->resize_buf that is filled during CQ resize operation. However, the offending commit started to use function get_cqe() for getting the CQEs, the issue with this change is that get_cqe() always returns CQEs from cq->buf, which leads us to initialize the wrong buffer, and in case of enlarging the CQ we try to access elements beyond the size of the current cq->buf and eventually hit a kernel panic. [exception RIP: init_cq_frag_buf+103] [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib] [ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core] [ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt] [ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt] [ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt] [ffff9f799ddcbec8] kthread at ffffffffa66c5da1 [ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd Fix it by getting the needed CQE by calling mlx5_frag_buf_get_wqe() that takes the correct source buffer as a parameter. Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)") Link: https://lore.kernel.org/r/90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# c08fbdc5	15-Nov-2020	Jason Gunthorpe <jgg@ziepe.ca>	RDMA/mlx5: mlx5_umem_find_best_quantized_pgoff() for CQ This fixes a bug where the page_offset was not being considered when building a CQ. The HW specification says it 'must be zero', so use a variant of mlx5_umem_find_best_quantized_pgoff() with a 0 pgoff_bitmask to force this result. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20201115114311.136250-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# aab8d396	26-Oct-2020	Jason Gunthorpe <jgg@ziepe.ca>	RDMA/mlx5: Change mlx5_ib_populate_pas() to use rdma_for_each_block() This routine converts the umem SGL into a list of fixed pages for DMA, which is exactly what rdma_umem_for_each_dma_block() is for, use the common code directly. Link: https://lore.kernel.org/r/20201026132314.1336717-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# f8fb3110	26-Oct-2020	Jason Gunthorpe <jgg@ziepe.ca>	RDMA/mlx5: Remove npages from mlx5_ib_cont_pages() Most callers don't need this, and the few that do can get it as ib_umem_num_pages(umem). Link: https://lore.kernel.org/r/20201026131936.1335664-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 7db0eea9	26-Oct-2020	Jason Gunthorpe <jgg@ziepe.ca>	RDMA/mlx5: Remove ncont from mlx5_ib_cont_pages() This is the same as ib_umem_num_dma_blocks(umem, 1UL << page_shift), have the callers compute it directly. Link: https://lore.kernel.org/r/20201026131936.1335664-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 95741ee3	26-Oct-2020	Jason Gunthorpe <jgg@ziepe.ca>	RDMA/mlx5: Remove order from mlx5_ib_cont_pages() Only alloc_mr_from_cache() needs order and can trivially compute it, so lift it to the one call site and remove the NULL arguments. Link: https://lore.kernel.org/r/20201026131936.1335664-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 1c15b4f2	23-Sep-2020	Avihai Horon <avihaih@nvidia.com>	RDMA/core: Modify enum ib_gid_type and enum rdma_network_type Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so enum ib_gid_type will match the gid types of the new query GID table API which will be introduced in the following patches. This change in enum ib_gid_type requires to change also enum rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1 values. Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 43d781b9	07-Sep-2020	Leon Romanovsky <leon@kernel.org>	RDMA: Allow fail of destroy CQ Like any other verbs objects, CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract with mixed IB verbs objects with DEVX. Such mix causes to the situation where FW and kernel are fully interdependent on the reference counting of each side. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# 4b916ed9	30-Aug-2020	Leon Romanovsky <leon@kernel.org>	RDMA/mlx5: Fix potential race between destroy and CQE poll The SRQ can be destroyed right before mlx5_cmd_get_srq is called. In such case the latter will return NULL instead of expected SRQ. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
# df561f66	23-Aug-2020	Gustavo A. R. Silva <gustavoars@kernel.org>	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
# 3f649ab7	03-Jun-2020	Kees Cook <keescook@chromium.org>	treewide: Remove uninitialized_var() usage Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' \| cut -d: -f1 \| sort -u \| \ xargs perl -pi -e \ 's/\buninitialized_var$([^$]+)\)/\1/g; s:\s/\ (GCC be quiet\|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
# 244faedf	24-Apr-2020	Raed Salem <raeds@mellanox.com>	net/mlx5: Refactor imm_inval_pkey field in cqe struct The imm_inval_pkey field can hold four different types of data, depends on the usage, the data could be one of the below: - Immediate field of the received message - Invalidate rkey - Pkey of the packet - Flow table metadata Current implementation doesn't reflect the intended usage of the field at usage time. Reflect the different types by replace this field with a union, modify code where this field is used to reflect its intended usage. Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 333fbaa0	04-Apr-2020	Leon Romanovsky <leon@kernel.org>	net/mlx5: Move QP logic to mlx5_ib The mlx5_core doesn't need any functionality coded in qp.c, so move that file to drivers/infiniband/ be under mlx5_ib responsibility. Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
# 0a2fd01c	24-Mar-2020	Yishai Hadas <yishaih@mellanox.com>	IB/mlx5: Move to fully dynamic UAR mode once user space supports it Move to fully dynamic UAR mode once user space supports it. In this case we prevent any legacy mode of UARs on the allocated context and prevent redundant allocation of the static ones. Link: https://lore.kernel.org/r/20200324060143.1569116-6-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 64d99f6a	24-Mar-2020	Yishai Hadas <yishaih@mellanox.com>	IB/mlx5: Extend CQ creation to get uar page index from user space Extend CQ creation to get uar page index from user space, this mode can be used with the UAR dynamic mode APIs to allocate/destroy a UAR object. Link: https://lore.kernel.org/r/20200324060143.1569116-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 950bf4f1	18-Mar-2020	Leon Romanovsky <leon@kernel.org>	RDMA/mlx5: Fix access to wrong pointer while performing flush due to error The main difference between send and receive SW completions is related to separate treatment of WQ queue. For receive completions, the initial index to be flushed is stored in "tail", while for send completions, it is in deleted "last_poll". CPU: 54 PID: 53405 Comm: kworker/u161:0 Kdump: loaded Tainted: G OE --------- -t - 4.18.0-147.el8.ppc64le #1 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] NIP: c000003c7c00a000 LR: c00800000e586af4 CTR: c000003c7c00a000 REGS: c0000036cc9db940 TRAP: 0400 Tainted: G OE --------- -t - (4.18.0-147.el8.ppc64le) MSR: 9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24004488 XER: 20040000 CFAR: c00800000e586af0 IRQMASK: 0 GPR00: c00800000e586ab4 c0000036cc9dbbc0 c00800000e5f1a00 c0000037d8433800 GPR04: c000003895a26800 c0000037293f2000 0000000000000201 0000000000000011 GPR08: c000003895a26c80 c000003c7c00a000 0000000000000000 c00800000ed30438 GPR12: c000003c7c00a000 c000003fff684b80 c00000000017c388 c00000396ec4be40 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c00000000151e498 0000000000000010 c000003895a26848 0000000000000010 GPR24: 0000000000000010 0000000000010000 c000003895a26800 0000000000000000 GPR28: 0000000000000010 c0000037d8433800 c000003895a26c80 c000003895a26800 NIP [c000003c7c00a000] 0xc000003c7c00a000 LR [c00800000e586af4] __ib_process_cq+0xec/0x1b0 [ib_core] Call Trace: [c0000036cc9dbbc0] [c00800000e586ab4] __ib_process_cq+0xac/0x1b0 [ib_core] (unreliable) [c0000036cc9dbc40] [c00800000e586c88] ib_cq_poll_work+0x40/0xb0 [ib_core] [c0000036cc9dbc70] [c000000000171f44] process_one_work+0x2f4/0x5c0 [c0000036cc9dbd10] [c000000000172a0c] worker_thread+0xcc/0x760 [c0000036cc9dbdc0] [c00000000017c52c] kthread+0x1ac/0x1c0 [c0000036cc9dbe30] [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80 Fixes: 8e3b68830186 ("RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion") Link: https://lore.kernel.org/r/20200318091640.44069-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# c320e527	15-Jan-2020	Moni Shoua <monis@mellanox.com>	IB: Allow calls to ib_umem_get from kernel ULPs So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
# 72b894b0	13-Nov-2019	Christoph Hellwig <hch@lst.de>	IB/umem: remove the dmasync argument to ib_umem_get The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 50211ec9	09-Oct-2019	Jason Gunthorpe <jgg@ziepe.ca>	RDMA/mlx5: Split sig_err MR data into its own xarray The locking model for signature is completely different than ODP, do not share the same xarray that relies on SRCU locking to support ODP. Simply store the active mlx5_core_sig_ctx's in an xarray when signature MRs are created and rely on trivial xarray locking to serialize everything. The overhead of storing only a handful of SIG related MRs is going to be much less than an xarray full of every mkey. Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 4e0e2ea1	30-Jun-2019	Yishai Hadas <yishaih@mellanox.com>	net/mlx5: Report EQE data upon CQ completion Report EQE data upon CQ completion to let upper layers use this data. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
# 38164b77	30-Jun-2019	Yishai Hadas <yishaih@mellanox.com>	net/mlx5: mlx5_core_create_cq() enhancements Enhance mlx5_core_create_cq() to get the command out buffer from the callers to let them use the output. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
# 792c4e9d	20-Jun-2019	Matthew Wilcox <willy@infradead.org>	net/mlx5: Convert mkey_table to XArray The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 836a0fbb	16-Jun-2019	Leon Romanovsky <leon@kernel.org>	RDMA: Check umem pointer validity prior to release Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# e39afe3d	28-May-2019	Leon Romanovsky <leon@kernel.org>	RDMA: Convert CQ allocations to be under core responsibility Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# a52c8e24	28-May-2019	Leon Romanovsky <leon@kernel.org>	RDMA: Clean destroy CQ in drivers do not return errors Like all other destroy commands, .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused to memory leaks. This patch converts .destroy_cq() to do not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# ff23dfa1	31-Mar-2019	Shamir Rabinovitch <shamir.rabinovitch@oracle.com>	IB: Pass only ib_udata in function prototypes Now when ib_udata is passed to all the driver's object create/destroy APIs the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the functions prototypes. Make ib_udata the only argument psssed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# bdeacabd	31-Mar-2019	Shamir Rabinovitch <shamir.rabinovitch@oracle.com>	IB: Remove 'uobject->context' dependency in object destroy APIs Now that we have the udata passed to all the ib_xxx object destroy APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# c4367a26	31-Mar-2019	Shamir Rabinovitch <shamir.rabinovitch@oracle.com>	IB: Pass uverbs_attr_bundle down ib_x destroy path The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 10f56242	21-Jan-2019	Moni Shoua <monis@mellanox.com>	IB/mlx5: Fix the locking of SRQ objects in ODP events QP and SRQ objects are stored in different containers so the action to get and lock a common resource during ODP event needs to address that. While here get rid of 'refcount' and 'free' fields in mlx5_core_srq struct and use the fields with same semantics in common structure. Fixes: 032080ab43ac ("IB/mlx5: Lock QP during page fault handling") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# b0ea0fa5	09-Jan-2019	Jason Gunthorpe <jgg@ziepe.ca>	IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
# 8e3b6883	12-Dec-2018	Leon Romanovsky <leon@kernel.org>	RDMA/mlx5: Delete unreachable handle_atomic code by simplifying SW completion Handle atomic was left as unimplemented from 2013, remove the code till this part will be developed. Remove the dead code by simplifying SW completion logic which is supposed to be the same for send and receive paths. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # compile tested Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# bdefffd1	04-Dec-2018	Tariq Toukan <tariqt@mellanox.com>	IB/mlx5: Use helper to get CQE opcode Use the new helper that extracts the opcode from a CQE (completion queue entry) structure. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com>
# b4990804	28-Nov-2018	Leon Romanovsky <leon@kernel.org>	RDMA/mlx5: Update SRQ functions signatures to mlx5_ib format Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by updating function signatures do not require mlx5_core_dev as an input, because all operations in mlx5_ib are supposed to use mlx5_ib_dev. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
# f02d0d6e	28-Nov-2018	Leon Romanovsky <leon@kernel.org>	net/mlx5: Move SRQ functions to RDMA part There is no need to keep SRQ which is RDMA object in mlx5_core. In this patch, we partially move the execution code, while next patches will move table initialization/release logic too. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
# 4972e6fa	12-Sep-2018	Tariq Toukan <tariqt@mellanox.com>	net/mlx5: Refactor fragmented buffer struct fields and init flow Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not needed to manage and control the datapath of the fragmented buffers API. struct mlx5_frag_buf contains control info to manage the allocation and de-allocation of the fragmented buffer. Its fields are not relevant for datapath, so here I take them out of the struct mlx5_frag_buf_ctrl, except for the fragments array itself. In addition, modified mlx5_fill_fbc to initialise the frags pointers as well. This implies that the buffer must be allocated before the function is called. A set of type-specific *_get_byte_size() functions are replaced by a generic one. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 5d6ff1ba	08-Oct-2018	Yonatan Cohen <yonatanc@mellanox.com>	IB/mlx5: Support scatter to CQE for DC transport type Scatter to CQE is a HW offload that saves PCI writes by scattering the payload to the CQE. This patch extends already existing functionality to support DC transport type. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# cf50a786	20-Sep-2018	Yishai Hadas <yishaih@mellanox.com>	IB/mlx5: Set uid as part of CQ creation Set uid as part of CQ creation so that the firmware can manage the CQ object in a secured way. The uid for the destroy and the modify commands is set by mlx5_core. This will enable using a CQ that was created by verbs application to be used by the DEVX flow in case the uid is equal. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 26e551c5	31-Jul-2018	Kamal Heib <kamalheib1@gmail.com>	RDMA: Fix return code check in rdma_set_cq_moderation The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is not supported, all drivers should generate this and all users should check for it when detecting not supported functionality. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5) Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 6f1006a4	27-May-2018	Yonatan Cohen <yonatanc@mellanox.com>	IB/mlx5: Introduce a new mini-CQE format The new mini-CQE format includes the stride index, byte count and packet checksum. Stride index is needed for striding WQ feature. This patch exposes this capability and enables its setting via mlx5 UHW data as part of query device and cq creation. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 7b74a83c	21-May-2018	Erez Shitrit <erezsh@mellanox.com>	IB/mlx5: Fetch soft WQE's on fatal error state On fatal error the driver simulates CQE's for ULPs that rely on completion of all their posted work-request. For the GSI traffic, the mlx5 has its own mechanism that sends the completions via software CQE's directly to the relevant CQ. This should be kept in fatal error too, so the driver should simulate such CQE's with the specified error state in order to complete GSI QP work requests. Without the fix the next deadlock might appears: schedule_timeout+0x274/0x350 wait_for_common+0xec/0x240 mcast_remove_one+0xd0/0x120 [ib_core] ib_unregister_device+0x12c/0x230 [ib_core] mlx5_ib_remove+0xc4/0x270 [mlx5_ib] mlx5_detach_device+0x184/0x1a0 [mlx5_core] mlx5_unload_one+0x308/0x340 [mlx5_core] mlx5_pci_err_detected+0x74/0xe0 [mlx5_core] Cc: <stable@vger.kernel.org> # 4.7 Fixes: 89ea94a7b6c4 ("IB/mlx5: Reset flow support for IB kernel ULPs") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 909d4344	16-May-2018	Christophe JAILLET <christophe.jaillet@wanadoo.fr>	IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()' When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to free it. Fixes: 1cbe6fc86ccfe ("IB/mlx5: Add support for CQE compressing") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 1acae6b0	30-Dec-2017	Eran Ben Elisha <eranbe@mellanox.com>	mlx5: Move dump error CQE function out of mlx5_ib for code sharing Move mlx5_ib dump error CQE implementation to mlx5 CQ header file in order to use it in a downstream patch from mlx5e. In addition, use print_hex_dump instead of manual dumping of the buffer. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 28e9091e	07-Mar-2018	Leon Romanovsky <leon@kernel.org>	RDMA/mlx5: Fix integer overflow while resizing CQ The user can provide very large cqe_size which will cause to integer overflow as it can be seen in the following UBSAN warning: ======================================================================= UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53 signed integer overflow: 64870 * 65536 cannot be represented in type 'int' CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ubsan_epilogue+0xe/0x81 handle_overflow+0x1f3/0x251 ? __ubsan_handle_negate_overflow+0x19b/0x19b ? lock_acquire+0x440/0x440 mlx5_ib_resize_cq+0x17e7/0x1e40 ? cyc2ns_read_end+0x10/0x10 ? native_read_msr_safe+0x6c/0x9b ? cyc2ns_read_end+0x10/0x10 ? mlx5_ib_modify_cq+0x220/0x220 ? sched_clock_cpu+0x18/0x200 ? lookup_get_idr_uobject+0x200/0x200 ? rdma_lookup_get_uobject+0x145/0x2f0 ib_uverbs_resize_cq+0x207/0x3e0 ? ib_uverbs_ex_create_cq+0x250/0x250 ib_uverbs_write+0x7f9/0xef0 ? cyc2ns_read_end+0x10/0x10 ? print_irqtrace_events+0x280/0x280 ? ib_uverbs_ex_create_cq+0x250/0x250 ? uverbs_devnode+0x110/0x110 ? sched_clock_cpu+0x18/0x200 ? do_raw_spin_trylock+0x100/0x100 ? __lru_cache_add+0x16e/0x290 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? sched_clock_cpu+0x18/0x200 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x433549 RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217 ======================================================================= Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 3.13 Fixes: bde51583f49b ("IB/mlx5: Add support for resize CQ") Reported-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 212a0cbc	09-Mar-2018	Doug Ledford <dledford@redhat.com>	Revert "RDMA/mlx5: Fix integer overflow while resizing CQ" The original commit of this patch has a munged log message that is missing several of the tags the original author intended to be on the patch. This was due to patchworks misinterpreting a cut-n-paste separator line as an end of message line and munging the mbox that was used to import the patch: https://patchwork.kernel.org/patch/10264089/ The original patch will be reapplied with a fixed commit message so the proper tags are applied. This reverts commit aa0de36a40f446f5a21a7c1e677b98206e242edb. Signed-off-by: Doug Ledford <dledford@redhat.com>
# aa0de36a	07-Mar-2018	Leon Romanovsky <leon@kernel.org>	RDMA/mlx5: Fix integer overflow while resizing CQ The user can provide very large cqe_size which will cause to integer overflow as it can be seen in the following UBSAN warning: Signed-off-by: Doug Ledford <dledford@redhat.com>
# 65389322	25-Feb-2018	Moni Shoua <monis@mellanox.com>	IB/mlx: Set slid to zero in Ethernet completion struct IB spec says that a lid should be ignored when link layer is Ethernet, for example when building or parsing a CM request message (CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validates the slid, not only when link layer is IB, we set the slid to zero to prevent false warnings in the kernel log. Fixes: 62ede7779904 ("Add OPA extended LID support") Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# 388ca8be	02-Jan-2018	Yonatan Cohen <yonatanc@mellanox.com>	IB/mlx5: Implement fragmented completion queue (CQ) The current implementation of create CQ requires contiguous memory, such requirement is problematic once the memory is fragmented or the system is low in memory, it causes for failures in dma_zalloc_coherent(). This patch implements new scheme of fragmented CQ to overcome this issue by introducing new type: 'struct mlx5_frag_buf_ctrl' to allocate fragmented buffers, rather than contiguous ones. Base the Completion Queues (CQs) on this new fragmented buffer. It fixes following crashes: kworker/29:0: page allocation failure: order:6, mode:0x80d0 CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0 Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<>] dump_stack+0x19/0x1b [<>] warn_alloc_failed+0x110/0x180 [<>] __alloc_pages_slowpath+0x6b7/0x725 [<>] __alloc_pages_nodemask+0x405/0x420 [<>] dma_generic_alloc_coherent+0x8f/0x140 [<>] x86_swiotlb_alloc_coherent+0x21/0x50 [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core] [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core] [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core] [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core] [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib] [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib] Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# beb801ac	26-Jan-2018	Jason Gunthorpe <jgg@ziepe.ca>	RDMA: Move enum ib_cq_creation_flags to uapi headers The flags field the enum is used with comes directly from the uapi so it belongs in the uapi headers for clarity and so userspace can use it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
# b0e9df6d	13-Nov-2017	Yonatan Cohen <yonatanc@mellanox.com>	IB/mlx5: Exposing modify CQ callback to uverbs layer Exposed mlx5_ib_modify_cq to be called from ib device verb list. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 7a0c8f42	18-Oct-2017	Guy Levi <guyle@mellanox.com>	IB/mlx5: Support padded 128B CQE feature In some benchmarks and some CPU architectures, writing the CQE on a full cache line size improves performance by saving memory access operations (read-modify-write) relative to partial cache line change. This patch lets the user to configure the device to pad the CQE up to 128B in case its content is less than 128B. Currently the driver supports only padding for a CQE size of 128B. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# de57f2ad	18-Oct-2017	Guy Levi <guyle@mellanox.com>	IB/mlx5: Support 128B CQE compression feature In commit 1cbe6fc86ccf ("IB/mlx5: Add support for CQE compressing") the concept of CQE compression was introduced and added a support for 64B CQE size. This change update the code to support 128B CQE size as well. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# f6b1ee34	11-Oct-2017	Bart Van Assche <bvanassche@acm.org>	IB/mlx5: Suppress gcc 7 fall-through complaints Avoid that gcc 7 reports the following warning when building with W=1: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 4edf8d5c	17-Aug-2017	Talat Batheesh <talatb@mellanox.com>	IB/mlx5: Fix some spelling mistakes Fix spelling mistakes in remarks "retrun"->"return" "Decalring"->"Declaring" Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# e093111d	27-Jun-2017	Amrani, Ram <Ram.Amrani@cavium.com>	IB/core: Fix input len in multiple user verbs Most user verbs pass user data to the kernel with the inclusion of the ib_uverbs_cmd_hdr structure. This is problematic because the vendor has no ideas if the verb was called by a legacy verb or an extended verb. Also, the incosistency between the verbs is confusing. Fixes: 565197dd8fb1 ("IB/core: Extend ib_uverbs_create_cq") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 1b9a07ee	10-May-2017	Leon Romanovsky <leon@kernel.org>	{net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc Commit a7c3e901a46f ("mm: introduce kv[mz]alloc helpers") added proper implementation of mlx5_vzalloc function to the MM core. This made the mlx5_vzalloc function useless, so let's remove it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 12f8fede	20-Apr-2017	Moni Shoua <monis@mellanox.com>	IB/mlx5: Set correct SL in completion for RoCE There is a difference when parsing a completion entry between Ethernet and IB ports. When link layer is Ethernet the bits describe the type of L3 header in the packet. In the case when link layer is Ethernet and VLAN header is present the value of SL is equal to the 3 UP bits in the VLAN header. If VLAN header is not present then the SL is undefined and consumer of the completion should check if IB_WC_WITH_VLAN is set. While that, this patch also fills the vlan_id field in the completion if present. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 44f2e99e	28-Mar-2017	Bodong Wang <bodong@mellanox.com>	IB/mlx5: Fix wrong use of kfree at bad flow in create_cq_user The kfree was called to free cqb, while it should free *cqb. Fixes: 1cbe6fc86ccf ("IB/mlx5: Add support for CQE compressing") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# b037c29a	03-Jan-2017	Eli Cohen <eli@mellanox.com>	IB/mlx5: Allow future extension of libmlx5 input data Current check requests that new fields in struct mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero. This was introduced so new libraries passing additional information to the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified by old kernels that do not support their request by failing the operation. This schecme is problematic since it requires libmlx5 to issue the requests with descending input size for struct mlx5_ib_alloc_ucontext_req_v2. To avoid this, we require that new features that will obey the following rules: If the feature requires one or more fields in the response and the at least one of the fields can be encoded such that a zero value means the kernel ignored the request then this field will provide the indication to the library. If no response is required or if zero is a valid response, a new field should be added that indicates to the library whether its request was processed. Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext') Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 5fe9dec0	03-Jan-2017	Eli Cohen <eli@mellanox.com>	IB/mlx5: Use blue flame register allocator in mlx5_ib Make use of the blue flame registers allocator at mlx5_ib. Since blue flame was not really supported we remove all the code that is related to blue flame and we let all consumers to use the same blue flame register. Once blue flame is supported we will add the code. As part of this patch we also move the definition of struct mlx5_bf to mlx5_ib.h as it is only used by mlx5_ib. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 2f5ff264	03-Jan-2017	Eli Cohen <eli@mellanox.com>	mlx5: Fix naming convention with respect to UARs This establishes a solid naming conventions for UARs. A UAR (User Access Region) can have size identical to a system page or can be fixed 4KB depending on a value queried by firmware. Each UAR always has 4 blue flame register which are used to post doorbell to send queue. In addition, a UAR has section used for posting doorbells to CQs or EQs. In this patch we change names to reflect this conventions. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 1cbe6fc8	30-Oct-2016	Bodong Wang <bodong@mellanox.com>	IB/mlx5: Add support for CQE compressing CQE compressing reduces PCI overhead by coalescing and compressing multiple CQEs into a single merged CQE. Successful compressing improves message rate especially for small packet traffic. CQE compressing is supported for all 64B CQE formats (with certain limitations) generated by RQ/Responder or by SQ/Requestor. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 762f899a	27-Oct-2016	Majd Dibbiny <majd@mellanox.com>	IB/mlx5: Limit mkey page size to 2GB The maximum page size in the mkey context is 2GB. Until today, we didn't enforce this requirement in the code, and therefore, if we got a page size larger than 2GB, we have passed zeros in the log_page_shift instead of the actual value and the registration failed. This patch limits the driver to use compound pages of 2GB for mkeys. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 16b0e069	27-Oct-2016	Daniel Jurgens <danielj@mellanox.com>	IB/mlx5: Use cache line size to select CQE stride When creating kernel CQs use 128B CQE stride if the cache line size is 128B, 64B otherwise. This prevents multiple CQEs from residing in a 128B cache line, which can cause retries when there are concurrent read and writes in one cache line. Tested with IPoIB on PPC64, saw ~5% throughput improvement. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 3085e29e	22-Sep-2016	Leon Romanovsky <leon@kernel.org>	IB/mlx5: Move and decouple user vendor structures This patch decouples and moves vendors specific structures to common UAPI folder which will be visible to all consumers. These structures are used by user-space library driver (libmlx5) and currently manually copied to that library. This move will allow cross-compile against these files and simplify introduction of vendor specific data. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# dbdf7d4e	28-Aug-2016	Leon Romanovsky <leon@kernel.org>	IB/mlx5: Don't return errors from poll_cq Remove returning errors from mlx5 poll_cq function. Polling CQ operation in kernel never fails by Mellanox HCA architecture and respective driver design. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 27827786	15-Jul-2016	Saeed Mahameed <saeedm@mellanox.com>	{net,IB}/mlx5: CQ commands via mlx5 ifc Remove old representation of manually created CQ commands layout, and use mlx5_ifc canonical structures and defines. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
# 89ea94a7	17-Jun-2016	Maor Gottlieb <maorg@mellanox.com>	IB/mlx5: Reset flow support for IB kernel ULPs The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 3c4c3774	04-Jun-2016	Noa Osherovich <noaos@mellanox.com>	IB/mlx5: Fix entries check in mlx5_ib_resize_cq Verify that number of entries is less than device capability. Add an appropriate warning message for error flow. Fixes: bde51583f49b ('IB/mlx5: Add support for resize CQ') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 9ea57852	04-Jun-2016	Noa Osherovich <noaos@mellanox.com>	IB/mlx5: Fix entries checks in mlx5_ib_create_cq Number of entries shouldn't be greater than the device's max capability. This should be checked before rounding the entries number to power of two. Fixes: 51ee86a4af639 ('IB/mlx5: Fix check of number of entries...') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
# c16d2750	17-Apr-2016	Matan Barak <matanb@mellanox.com>	IB/mlx5: Fire the CQ completion handler from tasklet Previously, mlx5_ib_cq_comp was executed from interrupt context. Under heavy load, this could cause the CPU core to be in an interrupt context too long. Instead of executing the handler from the interrupt context we execute it from a much friendly tasklet context. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# a606b0f6	29-Feb-2016	Matan Barak <matanb@mellanox.com>	net/mlx5: Refactor mlx5_core_mr to mkey Mlx5's mkey mechanism is also used for memory windows. The current code base uses MR (memory region) naming, which is inaccurate. Changing MR to mkey in order to represent its different usages more accurately. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 25361e02	29-Feb-2016	Haggai Eran <haggaie@mellanox.com>	IB/mlx5: Generate completions in software The GSI QP emulation requires also emulating completions for transmitted MADs. The CQ on which these completions are generated can also be used by the hardware, and the MAD layer is free to use any CQ of the device for the GSI QP. Add a method for generating software completions to each mlx5 CQ. Software completions are polled first, and generate calls to the completion handler callback if necessary. Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# c7ce833b	21-Feb-2016	Erez Shitrit <erezsh@mellanox.com>	IB/mlx5: Add support for CSUM in RX flow The driver checks the csum from the HW when completion arrived and marks it in the wc->wc_flags field for the ulp drivers. These is for packets from type IB_WC_RECV only. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 34356f64	29-Dec-2015	Leon Romanovsky <leon@kernel.org>	IB/mlx5: Unify CQ create flags check The create_cq() can receive creation flags which were used differently by two commits which added create_cq extended command and cross-channel. The merged code caused to not accept any flags at all. This patch unifies the check into one function and one return error code. Fixes: 972ecb821379 ("IB/mlx5: Add create_cq extended command") Fixes: 051f263098a9 ("IB/mlx5: Add driver cross-channel support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 0b6e26ce	17-Jan-2016	Doron Tsur <doront@mellanox.com>	net/mlx5_core: Fix trimming down IRQ number With several ConnectX-4 cards installed on a server, one may receive irqn > 255 from the kernel API, which we mistakenly trim to 8bit. This causes EQ creation failure with the following stack trace: [<ffffffff812a11f4>] dump_stack+0x48/0x64 [<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0 [<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180 [<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core] [<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core] [<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core] [<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core] [<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core] [<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0 Fixing it by changing of the irqn type from u8 to unsigned int to support values > 255 Fixes: 61d0e73e0a5a ('net/mlx5_core: Use the the real irqn in eq->irqn') Reported-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 051f2630	19-Dec-2015	Leon Romanovsky <leon@kernel.org>	IB/mlx5: Add driver cross-channel support Add support of cross-channel functionality to mlx5 driver. This includes ability to ignore overrun for CQ which intended for cross-channel, export device capability and configure the QP to be sync master/slave queues. The cross-channel enabled QP supports combination of three possible properties: * WQE processing on the receive queue of this QP * WQE processing on the send queue of this QP * WQE are supported on the send queue Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 972ecb82	15-Dec-2015	Matan Barak <matanb@mellanox.com>	IB/mlx5: Add create_cq extended command In order to create a CQ that supports timestamp, mlx5 needs to support the extended create CQ command with the timestamp flag. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# feb7c1e3	23-Dec-2015	Christoph Hellwig <hch@lst.de>	IB: remove in-kernel support for memory windows Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# cb34be6d	23-Dec-2015	Achiad Shochat <achiad@mellanox.com>	IB/mlx5: Set network_hdr_type upon RoCE responder completion When handling a responder completion, if the link layer is Ethernet, set the work completion network_hdr_type field according to CQE's info and the IB_WC_WITH_NETWORK_HDR_TYPE flag. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# dd01e66a	13-Oct-2015	Sagi Grimberg <sagig@mellanox.com>	IB/mlx5: Remove old FRWR API support No ULP uses it anymore, go ahead and remove it. Keep only the local invalidate part of the handlers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
# 8a187ee5	13-Oct-2015	Sagi Grimberg <sagig@mellanox.com>	IB/mlx5: Support the new memory registration API Support the new memory registration API by allocating a private page list array in mlx5_ib_mr and populate it when mlx5_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx5_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
# b636401f	02-Sep-2015	Sagi Grimberg <sagig@mellanox.com>	mlx5: Fix incorrect wc pkey_index assignment for GSI messages Since patch series "Demux IB CM requests in the rdma_cm module" the P_Key index is taken from the work completion rather than the message itself. The HCA provides us with the message P_Key. In order to provide the P_Key index, we need to look it up. Given that this is relevant only for GSI messages (session establishments) which is less performance critical, micro-optimize against the GSI (is_qp1) branch. Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM") Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
# bcf4c1ea	11-Jun-2015	Matan Barak <matanb@mellanox.com>	IB/core: Change provider's API of create_cq to be extendible Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>
# 938fe83c	28-May-2015	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5_core: New device capabilities handling - Query all supported types of dev caps on driver load. - Store the Cap data outbox per cap type into driver private data. - Introduce new Macros to access/dump stored caps (using the auto generated data types). - Obsolete SW representation of dev caps (no need for SW copy for each cap). - Modify IB driver to use new macros for checking caps. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 64ffaa21	28-May-2015	Amir Vadai <amirv@mellanox.com>	net/mlx5_core,mlx5_ib: Do not use vmap() on coherent memory As David Daney pointed in mlx4_core driver [1], mlx5_core is also misusing the DMA-API. This patch is removing the code that vmap() memory allocated by dma_alloc_coherent(). After this patch, users of this drivers might fail allocating resources on memory fragmeneted systems. This will be fixed later on. [1] - https://patchwork.ozlabs.org/patch/458531/ CC: David Daney <david.daney@cavium.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# ce0f7509	02-Apr-2015	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5_core: Modify arm CQ in preparation for upcoming Ethernet driver Pass consumer index as a parameter to arm CQ Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 233d05d2	02-Apr-2015	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5_core: Move completion eqs from mlx5_ib to mlx5_core Preparation for ethernet driver. These functions will be used in drivers other than mlx5_ib. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 6cf0a15f	02-Apr-2015	Saeed Mahameed <saeedm@mellanox.com>	IB/mlx5: Fix Mellanox copyright note Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# b812b544	02-Apr-2015	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5_core: Clear doorbell record inside mlx5_db_alloc() Do it in one place instead of every where the function is invoked Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 479163f4	20-Nov-2014	Al Viro <viro@ZenIV.linux.org.uk>	mlx5: don't duplicate kvfree() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# c7a08ac7	01-Oct-2014	Eli Cohen <eli@mellanox.com>	net/mlx5_core: Update device capabilities handling Rearrange struct mlx5_caps so it has a "gen" field to represent the current capabilities configured for the device. Max capabilities can also be queried from the device. Also update capabilities struct to contain more fields as per the latest revision if firmware specification. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# f241e749	28-Jul-2014	Jack Morgenstein <jackm@dev.mellanox.co.il>	mlx5: minor fixes (mainly avoidance of hidden casts) There were many places where parameters which should be u8/u16 were integer type. Additionally, in 2 places, a check for a non-null pointer was added before dereferencing the pointer (this is actually a bug fix). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 9603b61d	28-Jul-2014	Jack Morgenstein <jackm@dev.mellanox.co.il>	mlx5: Move pci device handling from mlx5_ib to mlx5_core In preparation for a new mlx5 device which is VPI (i.e., ports can be either IB or ETH), move the pci device functionality from mlx5_ib to mlx5_core. This involves the following changes: 1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev is now an independent structure maintained by mlx5_core. mlx5_ib_dev now has a pointer to that struct. This requires changing a lot of places where the core_dev struct was accessed via mlx5_ib_dev (now, this needs to be a pointer dereference). 2. All PCI initializations are now done in mlx5_core. Thus, it is now mlx5_core which does pci_register_device (and not mlx5_ib, as was previously). 3. mlx5_ib now registers itself with mlx5_core as an "interface" driver. This is very similar to the mechanism employed for the mlx4 (ConnectX) driver. Once the HCA is initialized (by mlx5_core), it invokes the interface drivers to do their initializations. 4. There is a new event handler which the core registers: mlx5_core_event(). This event handler invokes the event handlers registered by the interfaces. Based on a patch by Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# a8237b32	05-May-2014	Yann Droneaud <ydroneaud@opteya.com>	IB/mlx5: add missing padding at end of struct mlx5_ib_create_cq The i386 ABI disagrees with most other ABIs regarding alignment of data type larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct mlx5_ib_create_cq get padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name mlx5_ib_create_cq \ drivers/infiniband/hw/mlx5/mlx5_ib.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-28 11:43:07.386413682 +0100 --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt 2014-03-27 13:06:17.788472721 +0100 @@ -34,9 +34,8 @@ struct mlx5_ib_create_cq { __u64 db_addr; /* 8 8 / __u32 cqe_size; / 16 4 / - / size: 20, cachelines: 1, members: 3 / - / last cacheline: 20 bytes / + / size: 24, cachelines: 1, members: 3 / + / padding: 4 / + / last cacheline: 24 bytes */ }; This ABI disagreement will make an x86_64 kernel try to read past the buffer provided by an i386 binary. When boundary check will be implemented, a x86_64 kernel will refuse to read past the i386 userspace provided buffer and the uverb will fail. Anyway, if the structure lies in memory on a page boundary and next page is not mapped, ib_copy_from_udata() will fail when trying to read the 4 bytes of padding and the uverb will fail. This patch makes create_cq_user() takes care of the input data size to handle the case where no padding is provided. This way, x86_64 kernel will be able to handle struct mlx5_ib_create_cq as sent by unpatched and patched i386 libmlx5. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Fixes: e126ba97dba9e ("mlx5: Add driver for Mellanox Connect-IB adapter") Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# d5436ba0	23-Feb-2014	Sagi Grimberg <sagig@mellanox.com>	IB/mlx5: Collect signature error completion This commit takes care of the generated signature error CQE generated by the HW (if happened). The underlying mlx5 driver will handle signature error completions and will mark the relevant memory region as dirty. Once the consumer gets the completion for the transaction, it must check for signature errors on signature memory region using a new lightweight verb ib_check_mr_status(). In case the user doesn't check for signature error (i.e. doesn't call ib_check_mr_status() with status check IB_MR_CHECK_SIG_STATUS), the memory region cannot be used for another signature operation (REG_SIG_MR work request will fail). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# 57761d8d	15-Jan-2014	Eli Cohen <eli@dev.mellanox.co.il>	IB/mlx5: Verify reserved fields are cleared Verify that reserved fields in struct mlx5_ib_resize_cq are cleared before continuing execution of the verb. This is required to allow making use of this area in future revisions. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# bde51583	14-Jan-2014	Eli Cohen <eli@dev.mellanox.co.il>	IB/mlx5: Add support for resize CQ Implement resize CQ which is a mandatory verb in mlx5. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# 3bdb31f6	14-Jan-2014	Eli Cohen <eli@dev.mellanox.co.il>	IB/mlx5: Implement modify CQ Modify CQ is used by ULPs like IPoIB to change moderation parameters. This patch adds support in mlx5. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# cf1c5e1f	31-Oct-2013	Eli Cohen <eli@dev.mellanox.co.il>	IB/mlx5: Fix page shift in create CQ for userspace When creating a CQ, we must use mlx5 adapter page shift. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# 1b77d2bd	23-Oct-2013	Eli Cohen <eli@dev.mellanox.co.il>	mlx5: Use enum to indicate adapter page size The Connect-IB adapter has an inherent page size which equals 4K. Define an new enum that equals the page shift and use it instead of using the value 12 throughout the code. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# bf0bf77f	23-Oct-2013	Eli Cohen <eli@dev.mellanox.co.il>	mlx5: Support communicating arbitrary host page size to firmware Connect-IB firmware requires 4K pages to be communicated with the driver. This patch breaks larger pages to 4K units to enable support for architectures utilizing larger page size, such as PowerPC. This patch also fixes several places that referred to PAGE_SHIFT instead of explicit 12 which is the inherent page shift on Connect-IB. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# cfd8f1d4	23-Oct-2013	Moshe Lazer <moshel@mellanox.com>	IB/mlx5: Fix srq free in destroy qp On destroy QP the driver walks over the relevant CQ and removes CQEs reported for the destroyed QP. It also frees the related SRQ entry without checking that this is actually an SRQ-related CQE. In case of a CQ used for both send and receive QP, we could free SRQ entries for send CQEs. This patch resolves this issue by verifying that this is a SRQ related CQE by checking the SRQ number in the CQE is not zero. Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# 51ee86a4	23-Oct-2013	Eli Cohen <eli@dev.mellanox.co.il>	IB/mlx5: Fix check of number of entries in create CQ Verify that the value is non negative before rounding up to power of 2. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
# e126ba97	07-Jul-2013	Eli Cohen <eli@mellanox.com>	mlx5: Add driver for Mellanox Connect-IB adapters The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4, except that mlx5_ib is the pci device driver and not mlx5_core. mlx5_core is essentially a library that provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>