#
aca496fb |
|
12-May-2021 |
Lang Cheng <chenglang@huawei.com> |
RDMA/mlx4: Remove unused parameter udata The old version of ib_umem_get() need these udata as a parameter but now they are unnecessary. Fixes: c320e527e154 ("IB: Allow calls to ib_umem_get from kernel ULPs") Link: https://lore.kernel.org/r/1620807142-39157-3-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
81655d3c |
|
04-Sep-2020 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/mlx4: Use ib_umem_num_dma_blocks() For the calls linked to mlx4_ib_umem_calc_optimal_mtt_size() use ib_umem_num_dma_blocks() inside the function, it is just some weird static default. All other places are just using it with PAGE_SIZE, switch to ib_umem_num_dma_blocks(). As this is the last call site, remove ib_umem_num_count(). Link: https://lore.kernel.org/r/15-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
43d781b9 |
|
07-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Allow fail of destroy CQ Like any other verbs objects, CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract with mixed IB verbs objects with DEVX. Such mix causes to the situation where FW and kernel are fully interdependent on the reference counting of each side. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
df561f66 |
|
23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
c320e527 |
|
15-Jan-2020 |
Moni Shoua <monis@mellanox.com> |
IB: Allow calls to ib_umem_get from kernel ULPs So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
3593f69c |
|
19-Dec-2019 |
Eugene Crosser <evgenii.cherkashin@profitbricks.com> |
RDMA/mlx4: Redo TX checksum offload in line with docs Ingress checksum offload was not working for IPv6 frames because the conditional expression that checks validation status passed from the hardware was not matching the algorithm described in the documentation. This patch defines L4_CSUM flag (which falls inside the badfcs_enc field in the existing definition of the CQE layout) and replaces the conditional expression with the one defined in the "ConnectX(r) Family Programmer's Manual" document. Link: https://lore.kernel.org/r/20191219134847.413582-1-leon@kernel.org Signed-off-by: Eugene Crosser <evgenii.cherkashin@profitbricks.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
72b894b0 |
|
13-Nov-2019 |
Christoph Hellwig <hch@lst.de> |
IB/umem: remove the dmasync argument to ib_umem_get The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
836a0fbb |
|
16-Jun-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Check umem pointer validity prior to release Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e39afe3d |
|
28-May-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Convert CQ allocations to be under core responsibility Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
a52c8e24 |
|
28-May-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Clean destroy CQ in drivers do not return errors Like all other destroy commands, .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused to memory leaks. This patch converts .destroy_cq() to do not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ff23dfa1 |
|
31-Mar-2019 |
Shamir Rabinovitch <shamir.rabinovitch@oracle.com> |
IB: Pass only ib_udata in function prototypes Now when ib_udata is passed to all the driver's object create/destroy APIs the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the functions prototypes. Make ib_udata the only argument psssed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
bdeacabd |
|
31-Mar-2019 |
Shamir Rabinovitch <shamir.rabinovitch@oracle.com> |
IB: Remove 'uobject->context' dependency in object destroy APIs Now that we have the udata passed to all the ib_xxx object destroy APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c4367a26 |
|
31-Mar-2019 |
Shamir Rabinovitch <shamir.rabinovitch@oracle.com> |
IB: Pass uverbs_attr_bundle down ib_x destroy path The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
0975890e |
|
09-Jan-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Clear CQ objects during their allocation As part of an audit process to update drivers to use rdma_restrack_add() ensure that CQ objects is cleared before access. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b0ea0fa5 |
|
09-Jan-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
|
#
e4567897 |
|
21-Nov-2018 |
Daniel Jurgens <danielj@mellanox.com> |
{net, IB}/mlx4: Initialize CQ buffers in the driver when possible Perform CQ initialization in the driver when the capability is supported by the FW. When passing the CQ to HW indicate that the CQ buffer has been pre-initialized. Doing so decreases CQ creation time. Testing on P8 showed a single 2048 entry CQ creation time was reduced from ~395us to ~170us, which is 2.3x faster. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65389322 |
|
25-Feb-2018 |
Moni Shoua <monis@mellanox.com> |
IB/mlx: Set slid to zero in Ethernet completion struct IB spec says that a lid should be ignored when link layer is Ethernet, for example when building or parsing a CM request message (CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validates the slid, not only when link layer is IB, we set the slid to zero to prevent false warnings in the kernel log. Fixes: 62ede7779904 ("Add OPA extended LID support") Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
beb801ac |
|
26-Jan-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA: Move enum ib_cq_creation_flags to uapi headers The flags field the enum is used with comes directly from the uapi so it belongs in the uapi headers for clarity and so userspace can use it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ed8637d3 |
|
02-Nov-2017 |
Guy Levi <guyle@mellanox.com> |
IB/mlx4: Add contig support for control objects Taking advantage of the optimization which was introduced in previous commit ("IB/mlx4: Use optimal numbers of MTT entries") to optimize the MTT usage for QP and CQ. Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8aff1fb3 |
|
11-Oct-2017 |
Bart Van Assche <bvanassche@acm.org> |
IB/mlx4: Suppress gcc 7 fall-through complaints Avoid that gcc 7 reports the following warning when building with W=1: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
faa9141c |
|
17-Aug-2017 |
Talat Batheesh <talatb@mellanox.com> |
IB/mlx4: Fix some spelling mistakes Fix spelling mistakes in remarks "retrun"->"return" "cancell"->"cancel" Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
f3301870 |
|
21-Jun-2017 |
Moshe Shemesh <moshe@mellanox.com> |
(IB, net)/mlx4: Add resource utilization support Adding visibility of resource usage of QPs, CQs and counters used by virtual functions. This feature will be used to give the PF administrator more data while debugging VF status. Usage info was added to ALLOC_RES command, to notify the PF if the resource which is being reserved or allocated for the VF will be used by kernel driver or by user verbs. Updated reservation and allocation functions of QP, CQ and counter with additional usage parameter. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8900b894 |
|
23-May-2017 |
Leon Romanovsky <leon@kernel.org> |
{net, IB}/mlx4: Remove gfp flags argument The caller to the driver marks GFP_NOIO allocations with help of memalloc_noio-* calls now. This makes redundant to pass down to the driver gfp flags, which can be GFP_KERNEL only. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3e7e1193 |
|
05-Apr-2017 |
Artemy Kovalyov <artemyko@mellanox.com> |
IB: Replace ib_umem page_size by page_shift Size of pages are held by struct ib_umem in page_size field. It is better to store it as an exponent, because page size by nature is always power-of-two and used as a factor, divisor or ilog2's argument. The conversion of page_size to be page_shift allows to have portable code and avoid following error while compiling on ARM: ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined! CC: Selvin Xavier <selvin.xavier@broadcom.com> CC: Steve Wise <swise@chelsio.com> CC: Lijun Ou <oulijun@huawei.com> CC: Shiraz Saleem <shiraz.saleem@intel.com> CC: Adit Ranadive <aditr@vmware.com> CC: Dennis Dalessandro <dennis.dalessandro@intel.com> CC: Ram Amrani <Ram.Amrani@Cavium.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
593ff73b |
|
10-Nov-2016 |
Matan Barak <matanb@mellanox.com> |
IB/mlx4: Fix create CQ error flow Currently, if ib_copy_to_udata fails, the CQ won't be deleted from the radix tree and the HW (HW2SW). Fixes: 225c7b1feef1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
9ce28a20 |
|
22-Sep-2016 |
Leon Romanovsky <leon@kernel.org> |
IB/mlx4: Move user vendor structures This patch moves mlx4 vendor's specific structures to common UAPI folder which will be visible to all consumers. These structures are used by user-space library driver (libmlx4) and currently manually copied to that library. This move will allow cross-compile against these files and simplify introduction of vendor specific data. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
20697434 |
|
28-Aug-2016 |
Leon Romanovsky <leon@kernel.org> |
IB/mlx4: Don't return errors from poll_cq Remove returning errors from mlx4 poll_cq function. Polling CQ operation in kernel never fails by Mellanox HCA architecture and respective driver design. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
25b64fc5 |
|
28-Aug-2016 |
Leon Romanovsky <leon@kernel.org> |
Revert "IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one" By Mellanox HW design and SW implementation, poll_cq never fails and returns errors, so all these printks are to catch ULP bugs. In case of such bug, the reverted patch will cause reentry of the function, resulting in a printk storm. This reverts commit 5412352fcd8f ("IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5412352f |
|
27-Jul-2016 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one Error code EAGAIN should be used when errors are temporary and next call might succeeds. When error code other than EAGAIN is returned, the caller (mlx4_ib_poll) will assume all CQE in the same bunch are error too and will drop them all. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e6a00f66 |
|
27-Jul-2016 |
Yuval Shaia <yuval.shaia@oracle.com> |
IB/mlx4: Make function use_tunnel_data return void No need to return int if function always returns 0 Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
0c87b6720 |
|
28-Jul-2016 |
Roland Dreier <roland@purestorage.com> |
IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct We allocate a small tracking structure as part of mlx4_ib_resize_cq(). However, we don't need to use GFP_ATOMIC -- immediately after the allocation, we call mlx4_cq_resize(), which allocates a command mailbox with GFP_KERNEL and then sleeps on a firmware command, so we better not be in an atomic context. This actually has a real impact, because when this GFP_ATOMIC allocation fails (and GFP_ATOMIC does fail in practice) then a userspace consumer resizing a CQ will get a spurious failure that we can easily avoid. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
feb7c1e3 |
|
23-Dec-2015 |
Christoph Hellwig <hch@lst.de> |
IB: remove in-kernel support for memory windows Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e761c67f |
|
13-Oct-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/mlx4: Remove old FRWR API support No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
1b2cd0fc |
|
13-Oct-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/mlx4: Support the new memory registration API Support the new memory registration API by allocating a private page list array in mlx4_ib_mr and populate it when mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx4_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
799cdaf8 |
|
09-Aug-2015 |
Ariel Nahum <arieln@mellanox.com> |
IB/mlx4: Fix incorrect cq flushing in error state When handling a device internal error, the driver is responsible to drain the completion queue with flush errors. In case a completion queue was assigned to multiple send queues, the driver iterates over the send queues and generates flush errors of inflight wqes. The driver must correctly pass the wc array with an offset as a result of the previous send queue iteration. Not doing so will overwrite previously set completions and return a wrong number of polled completions which includes ones which were not correctly set. Fixes: 35f05dabf95a (IB/mlx4: Reset flow support for IB kernel ULPs) Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e802f8e4 |
|
27-Jul-2015 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx4: Prepare VLAN macros for 802.1ad Hardware accelerated support To add Hardware accelerated support in 802.1ad vlan, replace Current VLAN macros to CVLAN. Replace: MLX4_WQE_CTRL_INS_VLAN MLX4_CQE_VLAN_PRESENT_MASK With: MLX4_WQE_CTRL_INS_CVLAN MLX4_CQE_CVLAN_PRESENT_MASK Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b664c43 |
|
11-Jun-2015 |
Matan Barak <matanb@mellanox.com> |
IB/mlx4: Add support for CQ time-stamping This includes: * support allocation of CQ with the TIMESTAMP_COMPLETION creation flag. * add timestamp_mask and hca_core_clock to query_device, reporting the number of supported timestamp bits (mask) and the hca_core_clock frequency. * return hca core clock's offset in query_device vendor's data, this is needed in order to read the HCA's core clock. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
bcf4c1ea |
|
11-Jun-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Change provider's API of create_cq to be extendible Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8ab9406a |
|
29-Jan-2015 |
Majd Dibbiny <majd@mellanox.com> |
IB/mlx4: Bug fixes in mlx4_ib_resize_cq 1. Before the entries alignment, we need to check that the entries doesn't exceed the device's max cqe. 2. After the alignment, we need to make sure that the aligned number doesn't exceed the max cqes+1. The additional cqe is used to denote that the resizing operation has completed. 3. If the users asks to resize the CQ with entries less than the oustanding cqes we should fail instead of returning 0. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
35f05dab |
|
08-Feb-2015 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4: Reset flow support for IB kernel ULPs The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3dca0f42 |
|
11-Dec-2014 |
Matan Barak <matanb@mellanox.com> |
net/mlx4_core: Use tasklet for user-space CQ completion events Previously, we've fired all our completion callbacks straight from our ISR. Some of those callbacks were lightweight (for example, mlx4_en's and IPoIB napi callbacks), but some of them did more work (for example, the user-space RDMA stack uverbs' completion handler). Besides that, doing more than the minimal work in ISR is generally considered wrong, it could even lead to a hard lockup of the system. Since when a lot of completion events are generated by the hardware, the loop over those events could be so long, that we'll get into a hard lockup by the system watchdog. In order to avoid that, add a new way of invoking completion events callbacks. In the interrupt itself, we add the CQs which receive completion event to a per-EQ list and schedule a tasklet. In the tasklet context we loop over all the CQs in the list and invoke the user callback. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40f2287b |
|
11-May-2014 |
Jiri Kosina <jkosina@suse.cz> |
IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO Modify the various routines used to allocate memory resources which serve QPs in mlx4 to get an input GFP directive. Have the Ethernet driver to use GFP_KERNEL in it's QP allocations as done prior to this commit, and the IB driver to use GFP_NOIO when the IB verbs IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
5ea8bbfc |
|
11-Mar-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Implement IP based gids support for RoCE/SRIOV Since there is no connection between the MAC/VLAN and the GID when using IP-based addressing, the proxy QP1 (running on the slave) must pass the source-mac, destination-mac, and vlan_id information separately from the GID. Additionally, the Host must pass the remote source-mac and vlan_id back to the slave, This is achieved as follows: Outgoing MADs: 1. Source MAC: obtained from the CQ completion structure (struct ib_wc, smac field). 2. Destination MAC: obtained from the tunnel header 3. vlan_id: obtained from the tunnel header. Incoming MADs 1. The source (i.e., remote) MAC and vlan_id are passed in the tunnel header to the proxy QP1. VST mode support: For outgoing MADs, the vlan_id obtained from the header is discarded, and the vlan_id specified by the Hypervisor is used instead. For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the "invalid" vlan (0xffff) is substituted when forwarding to the slave. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
297e0dad |
|
12-Dec-2013 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN. Therefore, we need to extract them from the CQE and place them in struct ib_wc (to be used for cases were they were taken from the gid). Also, when modifying a QP or building address handle, instead of parsing the dgid to get the MAC and VLAN, take them from the address handle attributes. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
79d3da9c |
|
31-Oct-2013 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx4: Fix device max capabilities check Move the check on max supported CQEs after the final number of entries is evaluated. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
93b80ac2 |
|
31-Oct-2013 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx4: Fix endless loop in resize CQ When calling get_sw_cqe() we need pass the consumer_index and not the masked value. Failure to do so will cause incorrect result of get_sw_cqe() possibly leading to endless loop. This problem was reported and analyzed by Michael Rice from HP. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
ec693d47 |
|
23-Apr-2013 |
Amir Vadai <amirv@mellanox.com> |
net/mlx4_en: Add HW timestamping (TS) support The patch allows to enable/disable HW timestamping for incoming and/or outgoing packets. It adds and initializes all structs and callbacks needed by kernel TS API. To enable/disable HW timestamping appropriate ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are supported. When enabling TS on receive flow - VLAN stripping will be disabled. Also were made all relevant changes in RX/TX flows to consider TS request and plant HW timestamps into relevant structures. mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f3cca4b1 |
|
10-Apr-2013 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/mlx4: Fetch XRC SRQ in the CQ polling code An XRC target QP may redirect to more than one XRC SRQ. This means that for work completions associated with a XRC TGT QP, the srq field in the QP has no usage and the real XRC SRQ need to be retrived using the information from the XRCETH placed into the CQE, do that. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
08ff3235 |
|
21-Oct-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
mlx4: 64-byte CQE/EQE support ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQES the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows 1. the PPF does the detection of the new capabilities using QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware to the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when **needed** i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be in sync by user space. This practice allows to work with unmodified libmlx4 on older devices (e.g A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
1ffeb2eb |
|
03-Aug-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support 1. Introduce the basic SR-IOV parvirtualization context objects for multiplexing and demultiplexing MADs. 2. Introduce support for the new proxy and tunnel QP types. This patch introduces the objects required by the master for managing QP paravirtualization for guests. struct mlx4_ib_sriov is created by the master only. It is a container for the following: 1. All the info required by the PPF to multiplex and de-multiplex MADs (including those from the PF). (struct mlx4_ib_demux_ctx demux) 2. All the info required to manage alias GUIDs (i.e., the GUID at index 0 that each guest perceives. In fact, this is not the GUID which is actually at index 0, but is, in fact, the GUID which is at index[<VF number>] in the physical table. 3. structures which are used to manage CM paravirtualization 4. structures for managing the real special QPs when running in SR-IOV mode. The real SQPs are controlled by the PPF in this case. All SQPs created and controlled by the ib core layer are proxy SQP. struct mlx4_ib_demux_ctx contains the information per port needed to manage paravirtualization: 1. All multicast paravirt info 2. All tunnel-qp paravirt info for the port. 3. GUID-table and GUID-prefix for the port 4. work queues. struct mlx4_ib_demux_pv_ctx contains all the info for managing the paravirtualized QPs for one slave/port. struct mlx4_ib_demux_pv_qp contains the info need to run an individual QP (either tunnel qp or real SQP). Note: We made use of the 2 most significant bits in enum mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h). We need these bits in the low-level driver for internal purposes. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
e605b743 |
|
29-Apr-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/mlx4: Increase the number of vectors (EQs) available for ULPs Enable IB ULPs to use a larger portion of the device EQs (which map to IRQs). The mlx4_ib driver follows the mlx4_core framework of the EQs to be divided among the device ports. In this scheme, for each IB port, the number of allocated EQs follows the number of cores, subject to other system constraints, such as number available MSI-X vectors. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
987c8f8f |
|
29-Apr-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/mlx4: Replace printk(KERN_yyy...) with pr_yyy(...) Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> [ Replace one more printk_once() with pr_info_once(). - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3616f9ce |
|
06-Mar-2012 |
Eli Cohen <eli@mellanox.com> |
IB/mlx4: Fix possible missed completion event If an erroneous CQE is polled in the first iteration (i.e. npolled == 0), we don't update the consumer index and hence the hardware could get a wrong notion of how many CQEs software polled. Fix this by unconditionally updating the doorbell record. We could change the check to be something like if (npolled || err != -EAGAIN) ... but it does not seem worth the effort since a posted write to memory should not cost too much. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
d927d505 |
|
11-Jan-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB: Change CQE "csum_ok" field to a bit flag Use a bit in wc_flags rather then a whole integer to hold the "checksum OK" flag. By itself, this change doesn't reduce the size of struct ib_wc on 64bit machines -- it stays on 56 bytes because of padding. However, it will allow to add more fields in the future without enlarging the struct. Also, it will let us have a unified approach with future libibverbs checksum offload reporting, because a bit flag doesn't break the library ABI. This patch was suggested during conversation with Liran Liss <liranl@mellanox.com>. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
9106c410 |
|
11-Dec-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE For IBoE, SLs 0-7 are mapped to Ethernet 802.1Q user priority bits (pbits) which are part of the VLAN tag, SLs 8-15 are reserved. Under Ethernet, the ConnectX firmware treats (decode/encode) the four bit SL field in various constructs such as QPC / UD WQE / CQE as PPP0 and not as 0PPP. This correlates well to the fact that within the vlan tag the pbits are located in bits 15-13 and not 12-14. The current code wasn't consistent around that area - the encoding was correct for the IBoE QPC.path.schedule_queue field, but was wrong for IBoE CQEs and when MLX header was built. These inconsistencies resulted in wrong SL <--> wire 802.1Q pbits mapping, which is fixed by using SL <--> PPP0 all around the place. Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3afa9f19 |
|
10-Jan-2011 |
Vladimir Sokolovsky <vlad@dev.mellanox.co.il> |
IB/mlx4: Don't call dma_free_coherent() with irqs disabled mlx4_ib_free_cq_buf() should not be called under spin_lock_irq() since it calls dma_free_coherent(), which needs irqs enabled. Fix this by deferring the free to outside the locked region. This was found due to the WARN_ON(irqs_disabled()); in swiotlb_free_coherent(). Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
6fa8f719 |
|
14-Apr-2010 |
Vladimir Sokolovsky <vlad@mellanox.co.il> |
IB/mlx4: Add support for masked atomic operations Add support for masked atomic operations (masked compare and swap, masked fetch and add). Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
025dfdaf |
|
16-Oct-2008 |
Frederik Schwarzer <schwarzerf@gmail.com> |
trivial: fix then -> than typos in comments and documentation - (better, more, bigger ...) then -> (...) than Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
f781a22f |
|
30-Dec-2008 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Fix reading SL field out of cqe->sl_vid Commit f780a9f1 ("mlx4_core: Add ethernet fields to CQE struct") introduced a bug in how wc->sl is set in mlx4_ib_poll_one() -- since cqe->sl_vid is a big-endian value, the shift must be done after converting to host endianness. This bug was found using sparse endianness checking. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
7798dbf4 |
|
24-Dec-2008 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Set ownership bit correctly when copying CQEs during CQ resize When resizing a CQ, when copying over unpolled CQEs from the old CQE buffer to the new buffer, the ownership bit must be set appropriately for the new buffer, or the ownership bit in the new buffer gets corrupted. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b8dd786f |
|
22-Dec-2008 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_core: Add support for multiple completion event vectors When using MSI-X mode, create a completion event queue for each CPU. Report the number of completion EQs in a new struct mlx4_caps member, num_comp_vectors, and extend the mlx4_cq_alloc() interface with a vector parameter so that consumers can specify which completion EQ should be used to report events for the CQ being created. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
42ab01c3 |
|
01-Dec-2008 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Fix MTT leakage in resize CQ When resizing a CQ, MTTs associated with the old CQE buffer were not freed. As a result, if any app used resize CQ repeatedly, all MTTs were eventually exhausted, which led to all memory registration operations failing until the driver is reloaded. Once the RESIZE_CQ command returns successfully from FW, FW no longer accesses the old CQ buffer, so it is safe to deallocate the MTT entries used by the old CQ buffer. Finally, if the RESIZE_CQ command fails, the MTTs allocated for the new CQEs buffer also need to be de-allocated. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1416>. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
f780a9f1 |
|
06-Aug-2008 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_core: Add ethernet fields to CQE struct Add ethernet-related fields to struct mlx4_cqe so that the mlx4_en ethernet NIC driver can share the same definition. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
51a379d0 |
|
25-Jul-2008 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files Update existing Mellanox copyright lines to 2008, and add such lines to files where they are missing. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
95d04f07 |
|
23-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Add support for memory management extensions and local DMA L_Key Add support for the following operations to mlx4 when device firmware supports them: - Send with invalidate and local invalidate send queue work requests; - Allocate/free fast register MRs; - Allocate/free fast register MR page lists; - Fast register MR send queue work requests; - Local DMA L_Key. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
00f7ec36 |
|
15-Jul-2008 |
Steve Wise <larrystevenwise@gmail.com> |
RDMA/core: Add memory management extensions support This patch adds support for the IB "base memory management extension" (BMME) and the equivalent iWARP operations (which the iWARP verbs mandates all devices must implement). The new operations are: - Allocate an ib_mr for use in fast register work requests. - Allocate/free a physical buffer lists for use in fast register work requests. This allows device drivers to allocate this memory as needed for use in posting send requests (eg via dma_alloc_coherent). - New send queue work requests: * send with remote invalidate * fast register memory region * local invalidate memory region * RDMA read with invalidate local memory region (iWARP only) Consumer interface details: - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added to indicate device support for these features. - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV, IB_WR_RDMA_READ_WITH_INV are added. - A new consumer API function, ib_alloc_mr() is added to allocate fast register memory regions. - New consumer API functions, ib_alloc_fast_reg_page_list() and ib_free_fast_reg_page_list() are added to allocate and free device-specific memory for fast registration page lists. - A new consumer API function, ib_update_fast_reg_key(), is added to allow the key portion of the R_Key and L_Key of a fast registration MR to be updated. Consumers call this if desired before posting a IB_WR_FAST_REG_MR work request. Consumers can use this as follows: - MR is allocated with ib_alloc_mr(). - Page list memory is allocated with ib_alloc_fast_reg_page_list(). - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key(). - MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR) - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV), ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with invalidate operation. - MR is deallocated with ib_dereg_mr() - page lists dealloced via ib_free_fast_reg_page_list(). Applications can allocate a fast register MR once, and then can repeatedly bind the MR to different physical block lists (PBLs) via posting work requests to a send queue (SQ). For each outstanding MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be allocated (the fast_reg_page_list is owned by the low-level driver from the consumer posting a work request until the request completes). Thus pipelining can be achieved while still allowing device-specific page_list processing. The 32-bit fast register memory key/STag is composed of a 24-bit index and an 8-bit key. The application can change the key each time it fast registers thus allowing more control over the peer's use of the key/STag (ie it can effectively be changed each time the rkey is rebound to a page list). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
3ae15e16 |
|
30-Apr-2008 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf() When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I changed things around so that mlx4_ib_alloc_cq_buf() and mlx4_ib_free_cq_buf() were used everywhere they could be. However, I screwed up the number of entries passed into mlx4_ib_alloc_cq_buf() in a couple places -- the function bumps the number of entries internally, so the caller shouldn't add 1 as well. Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf() can cause the cleanup to go off the end of an array and corrupt allocator state in interesting ways. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e463c7b1 |
|
29-Apr-2008 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_core: Add a way to set the "collapsed" CQ flag Extend the mlx4_cq_resize() API with a way to set the "collapsed" flag for the CQ being created. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
cb9fbc5c |
|
29-Apr-2008 |
Arthur Kepner <akepner@sgi.com> |
IB: expand ib_umem_get() prototype Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1 when mapping user-allocated CQs with ib_umem_get(). Signed-off-by: Arthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6296883c |
|
23-Apr-2008 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_core: Move kernel doorbell management into core In addition to mlx4_ib, there will be ethernet and FC consumers of mlx4_core, so move the code for managing kernel doorbells into the core module to avoid having to duplicate this multiple times. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bbf8eed1 |
|
16-Apr-2008 |
Vladimir Sokolovsky <vlad@dev.mellanox.co.il> |
IB/mlx4: Add support for resizing CQs Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
3fdcb97f |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx4: Add support for modifying CQ moderation parameters Signed-off-by: Eli Cohen <eli@mellnaox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b832be1e |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx4: Add IPoIB LSO support Add TSO support to the mlx4_ib driver. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8ff095ec |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx4: Add IPoIB checksum offload support ConnectX devices support checksum generation and verification of TCP and UDP packets for UD IPoIB messages. This patch checks if the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it does. It implements support for handling the IB_SEND_IP_CSUM send flag and setting the csum_ok field in receive work completions. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Ali Ayub <ali@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
ea54b10c |
|
28-Jan-2008 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Use multiple WQ blocks to post smaller send WQEs ConnectX HCA supports shrinking WQEs, so that a single work request can be made of multiple units of wqe_shift. This way, WRs can differ in size, and do not have to be a power of 2 in size, saving memory and speeding up send WR posting. Unfortunately, if we do this then the wqe_index field in CQEs can't be used to look up the WR ID anymore, so our implementation does this only if selective signaling is off. Further, on 32-bit platforms, we can't use vmap() to make the QP buffer virtually contigious. Thus we have to use constant-sized WRs to make sure a WR is always fully within a single page-sized chunk. Finally, we use WRs with the NOP opcode to avoid wrapping around the queue buffer in the middle of posting a WR, and we set the NoErrorCompletion bit to avoid getting completions with error for NOP WRs. However, NEC is only supported starting with firmware 2.2.232, so we use constant-sized WRs for older firmware. And, since MLX QPs only support SEND, we use constant-sized WRs in this case. When stamping during NOP posting, do stamping following setting of the NOP WQE valid bit. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
1c69fc2a |
|
06-Feb-2008 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Consolidate code to get an entry from a struct mlx4_buf We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code to look up an entry is duplicated in get_cqe_from_buf() and the QP and SRQ versions of get_wqe(). Factor this out into mlx4_buf_offset(). This will also make it easier to switch over to using vmap() for buffers. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b3226184 |
|
25-Jan-2008 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Micro-optimize mlx4_ib_poll_one() Rather than byte-swapping cqe->g_mlpath_rqpn each time we extract a field from it, byte-swap it once into a temporary variable. This results in smaller, better code -- eg, on 32-bit x86: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5) function old new delta mlx4_ib_poll_cq 1188 1183 -5 Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
e1bb7843 |
|
07-Jan-2008 |
Dotan Barak <dotanb@dev.mellanox.co.il> |
IB/mlx4: Fix value of pkey_index in QP1 completions Fix the value of pkey_index in completions to get a valid value for GSI QPs. Without this fix, incoming GSI packets on port 2 get an invalid P_Key index in the completion, which prevents the MAD layer from sending back a response, which can make the second port of ConnectX HCAs completely useless. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
19891915 |
|
03-Aug-2007 |
Vu Pham <vu@mellanox.com> |
IB/mlx4: Fix opcode returned in RDMA read completion Current code has a cut-and-paste error and returns IB_WC_SEND when it should return IB_WC_RDMA_READ. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
082dee32 |
|
18-Jun-2007 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Handle buffer wraparound in __mlx4_ib_cq_clean() When compacting CQ entries, we need to set the correct value of the ownership bit in case the value is different between the index we copy the CQE from and the index we copy it to. Found by Ronni Zimmerman of Mellanox. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
0e6e7416 |
|
18-Jun-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Handle new FW requirement for send request prefetching New ConnectX firmware introduces FW command interface revision 2, which requires that for each QP, a chunk of send queue entries (the "headroom") is kept marked as invalid, so that the HCA doesn't get confused if it prefetches entries that haven't been posted yet. Add code to the driver to do this, and also update the user ABI so that userspace can request that the prefetcher be turned off for userspace QPs (we just leave the prefetcher on for all kernel QPs). Unfortunately, marking send queue entries this way is confuses older firmware, so we change the driver to allow only FW command interface revisions 2. This means that users will have to update their firmware to work with the new driver, but the firmware is changing quickly and the old firmware has lots of other bugs anyway, so this shouldn't be too big a deal. Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
614c3c85 |
|
12-Jun-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Fix handling of wq->tail for send completions Cast the increment added to wq->tail when send completions are processed to u16 to avoid using wrong values caused by standard integer promotions. The same bug was fixed in libmlx4 by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
225c7b1f |
|
08-May-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters Add an InfiniBand driver for Mellanox ConnectX adapters. Because these adapters can also be used as ethernet NICs and Fibre Channel HBAs, the driver is split into two modules: mlx4_core: Handles low-level things like device initialization and processing firmware commands. Also controls resource allocation so that the InfiniBand, ethernet and FC functions can share a device without stepping on each other. mlx4_ib: Handles InfiniBand-specific things; plugs into the InfiniBand midlayer. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|