#
73d68002 |
|
21-Aug-2023 |
Petr Pavlu <petr.pavlu@suse.com> |
mlx4: Replace the mlx4_interface.event callback with a notifier Use a notifier to implement mlx4_dispatch_event() in preparation to switch mlx4_en and mlx4_ib to be an auxiliary device. A problem is that if the mlx4_interface.event callback was replaced with something as mlx4_adrv.event then the implementation of mlx4_dispatch_event() would need to acquire a lock on a given device before executing this callback. That is necessary because otherwise there is no guarantee that the associated driver cannot get unbound when the callback is running. However, taking this lock is not possible because mlx4_dispatch_event() can be invoked from the hardirq context. Using an atomic notifier allows the driver to accurately record when it wants to receive these events and solves this problem. A handler registration is done by both mlx4_en and mlx4_ib at the end of their mlx4_interface.add callback. This matches the current situation when mlx4_add_device() would enable events for a given device immediately after this callback, by adding the device on the mlx4_priv.list. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
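The register-then-dispatch pattern that the notifier commit above describes can be sketched in plain userspace C. This is an illustration only, not the mlx4 code: the kernel's atomic notifier chains also take care of the locking that makes dispatch safe from hardirq context, and that is omitted here. Every name below (`event_chain`, `chain_register`, `chain_unregister`, `chain_dispatch`, `count_events`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a notifier block. */
struct event_handler {
    int (*fn)(struct event_handler *self, unsigned long event, void *data);
    struct event_handler *next;
};

/* Hypothetical stand-in for a notifier chain head. */
struct event_chain {
    struct event_handler *head;
};

/* Add a handler at the front of the chain. */
static void chain_register(struct event_chain *chain, struct event_handler *h)
{
    assert(chain && h && h->fn);
    h->next = chain->head;
    chain->head = h;
}

/* Remove a handler; returns 0 on success, -1 if it was not registered. */
static int chain_unregister(struct event_chain *chain, struct event_handler *h)
{
    struct event_handler **pp;

    for (pp = &chain->head; *pp; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;
            h->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* Call every registered handler; returns the number of handlers invoked. */
static int chain_dispatch(struct event_chain *chain, unsigned long event,
                          void *data)
{
    struct event_handler *h;
    int called = 0;

    for (h = chain->head; h; h = h->next) {
        h->fn(h, event, data);
        called++;
    }
    return called;
}

/* Example handler: increments the int counter passed in through data. */
static int count_events(struct event_handler *self, unsigned long event,
                        void *data)
{
    (void)self;
    (void)event;
    ++*(int *)data;
    return 0;
}
```

The point of the design is that a driver only receives events between its own register and unregister calls, so the dispatcher never needs to lock a device to check whether a driver is still bound.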
#
312b8f79 |
|
04-Jan-2023 |
Mark Zhang <markzhang@nvidia.com> |
RDMA/mlx: Calling qp event handler in workqueue context Move the call of qp event handler from atomic to workqueue context, so that the handler is able to block. This is needed by following patches. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/0cd17b8331e445f03942f4bb28d447f24ac5669d.1672821186.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
9cf62d91 |
|
06-May-2022 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
RDMA/mlx4: Avoid flush_scheduled_work() usage Flushing system-wide workqueues is dangerous and will be forbidden. Replace system_wq with local cm_wq. Link: https://lore.kernel.org/r/22f7183b-cc16-5a34-e879-7605f5efc6e6@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
13f30b0f |
|
08-Oct-2021 |
Aharon Landau <aharonl@nvidia.com> |
RDMA/counter: Add a descriptor in struct rdma_hw_stats Add a counter statistic descriptor structure in rdma_hw_stats. In addition to the counter name, more meta-information will be added. This code extension is needed for optional-counter support in the following patches. Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
514aee66 |
|
23-Jul-2021 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Globally allocate and release QP memory Convert QP object to follow IB/core general allocation scheme. That change allows us to make sure that restrack properly kref the memory. Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com Reviewed-by: Gal Pressman <galpress@amazon.com> #efa Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1fb7f897 |
|
01-Mar-2021 |
Mark Bloch <mbloch@nvidia.com> |
RDMA: Support more than 255 rdma ports Current code uses many different types when dealing with a port of a RDMA device: u8, unsigned int and u32. Switch to u32 to clean up the logic. This allows us to make (at least) the core view consistent and use the same type. Unfortunately not all places can be converted. Many uverbs functions expect port to be u8 so keep those places in order not to break UAPIs. HW/Spec defined values must also not be changed. With the switch to u32 we now can support devices with more than 255 ports. U32_MAX is reserved to make control logic a bit easier to deal with. As a device with U32_MAX ports probably isn't going to happen any time soon this seems like a non issue. When a device with more than 255 ports is created uverbs will report the RDMA device as having 255 ports as this is the max currently supported. The verbs interface is not changed yet because the IBTA spec limits the port size in too many places to be u8 and all applications that rely on verbs won't be able to cope with this change. At this stage, we are extending the interfaces that are using vendor channel solely. Once the limitation is lifted mlx5 in switchdev mode will be able to have thousands of SFs created by the device. As the only instance of an RDMA device that reports more than 255 ports will be a representor device and it exposes itself as a RAW Ethernet only device, CM/MAD/IPoIB and other ULPs aren't affected by this change and their sysfs/interfaces that are exposed to userspace can remain unchanged. While here cleanup some alignment issues and remove unneeded sanity checks (mainly in rdmavt). Link: https://lore.kernel.org/r/20210301070420.439400-1-leon@kernel.org Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
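The reporting rule in the commit above (a u32 port count inside the core, at most 255 reported through u8-sized interfaces, U32_MAX reserved) can be written down as a small helper. This is a sketch under assumptions, not the kernel implementation: `legacy_reported_ports` and `LEGACY_MAX_PORTS` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the legacy interface carries the port count in a u8 field. */
#define LEGACY_MAX_PORTS 255u

/*
 * The core keeps a 32-bit port count; UINT32_MAX is treated as reserved.
 * Return the count that a u8-sized interface is able to report.
 */
static uint8_t legacy_reported_ports(uint32_t core_port_count)
{
    assert(core_port_count != UINT32_MAX);

    if (core_port_count > LEGACY_MAX_PORTS)
        return (uint8_t)LEGACY_MAX_PORTS;
    return (uint8_t)core_port_count;
}
```

For example, a device with 2 ports is reported as 2, while a hypothetical representor device with 4096 ports is reported as 255.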
#
6e0954b1 |
|
30-Nov-2020 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr mlx5 has an ugly flow where it tries to allocate a new MR and replace the existing MR in the same memory during rereg. This is very complicated and buggy. Instead of trying to replace in-place inside the driver, provide support from uverbs to change the entire HW object assigned to a handle during rereg_mr. Since destroying a MR is allowed to fail (ie if a MW is pointing at it) and can't be detected in advance, the algorithm creates a completely new uobject to hold the new MR and swaps the IDR entries of the two objects. The old MR in the temporary IDR entry is destroyed, and if it fails rereg_mr succeeds and destruction is deferred to FD release. This complexity is why this cannot live in a driver safely. Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
bf6a4764 |
|
09-Oct-2020 |
Håkon Bugge <haakon.bugge@oracle.com> |
IB/mlx4: Convert rej_tmout radix-tree to XArray Was missed during the initial review of the below patch Fixes: 227a0e142e37 ("IB/mlx4: Add support for REJ due to timeout") Link: https://lore.kernel.org/r/1602253482-6718-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
915ec7ed |
|
26-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA/mlx4: Embed GSI QP into general mlx4_ib QP Refactor the storage struct of mlx4 GSI QP to be embedded in mlx4_ib QP. This allows to remove internal memory allocation of QP struct which is hidden inside the mlx4_ib_create_qp() flow. Link: https://lore.kernel.org/r/20200926102450.2966017-6-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
c0a6b5ec |
|
02-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Convert RWQ table logic to ib_core allocation scheme Move struct ib_rwq_ind_table allocation to ib_core. Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
d18bb3e1 |
|
02-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Clean MW allocation and free flows Move allocation and destruction of memory windows under ib_core responsibility and clean drivers to ensure that no updates to MW ib_core structures are done in driver layer. Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
add53535 |
|
07-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Restore ability to return error for destroy WQ Make this interface symmetrical to other destroy paths. Fixes: a49b1dc7ae44 ("RDMA: Convert destroy_wq to be void") Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
43d781b9 |
|
07-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Allow fail of destroy CQ Like any other verbs objects, CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract with mixed IB verbs objects with DEVX. Such a mix leads to a situation where FW and kernel are fully interdependent on the reference counting of each side. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
119181d1 |
|
07-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Restore ability to fail on SRQ destroy In similar way to other IB objects, restore the ability to return error on SRQ destroy. Strictly speaking, this change is not necessary, and provided here to ensure a symmetrical interface like other destroy functions. Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
9a9ebf8c |
|
07-Sep-2020 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Restore ability to fail on AH destroy Like any other IB verbs objects, AH are refcounted by ib_core. The release of those objects is controlled by ib_core with the promise that AH destroy can't fail. Being a SW object for now, this change makes dealloc_ah() behave like any other destroy IB flows. Fixes: d345691471b4 ("RDMA: Handle AH allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
227a0e14 |
|
03-Aug-2020 |
Håkon Bugge <haakon.bugge@oracle.com> |
IB/mlx4: Add support for REJ due to timeout A CM REJ packet with its reason equal to timeout is a special beast in the sense that it doesn't have a Remote Communication ID nor does it have a Remote Port GID. Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. This proxying does not handle said REJ packets. If the active side abandons its connection attempt after having sent a REQ, it will send a REJ with the reason being timeout. This example can be provoked by a simple user-verbs program, which ends up doing: rdma_connect(cm_id, &conn_param); rdma_destroy_id(cm_id); using the async librdmacm API. Having dynamic debug prints enabled in the mlx4_ib driver, we will then see: mlx4_ib_demux_cm_handler: Couldn't find an entry for pv_cm_id 0x0, attr_id 0x12 The solution is to introduce a radix-tree. When a REQ packet is received and handled in mlx4_ib_demux_cm_handler(), we know the connecting peer's para-virtual cm_id and the destination slave. We then insert an entry into the tree with said information. We also schedule work to remove this entry from the tree and free it, in order to avoid a memory leak. When a REJ packet with reason timeout is received, we can look up the slave in the tree, and deliver the packet to the correct slave. When a duplicate REQ packet is received, the entry is in the tree. In this case, we adjust the delayed work in order to avoid a premature eviction of the entry. When cleaning up, we simply traverse the tree and modify any delayed work to use a zero delay. A subsequent flush of the system_wq will ensure all entries being wiped out. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-6-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
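The bookkeeping described in the commit above (remember the peer's para-virtual cm_id and destination slave when a REQ arrives, refresh the entry on a duplicate REQ, look it up when a timeout REJ arrives, and evict it later) can be modelled in userspace C. This is a deliberately simplified sketch: a fixed-size array stands in for the radix-tree, an explicit `now` tick stands in for delayed work, and all names and constants (`rej_table`, `note_req`, `lookup_slave`, `expire_entries`, `TABLE_SIZE`, `ENTRY_LIFETIME`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE     16   /* hypothetical capacity */
#define ENTRY_LIFETIME 100  /* hypothetical lifetime, in abstract ticks */

struct rej_entry {
    bool used;
    uint32_t pv_cm_id;      /* connecting peer's para-virtual cm_id */
    int slave;              /* destination slave for a later REJ */
    unsigned long expires;  /* tick at which the entry may be evicted */
};

struct rej_table {
    struct rej_entry e[TABLE_SIZE];
};

static struct rej_entry *find_entry(struct rej_table *t, uint32_t id)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (t->e[i].used && t->e[i].pv_cm_id == id)
            return &t->e[i];
    return NULL;
}

/* REQ seen: insert an entry, or push out the expiry on a duplicate REQ.
 * Returns 0 on success, -1 if the table is full. */
static int note_req(struct rej_table *t, uint32_t id, int slave,
                    unsigned long now)
{
    struct rej_entry *ent;

    assert(t && slave >= 0);
    ent = find_entry(t, id);
    if (!ent) {
        for (int i = 0; i < TABLE_SIZE && !ent; i++)
            if (!t->e[i].used)
                ent = &t->e[i];
        if (!ent)
            return -1;
        ent->used = true;
        ent->pv_cm_id = id;
        ent->slave = slave;
    }
    ent->expires = now + ENTRY_LIFETIME;
    return 0;
}

/* Timeout REJ seen: return the slave to deliver to, or -1 if unknown. */
static int lookup_slave(struct rej_table *t, uint32_t id)
{
    struct rej_entry *ent = find_entry(t, id);

    return ent ? ent->slave : -1;
}

/* Evict every entry whose lifetime has passed; returns the eviction count. */
static int expire_entries(struct rej_table *t, unsigned long now)
{
    int evicted = 0;

    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->e[i].used && t->e[i].expires <= now) {
            t->e[i].used = false;
            evicted++;
        }
    }
    return evicted;
}
```

Refreshing `expires` on a duplicate REQ mirrors the commit's adjustment of the delayed work, so an entry is not evicted while the peer is still retrying.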
#
7fd1507d |
|
03-Aug-2020 |
Håkon Bugge <haakon.bugge@oracle.com> |
IB/mlx4: Fix starvation in paravirt mux/demux The mlx4 driver will proxy MAD packets through the PF driver. A VM or an instantiated VF will send its MAD packets to the PF driver using loop-back. The PF driver will be informed by an interrupt, but defer the handling and polling of CQEs to a worker thread running on an ordered work-queue. Consider the following scenario: the VMs will, within a short time of each other, for example due to a network event, send many MAD packets to the PF driver. Let's say there are K VMs, each sending N packets. The interrupt from the first VM will start the worker thread, which will poll N CQEs. A common case here is where the PF driver will multiplex the packets received from the VMs out on the wire QP. But before the wire QP has returned a send CQE and associated interrupt, the other K - 1 VMs have sent their N packets as well. The PF driver has to multiplex K * N packets out on the wire QP. But the send-queue on the wire QP has a finite capacity. So, in this scenario, if K * N is larger than the send-queue capacity of the wire QP, we will get MAD packets dropped on the floor with this dynamic debug message: mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11) and this despite the fact that the wire send-queue could have capacity, but the PF driver isn't aware, because the wire send CQEs have not yet been polled. We can also have a similar scenario inbound, with a wire recv-queue larger than the tunnel QP's send-queue. If many remote peers send MAD packets to the very same VM, the tunnel send-queue destined to the VM could appear full to the PF driver. This starvation is fixed by introducing separate work queues for the wire QPs vs. the tunnel QPs. With this fix, using a dual ported HCA, 8 VFs instantiated, we could run cmtime on each of the 18 interfaces towards a similar configured peer, each cmtime instance with 800 QPs (all in all 14400 QPs) without a single CM packet getting lost. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-5-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
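The K * N argument in the commit above is simple arithmetic, and a short helper makes the failure condition concrete. This sketch assumes the worst case the commit describes, in which no wire send CQEs are polled while the K senders' packets are being multiplexed; the function name `dropped_without_polling` and the numbers in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of packets that cannot be posted when k senders each offer
 * n packets to a send queue with `capacity` free slots and no send
 * completions are polled in between.
 */
static uint64_t dropped_without_polling(uint32_t k, uint32_t n,
                                        uint32_t capacity)
{
    uint64_t offered;

    assert(capacity > 0);
    offered = (uint64_t)k * n;
    return offered > capacity ? offered - capacity : 0;
}
```

With 8 senders, 100 packets each and a queue of 512 slots, 800 packets are offered and 288 of them would be dropped; with 4 senders the 400 offered packets fit and nothing is lost.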
#
0ae207fb |
|
03-Aug-2020 |
Håkon Bugge <haakon.bugge@oracle.com> |
IB/mlx4: Separate tunnel and wire bufs parameters Using CX-3 in virtualized mode, MAD packets are proxied through the PF driver. The feed is N tunnel QPs, and what is received from the VFs is multiplexed out on the wire QP. Since this is a many-to-one scenario, it is better to have separate initialization parameters for the two usages. The number of wire and tunnel bufs are yanked up to 2K and 512 respectively. With this set of parameters, a system consisting of eight physical servers, each with eight VMs and 14 I/O servers (BM), can run switch fail-over without seeing: mlx4_ib_demux_mad: failed sending GSI to slave 3 via tunnel qp (-11) or mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11) Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-4-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
42a3b153 |
|
06-Jul-2020 |
Gal Pressman <galpress@amazon.com> |
RDMA: Remove the udata parameter from alloc_mr callback Allocating an MR flow can only be initiated by kernel users, and not from userspace so a udata parameter is redundant. Link: https://lore.kernel.org/r/20200706120343.10816-4-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
#
1f55b7ab |
|
28-May-2020 |
Max Gurtovoy <maxg@mellanox.com> |
RDMA/mlx4: Remove FMR support for memory registration HCA's that are driven by mlx4 driver support FRWR method to register memory. Remove the ancient and unsafe FMR method. Link: https://lore.kernel.org/r/8-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
fa5d010c |
|
30-Apr-2020 |
Maor Gottlieb <maorg@mellanox.com> |
RDMA: Group create AH arguments in struct The following patch adds an additional argument to the create AH function, so it makes sense to group the ah_attr and flags arguments in a struct. Link: https://lore.kernel.org/r/20200430192146.12863-13-maorg@mellanox.com Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ff3195b3 |
|
15-Nov-2019 |
Danit Goldberg <danitg@mellanox.com> |
IB/mlx4: Update HW GID table while adding vlan GID When adding a new GID compare the vlan along with the GID and type. This allows vlans to have GIDs that alias each other, such as the default GID. Otherwise the GID cache view can become inconsistent with the HW view. Link: https://lore.kernel.org/r/20191115154457.247763-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
e26e7b88 |
|
29-Oct-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Change MAD processing function to remove extra casting and parameter All users of process_mad() convert input pointers from ib_mad_hdr to ib_mad, so update the function declaration to use ib_mad directly. Also remove the unused input MAD size parameter. Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
a49b1dc7 |
|
12-Jun-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Convert destroy_wq to be void All callers of destroy WQ always succeed and there is no need to check their return value, so convert destroy_wq to be void. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e39afe3d |
|
28-May-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Convert CQ allocations to be under core responsibility Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
a52c8e24 |
|
28-May-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Clean destroy CQ in drivers do not return errors Like all other destroy commands, the .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused memory leaks. This patch converts .destroy_cq() so that it does not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
68e326de |
|
03-Apr-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Handle SRQ allocations by IB/core Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
d3456914 |
|
03-Apr-2019 |
Leon Romanovsky <leon@kernel.org> |
RDMA: Handle AH allocations by IB/core Simplify drivers by ensuring lifetime of ib_ah object. The changes in .create_ah() go hand in hand with the relevant update in .destroy_ah(). We will use this opportunity to convert .destroy_ah() so that it cannot fail, as was suggested a long time ago, because there is nothing to do in case of failure during destroy. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
ff23dfa1 |
|
31-Mar-2019 |
Shamir Rabinovitch <shamir.rabinovitch@oracle.com> |
IB: Pass only ib_udata in function prototypes Now that ib_udata is passed to all the driver's object create/destroy APIs, the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the function prototypes. Make ib_udata the only argument passed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c4367a26 |
|
31-Mar-2019 |
Shamir Rabinovitch <shamir.rabinovitch@oracle.com> |
IB: Pass uverbs_attr_bundle down ib_x destroy path The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
f1430536 |
|
20-Feb-2019 |
Matthew Wilcox <willy@infradead.org> |
mlx4: Convert pv_id_table to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b0ea0fa5 |
|
09-Jan-2019 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
|
#
2553ba21 |
|
12-Dec-2018 |
Gal Pressman <galpress@amazon.com> |
RDMA: Mark if destroy address handle is in a sleepable context Introduce a 'flags' field to destroy address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
b090c4e3 |
|
12-Dec-2018 |
Gal Pressman <galpress@amazon.com> |
RDMA: Mark if create address handle is in a sleepable context Introduce a 'flags' field to create address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
fc6526fb |
|
11-Nov-2018 |
Ursula Braun <ubraun@linux.ibm.com> |
mlx4: trigger IB events needed by SMC The mlx4 driver does not trigger an IB_EVENT_PORT_ACTIVE when the RoCE network interface is activated. When SMC determines the RoCE device port to be used, it checks the port states. This patch triggers IB events for NETDEV_UP and NETDEV_DOWN. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
c282da41 |
|
16-Sep-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/mlx4: Use rdma_user_mmap_io Rely on the new core code helper to map BAR memory from the driver. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
f95ccffc |
|
26-Jul-2018 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Use 4K pages for kernel QP's WQE buffer In the current implementation, the driver tries to allocate contiguous memory, and if it fails, it falls back to 4K fragmented allocation. Once the memory is fragmented, the first allocation might take a lot of time, and even fail, which can cause connection failures. This patch changes the logic to always allocate with 4K granularity, since it's more robust and more likely to succeed. This patch was tested with Lustre and no performance degradation was observed. Note: This commit eliminates the "shrinking WQE" feature. This feature depended on using vmap to create a virtually contiguous send WQ. vmap use was abandoned due to problems with several processors (see the commit cited below). As a result, shrinking WQE was available only with physically contiguous send WQs. Allocating such send WQs caused the problems described above. Therefore, as a side effect of eliminating the use of large physically contiguous send WQs, the shrinking WQE feature became unavailable. Warning example:
worker/20:1: page allocation failure: order:8, mode:0x80d0
CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<ffffffff81686d81>] dump_stack+0x19/0x1b
[<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[<ffffffff81184fae>] __get_free_pages+0xe/0x50
[<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]
Fixes: 73898db04301 ("net/mlx4: Avoid wrong virtual mappings") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
d34ac5cd |
|
18-Jul-2018 |
Bart Van Assche <bvanassche@acm.org> |
RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send|recv|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduced that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
5e62d5ff |
|
24-Jun-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/mlx4: Create slave AH's directly Since slave GID's do not exist in the core gid table we can no longer use the core code to help do this without creating inconsistencies. Directly create the AH using mlx4 internal APIs. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
1975acd9 |
|
19-Jun-2018 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4: Add support for drain SQ & RQ This patch follows the logic from ib_core but considers the internal device state upon executing the involved commands. Specifically, upon internal error state, modifying the QP to an error state can be assumed to succeed, as each in-progress WR is going to be flushed in error in any case, as expected by that modify command. In addition, as the drain should never fail, the driver makes sure that post_send/recv will succeed even if the device is already in an internal error state. As such, once the driver supplies the simulated/SW CQEs, the CQE for the drain WR will be handled as well. In case of an internal error state the CQE for the drain WR may be completed as part of the main task that handled the error state or by the task that issued the drain WR. As the above depends on scheduling, the code takes the relevant locks and actions to make sure that the completion handler for that WR will always be called after the post_send/recv were issued but not in parallel to the other task that handles the error flow. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
7492052a |
|
13-Jun-2018 |
Parav Pandit <parav@mellanox.com> |
IB/mlx4: Use GID attribute from ah attribute While converting GID index from attribute to that of the HCA, GID attribute is available from the ah_attr. Make use of GID attribute to simplify the code and also avoid GID query. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
6d06c9aa |
|
15-Mar-2018 |
Guy Levi <guyle@mellanox.com> |
IB/mlx4: Add Scatter FCS support over WQ creation As a default, for Ethernet packets, the device scatters only the payload of ingress packets. The scatter FCS feature lets the user get the FCS (Ethernet's frame check sequence) in the received WR's buffer as a 4-byte trailer following the packet's payload. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
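From a consumer's point of view, the buffer layout that the Scatter FCS commit above describes (payload followed by a 4-byte FCS trailer) can be split as in the following sketch. It only separates the bytes and makes no claim about how the driver or hardware fills the buffer or how the FCS value should be interpreted; `split_scattered_fcs` and `FCS_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FCS_LEN 4  /* Ethernet frame check sequence length in bytes */

/*
 * Split a received buffer into its payload length and the trailing FCS
 * bytes. Returns 0 on success, -1 if the buffer is too short to hold
 * an FCS trailer.
 */
static int split_scattered_fcs(const uint8_t *buf, size_t len,
                               size_t *payload_len, uint8_t fcs[FCS_LEN])
{
    assert(buf && payload_len && fcs);

    if (len < FCS_LEN)
        return -1;
    *payload_len = len - FCS_LEN;
    memcpy(fcs, buf + *payload_len, FCS_LEN);
    return 0;
}
```

A caller would pass the byte count reported for the completed receive as `len`; anything shorter than four bytes cannot contain a trailer and is rejected.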
#
48962f5c |
|
13-Mar-2018 |
Jason Gunthorpe <jgg@ziepe.ca> |
RDMA/mlx4: Move flag constants to uapi header MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps) and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to userspace and form part of the uAPI. Move them to the uapi header where they belong. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
d50a8a96 |
|
26-Feb-2018 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4: Move mlx4_uverbs_ex_query_device_resp to include/uapi/ This struct is involved in the user API for mlx4 and should not be hidden inside a driver header file. Fixes: 09d208b258a2 ("IB/mlx4: Add report for RSS capabilities by vendor channel") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
#
34d9a270 |
|
13-Nov-2017 |
Yonatan Cohen <yonatanc@mellanox.com> |
IB/mlx4: Exposing modify CQ callback to uverbs layer Exposed mlx4_ib_modify_cq to be called from ib device verb list. Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ed8637d3 |
|
02-Nov-2017 |
Guy Levi <guyle@mellanox.com> |
IB/mlx4: Add contig support for control objects Taking advantage of the optimization which was introduced in previous commit ("IB/mlx4: Use optimal numbers of MTT entries") to optimize the MTT usage for QP and CQ. Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
09d208b2 |
|
25-Oct-2017 |
Guy Levi <guyle@mellanox.com> |
IB/mlx4: Add report for RSS capabilities by vendor channel The mlx4's RSS patches submission missed a report of RSS capabilities which should be reported by the vendor channel in query_device. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3078f5f1 |
|
04-Jul-2017 |
Guy Levi <guyle@mellanox.com> |
IB/mlx4: Add support for RSS QP Add support to work with an RSS QP by using an indirection table object upon QP creation. Other related QP verbs (e.g. modify/destroy/query) were updated as well for that QP mode. Notes: - The RX hash properties are supplied as driver private data. - The RSS QP port is used on the associated WQs in its indirection table. Applying different ports during WQ life time is not allowed. - The expected RSS QP flow is: create, modify(RST->INIT), modify(INIT->RTR), destroy. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b8d46ca0 |
|
04-Jul-2017 |
Guy Levi <guyle@mellanox.com> |
IB/mlx4: Add support for WQ indirection table related verbs To enable RSS functionality the IB indirection table object (i.e. ib_rwq_ind_table) should be used. This patch implements the related verbs as of create and destroy an indirection table. In downstream patches the indirection table will be used as part of RSS QP creation. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
400b1ebc |
|
04-Jul-2017 |
Guy Levi <guyle@mellanox.com> |
IB/mlx4: Add support for WQ related verbs Support create/modify/destroy WQ related verbs. The base IB object to enable RSS functionality is a WQ (i.e. ib_wq). This patch implements the related WQ verbs as of create, modify and destroy. In downstream patches the WQ will be used as part of an indirection table (i.e. ib_rwq_ind_table) to enable RSS QP creation. Notes: ConnectX-3 hardware requires consecutive WQNs list as receive descriptor queues for the RSS QP. Hence, the driver manages consecutive ranges lists per context which the user must respect. Destroying the WQ does not return its WQN back to its range for reusing. However, destroying all WQs from the same range releases the range and in turn releases its WQNs for reusing. Since the WQ object is not a natural object in the hardware, the driver implements the WQ by the hardware QP. As such, the WQ inherits its port from its RSS QP parent upon its RST->INIT transition and by that time its state is applied to the hardware. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
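The WQN range rule in the notes above (a destroyed WQ does not give its WQN back, but destroying every WQ in a range frees the whole range) can be sketched as a small reference-counted allocator. This is only an illustration under assumptions: `struct wqn_range`, `wqn_range_get()` and `wqn_range_put()` are hypothetical names, not the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one consecutive WQN range. */
struct wqn_range {
	unsigned int base;     /* first WQN in the range */
	unsigned int size;     /* number of consecutive WQNs */
	unsigned int next;     /* offset of the next never-used WQN */
	unsigned int refcount; /* live WQs that belong to this range */
};

/* Hand out the next WQN, or -1 when the range is used up. */
int wqn_range_get(struct wqn_range *r)
{
	if (r->next == r->size)
		return -1;
	r->refcount++;
	return (int)(r->base + r->next++);
}

/*
 * Drop one WQ. Its WQN is not recycled on its own; only when the last
 * WQ of the range is gone does the whole range become reusable.
 * Returns true when the range was released.
 */
bool wqn_range_put(struct wqn_range *r)
{
	assert(r->refcount > 0);
	if (--r->refcount > 0)
		return false;
	r->next = 0;
	return true;
}
```

With a range of two WQNs, a third `wqn_range_get()` fails even after one WQ has been destroyed, and succeeds again only after both have been.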
|
#
ea30b966 |
|
21-Jun-2017 |
Maor Gottlieb <maorg@mellanox.com> |
IB/mlx4: Add inline-receive support When inline-receive is enabled, the HCA may write received data into the receive WQE. Inline-receive is enabled by setting its matching bit in the QP context and each single-packet message with payload not exceeding the receive WQE size will be delivered to the WQE. The completion report will indicate that the payload was placed to the WQE. It includes: 1) Return maximum supported size of inline-receive by the hardware in query_device vendor's data part. 2) Enable the feature when requested by the vendor data input. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
8900b894 |
|
23-May-2017 |
Leon Romanovsky <leon@kernel.org> |
{net, IB}/mlx4: Remove gfp flags argument The caller to the driver marks GFP_NOIO allocations with help of memalloc_noio-* calls now. This makes redundant to pass down to the driver gfp flags, which can be GFP_KERNEL only. The patch removes the gfp flags argument and updates all driver paths. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
90898850 |
|
29-Apr-2017 |
Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> |
IB/core: Rename struct ib_ah_attr to rdma_ah_attr This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
d66c88a8 |
|
20-Jan-2017 |
Bart Van Assche <bvanassche@acm.org> |
IB/mlx4: Switch from dma_device to dev.parent Prepare for removal of ib_device.dma_device. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
477864c8 |
|
22-Nov-2016 |
Moni Shoua <monis@mellanox.com> |
IB/core: Let create_ah return extended response to user Add struct ib_udata to the signature of create_ah callback that is implemented by IB device drivers. This allows HW drivers to return extra data to the userspace library. This patch prepares the ground for mlx5 driver to resolve destination mac address for a given GID and return it to userspace. This patch was previously submitted by Knut Omang as a part of the patch set to support Oracle's Infiniband HCA (SIF). Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
fd10ed8e |
|
12-Sep-2016 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets In MLX qp packets, the LRH (built by the driver) has both a VL field and an SL field. When building a QP1 packet, the VL field should reflect the SLtoVL mapping and not arbitrarily contain zero (as is done now). This bug causes credit problems in IB switches at high rates of QP1 packets. The fix is to cache the SL to VL mapping in the driver, and look up the VL mapped to the SL provided in the send request when sending QP1 packets. For FW versions which support generating a port_management_config_change event with subtype sl-to-vl-table-change, the driver uses that event to update its sl-to-vl mapping cache. Otherwise, the driver snoops incoming SMP mads to update the cache. There remains the case where the FW is running in secure-host mode (so no QP0 packets are delivered to the driver), and the FW does not generate the sl2vl mapping change event. To support this case, the driver updates (via querying the FW) its sl2vl mapping cache when running in secure-host mode when it receives either a Port Up event or a client-reregister event (where the port is still up, but there may have been an opensm failover). OpenSM modifies the sl2vl mapping before Port Up and Client-reregister events occur, so if there is a mapping change the driver's cache will be properly updated. Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
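A minimal user-space sketch of the SL-to-VL cache idea described above, assuming the 16 four-bit VL values are packed into one 64-bit word so that a lookup needs no lock. The names `sl2vl_cache`, `sl2vl_cache_update()` and `sl2vl_cache_lookup()` and the nibble order are assumptions made for illustration; the commit text does not specify the layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-port cache: 16 SLs, one 4-bit VL each, in 64 bits. */
struct sl2vl_cache {
	_Atomic uint64_t table;
};

/* Replace the whole mapping at once, e.g. after a table-change event. */
void sl2vl_cache_update(struct sl2vl_cache *c, const uint8_t vl_for_sl[16])
{
	uint64_t packed = 0;

	for (int sl = 0; sl < 16; sl++)
		packed |= (uint64_t)(vl_for_sl[sl] & 0xf) << (4 * sl);
	atomic_store(&c->table, packed);
}

/* Look up the VL to place in the LRH for a given SL. */
uint8_t sl2vl_cache_lookup(struct sl2vl_cache *c, uint8_t sl)
{
	assert(sl < 16);
	return (uint8_t)((atomic_load(&c->table) >> (4 * sl)) & 0xf);
}
```

Because the whole table is swapped in a single atomic store, a sender building a QP1 header sees either the old or the new mapping, never a mix of the two.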
|
#
8ec07bf8 |
|
12-Sep-2016 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV When sending QP1 MAD packets which use a GRH, the source GID (which consists of the 64-bit subnet prefix, and the 64-bit port GUID) must be included in the packet GRH. For SR-IOV, a GID cache is used, since the source GID needs to be the slave's source GID, and not the Hypervisor's GID. This cache also included a subnet_prefix. Unfortunately, the subnet_prefix field in the cache was never initialized (to the default subnet prefix 0xfe80::0). As a result, this field remained all zeroes. Therefore, when SR-IOV was active, all QP1 packets which included a GRH had a source GID subnet prefix of all-zeroes. However, the subnet-prefix should initially be 0xfe80::0 (the default subnet prefix). In addition, if OpenSM modifies a port's subnet prefix, the new subnet prefix must be used in the GRH when sending QP1 packets. To fix this we now initialize the subnet prefix in the SR-IOV GID cache to the default subnet prefix. We update the cached value if/when OpenSM modifies the port's subnet prefix. We take this cached value when sending QP1 packets when SR-IOV is active. Note that the value is stored as an atomic64. This eliminates any need for locking when the subnet prefix is being updated. Note also that we depend on the FW generating the "port management change" event for tracking subnet-prefix changes performed by OpenSM. If running early FW (before 2.9.4630), subnet prefix changes will not be tracked (but the default subnet prefix still will be stored in the cache; therefore users who do not modify the subnet prefix will not have a problem). If there is a need for such tracking also for early FW, we will add that capability in a subsequent patch. Fixes: 1ffeb2eb8be9 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
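The fix described above can be sketched in user-space C as follows: the cached subnet prefix starts at the default 0xfe80::0 and is read atomically when the 128-bit source GID is assembled. C11 atomics stand in for the kernel's atomic64 type, and every name here (`gid_cache`, `gid_cache_init()`, `gid_cache_set_prefix()`, `build_source_gid()`) is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEFAULT_SUBNET_PREFIX 0xfe80000000000000ULL

/* Hypothetical per-port cache holding only the subnet prefix. */
struct gid_cache {
	_Atomic uint64_t subnet_prefix;
};

/* Start from the default prefix instead of leaving the field zeroed. */
void gid_cache_init(struct gid_cache *c)
{
	atomic_init(&c->subnet_prefix, DEFAULT_SUBNET_PREFIX);
}

/* Called when the subnet manager changes the port's prefix. */
void gid_cache_set_prefix(struct gid_cache *c, uint64_t prefix)
{
	atomic_store(&c->subnet_prefix, prefix);
}

/* Store a 64-bit value in network (big-endian) byte order. */
static void put_be64(uint8_t *dst, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Source GID = 64-bit subnet prefix followed by 64-bit port GUID. */
void build_source_gid(struct gid_cache *c, uint64_t port_guid,
		      uint8_t gid[16])
{
	put_be64(gid, atomic_load(&c->subnet_prefix));
	put_be64(gid + 8, port_guid);
}
```

A single 64-bit atomic is enough here because the prefix is the only field that changes at run time; the reader never observes a half-written value.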
|
#
3f85f2aa |
|
19-Jul-2016 |
Mark Bloch <markb@mellanox.com> |
IB/mlx4: Add diagnostic hardware counters Expose IB diagnostic hardware counters. The counters count IB events and are applicable for IB and RoCE. The counters can be divided into two groups, per device and per port. Device counters are always exposed. Port counters are exposed only if the firmware supports per port counters. rq_num_dup and sq_num_to are only exposed if we have firmware support for them, if we do, we expose them per device and per port. rq_num_udsdprd and num_cqovf are device only counters. rq - denotes responder. sq - denotes requester.
|-----------------------|---------------------------------------|
| Name                  | Description                           |
|-----------------------|---------------------------------------|
|rq_num_lle             | Number of local length errors         |
|sq_num_lle             | Number of local length errors         |
|rq_num_lqpoe           | Number of local QP operation errors   |
|sq_num_lqpoe           | Number of local QP operation errors   |
|rq_num_lpe             | Number of local protection errors     |
|sq_num_lpe             | Number of local protection errors     |
|rq_num_wrfe            | Number of CQEs with error             |
|sq_num_wrfe            | Number of CQEs with error             |
|sq_num_mwbe            | Number of Memory Window bind errors   |
|sq_num_bre             | Number of bad response errors         |
|sq_num_rire            | Number of Remote Invalid request      |
|                       | errors                                |
|rq_num_rire            | Number of Remote Invalid request      |
|                       | errors                                |
|sq_num_rae             | Number of remote access errors        |
|rq_num_rae             | Number of remote access errors        |
|sq_num_roe             | Number of remote operation errors     |
|sq_num_tree            | Number of transport retries exceeded  |
|                       | errors                                |
|sq_num_rree            | Number of RNR NAK retries exceeded    |
|                       | errors                                |
|rq_num_rnr             | Number of RNR NAKs sent               |
|sq_num_rnr             | Number of RNR NAKs received           |
|rq_num_oos             | Number of Out of Sequence requests    |
|                       | received                              |
|sq_num_oos             | Number of Out of Sequence NAKs        |
|                       | received                              |
|rq_num_udsdprd         | Number of UD packets silently         |
|                       | discarded on the Receive Queue due to |
|                       | lack of receive descriptor            |
|rq_num_dup             | Number of duplicate requests received |
|sq_num_to              | Number of time out received           |
|num_cqovf              | Number of CQ overflows                |
|-----------------------|---------------------------------------|
Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
cbc9355a |
|
22-Jun-2016 |
Chuck Lever <chuck.lever@oracle.com> |
IB/mlx4: Prevent cross page boundary allocation Prevent cross page boundary allocation by allocating a new page; this is required to comply with ConnectX-3 HW requirements. Not doing so might cause an "RDMA read local protection" error. Fixes: 1b2cd0fc673c ('IB/mlx4: Support the new memory registration API') Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
9aa8b321 |
|
12-May-2016 |
Bart Van Assche <bvanassche@acm.org> |
IB/core: Enhance ib_map_mr_sg() The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ff2ba993 |
|
03-May-2016 |
Christoph Hellwig <hch@lst.de> |
IB/core: Add passing an offset into the SG to ib_map_mr_sg Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b2a239df |
|
29-Feb-2016 |
Matan Barak <matanb@mellanox.com> |
IB/core: Add vendor's specific data to alloc mw Passing udata to the vendor's driver in order to pass data from the user-space driver to the kernel-space driver. This data will be used in downstream patches. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e1b866c6 |
|
14-Jan-2016 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Create and use another QP1 for RoCEv2 The mlx4 driver uses a special QP to implement the GSI QP. This kind of QP allows to build the InfiniBand headers in software. When mlx4 hardware builds the packet, it calculates the ICRC and puts it at the end of the payload. However, this ICRC calculation depends on the QP configuration, which is determined when the QP is modified (roce_mode during INIT->RTR). When receiving a packet, the ICRC verification doesn't depend on this configuration. Therefore, using two GSI QPs for send (one for each RoCE version) and one GSI QP for receive are required. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
b699a859 |
|
14-Jan-2016 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Add gid_type to GID properties IB core driver adds a property of type to struct ib_gid_attr. The mlx4 driver should take that in consideration when modifying or querying the hardware gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
feb7c1e3 |
|
23-Dec-2015 |
Christoph Hellwig <hch@lst.de> |
IB: remove in-kernel support for memory windows Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e761c67f |
|
13-Oct-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/mlx4: Remove old FRWR API support No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
1b2cd0fc |
|
13-Oct-2015 |
Sagi Grimberg <sagig@mellanox.com> |
IB/mlx4: Support the new memory registration API Support the new memory registration API by allocating a private page list array in mlx4_ib_mr and populate it when mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx4_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
dbf727de |
|
15-Oct-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Use GID table in AH creation and dmac resolution Previously, vlan id and source MAC were used from QP attributes. Since the net device is now stored in the GID attributes, they could be used instead of getting this information from the QP attributes. IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed because there is no known libibverbs that uses them. This commit also modifies the vendors (mlx4, ocrdma) drivers in order to use the new approach. ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
7b59f0f9 |
|
15-Oct-2015 |
Eran Ben Elisha <eranbe@mellanox.com> |
IB/mlx4: Add counter based implementation for QP multicast loopback block Current implementation for MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is not supported when link layer is Ethernet. This patch will add counter based implementation for multicast loopback prevention. HW can drop multicast loopback packets if sender QP counter index is equal to receiver QP counter index. If qp flag MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is set and link layer is Ethernet, create a new counter and attach it to the QP so it will continue receiving multicast loopback traffic except its own. The decision whether to create a new counter is made at the QP modification to RTR, after the QP's port is set. When the QP is destroyed or moved back to reset state, the counter is deleted. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
3ba8e31d |
|
15-Oct-2015 |
Eran Ben Elisha <eranbe@mellanox.com> |
IB/mlx4: Add IB counters table This is an infrastructure step for allocating and attaching more than one counter to QPs on the same port. Allocate a counters table and manage the insertion and removals of the counters in load and unload of mlx4 IB. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
ae184dde |
|
13-Aug-2015 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4_ib: Disassociate support Implements the IB core disassociate_ucontext API. The driver detaches the HW resources for a given user context to prevent a dependency between application termination and device disconnecting. This is done by managing the VMAs that were mapped to the HW bars such as door bell and blueflame. When a detach is needed, they are remapped to an arbitrary kernel page returned by the zap API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
7dd97576 |
|
30-Jul-2015 |
Jason Gunthorpe <jgg@ziepe.ca> |
IB/mlx4: Remove ib_get_dma_mr calls The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
5070cd22 |
|
30-Jul-2015 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Replace mechanism for RoCE GID management Manage RoCE gid table with logic in IB/core, which is common to all vendors, and remove the mechanism from the mlx4 IB driver. Since management of the GID cache may lead to index mismatch with the hardware GID table, a translation between indexes is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
e26be1bf |
|
30-Jul-2015 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Implement ib_device callbacks get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the hardware GID table. It is possible that indexes in cache and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
679e34d1 |
|
30-Jul-2015 |
Sagi Grimberg <sagig@mellanox.com> |
mlx4: Support ib_alloc_mr verb Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c3abb51b |
|
15-Jun-2015 |
Eran Ben Elisha <eranbe@mellanox.com> |
IB/mlx4: Add RoCE/IB dedicated counters This is an infrastructure step to attach all the QPs opened from the IB driver to a counter in order to collect VF stats from the PF using those counters. If the port's type is Ethernet, the counter policy demands two counters per port (one for RoCE and one for Ethernet). The port default counter (allocated in mlx4_core) is used for the Ethernet netdev QPs and we allocate another counter for RoCE. If the port's traffic is Infiniband, the counter policy demands one counter per port, so it can use the port's default counter. Also, Add 'allocated' flag for each counter in order to clean it at unload. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4cd7c947 |
|
06-Jun-2015 |
Ira Weiny <ira.weiny@intel.com> |
IB/mad: Add support for additional MAD info to/from drivers In order to support alternate sized MADs (and variable sized MADs on OPA devices) add in/out MAD size parameters to the process_mad core call. In addition, add an out_mad_pkey_index to communicate the pkey index the driver wishes the MAD stack to use when sending OPA MAD responses. The out MAD size and the out MAD PKey index are required by the MAD stack to generate responses on OPA devices. Furthermore, the in and out MAD parameters are made generic by specifying them as ib_mad_hdr rather than ib_mad. Drivers are modified as needed and are protected by BUG_ON flags if the MAD sizes passed to them is incorrect. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
4b664c43 |
|
11-Jun-2015 |
Matan Barak <matanb@mellanox.com> |
IB/mlx4: Add support for CQ time-stamping This includes: * support allocation of CQ with the TIMESTAMP_COMPLETION creation flag. * add timestamp_mask and hca_core_clock to query_device, reporting the number of supported timestamp bits (mask) and the hca_core_clock frequency. * return hca core clock's offset in query_device vendor's data, this is needed in order to read the HCA's core clock. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
bcf4c1ea |
|
11-Jun-2015 |
Matan Barak <matanb@mellanox.com> |
IB/core: Change provider's API of create_cq to be extendible Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
a97e2d86 |
|
31-May-2015 |
Ira Weiny <ira.weiny@intel.com> |
IB/core cleanup: Add const on args - device->process_mad The process_mad device function declares some parameters as "in". Make those parameters const and adjust the call tree under process_mad in the various drivers accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
c66fa19c |
|
31-May-2015 |
Matan Barak <matanb@mellanox.com> |
net/mlx4: Add EQ pool Previously, mlx4_en allocated EQs and used them exclusively. This affected RoCE performance, as applications which are events sensitive were limited to use only the legacy EQs. Change that by introducing an EQ pool. This pool is managed by mlx4_core. EQs are assigned to ports (when there are limited number of EQs, multiple ports could be assigned to the same EQs). An exception to this rule is the ASYNC EQ which handles various events. Legacy EQs are completely removed as all EQs could be shared. When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for EQ serving on a specific port. The core driver calculates which EQ should be assigned to that request. Because IRQs are shared between IB and Ethernet modules, their names only include the PCI device BDF address. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee59fa0d |
|
03-Mar-2015 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4: Request alias GUID on demand Request GIDs from the SM on demand, i.e., when a VF actually needs them, and release them when the GIDs are no longer in use. In cloud environments, this is useful for GID migrations, in which a GID is assigned to a VF on the destination HCA, while the VF on the source HCA is shutdown (but the GID was not administratively released). Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
f5479601 |
|
03-Mar-2015 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4: Change init flow to request alias GUIDs for active VFs Change the init flow to ask GUIDs only for active VFs. This is done for both SM & HOST modes so that there is no need any more to maintain the ownership record type. In case SM mode is used, the initial value will be 0, ask the SM to assign, for the HOST mode the initial value will be the HOST generated GUID. This will enable out of the box experience for both probed and attached VFs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
|
#
99ee4df6 |
|
03-Feb-2015 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4: Alias GUID adding persistency support If the SM rejects an alias GUID request the PF driver keeps trying to acquire the specified GUID indefinitely, utilizing an exponential backoff scheme. Retrying is managed per GUID entry. Each entry that wasn't applied holds its next retry information. Retry requests to the SM consist of records of 8 consecutive GUIDs. Each record that contains GUIDs requiring retries holds its next time-to-run based on the retry information of all its GUID entries. The record having the lowest retry time will run first when that retry time arrives. Since the method (SET or DELETE) as sent to the SM applies to all the GUIDs in the record, we must handle SET requests and DELETE requests in separate SM messages (one for SETs and the other for DELETEs). To avoid race conditions where a GUID entry request (set or delete) was modified after the SM request was sent, we save the method and the requested indices as part of the callback's context -- thus, only the requested indexes are evaluated when the response is received. When a GUID entry is approved we turn off its retry-required bit, this prevents redundant SM retries from occurring on that record. The port down event should be sent only when previously it was up. Likewise, the port up event should be sent only if previously the port was down. Synchronization was added around the flows that change entries and record state to prevent race conditions. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
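The per-entry backoff and per-record scheduling described above might look roughly like the sketch below. It is an illustration only: the struct and function names, the one-second starting delay, the 300-second cap, and the choice of taking the earliest entry time as the record's time-to-run are all assumptions, not values taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUIDS_PER_RECORD 8   /* a record carries 8 consecutive GUIDs */
#define RETRY_BASE_SEC   1   /* hypothetical first delay */
#define RETRY_MAX_SEC    300 /* hypothetical cap on the delay */

struct guid_entry {
	bool needs_retry;       /* the retry-required bit */
	unsigned int delay_sec; /* current backoff delay, 0 if unused */
	uint64_t next_retry;    /* absolute time of the next attempt */
};

struct guid_record {
	struct guid_entry entry[GUIDS_PER_RECORD];
};

/* The SM rejected this entry: double its delay and reschedule it. */
void guid_entry_rejected(struct guid_entry *e, uint64_t now)
{
	if (e->delay_sec == 0)
		e->delay_sec = RETRY_BASE_SEC;
	else
		e->delay_sec *= 2;
	if (e->delay_sec > RETRY_MAX_SEC)
		e->delay_sec = RETRY_MAX_SEC;
	e->needs_retry = true;
	e->next_retry = now + e->delay_sec;
}

/* The SM approved this entry: clear the bit so it is not retried. */
void guid_entry_approved(struct guid_entry *e)
{
	e->needs_retry = false;
	e->delay_sec = 0;
}

/*
 * Compute when the record should run next: the earliest retry time
 * among entries that still need one. Returns false if none do.
 */
bool guid_record_next_run(const struct guid_record *r, uint64_t *when)
{
	bool found = false;

	for (int i = 0; i < GUIDS_PER_RECORD; i++) {
		const struct guid_entry *e = &r->entry[i];

		if (!e->needs_retry)
			continue;
		if (!found || e->next_retry < *when)
			*when = e->next_retry;
		found = true;
	}
	return found;
}
```

A scheduler would then pick, across all records, the one whose `guid_record_next_run()` value is lowest, which matches the "lowest retry time runs first" rule in the commit text.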
|
#
35f05dab |
|
08-Feb-2015 |
Yishai Hadas <yishaih@mellanox.com> |
IB/mlx4: Reset flow support for IB kernel ULPs The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6215745 |
|
03-Feb-2015 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Load balance ports in port aggregation mode When the mlx4 IB (RoCE) device works in link aggregation mode, it exposes a single port to upper layers. Therefore, applications always set '1' in port_num attribute when modifying a QP or creating an address handle. To make sure that a node uses all available ports the mlx4 driver will override the port_num attribute with a round robin policy. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
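The round-robin override mentioned in this entry might look like the sketch below. It is a simplified, single-threaded illustration; `struct bonded_dev` and `pick_physical_port()` are hypothetical names, and a real driver would need an atomic counter or a lock around the update.

```c
#include <assert.h>

struct bonded_dev {
	unsigned num_ports;  /* physical ports behind the single exposed port */
	unsigned next;       /* zero-based index of the next port to hand out */
};

/* Ignore the caller's port_num (always 1) and rotate over physical ports. */
static unsigned pick_physical_port(struct bonded_dev *d)
{
	unsigned port;

	assert(d->num_ports > 0);
	port = d->next + 1;                      /* port numbers are 1-based */
	d->next = (d->next + 1) % d->num_ports;
	return port;
}
```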
|
#
146d6e19 |
|
03-Feb-2015 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Create mirror flows in port aggregation mode In port aggregation mode flows for port #1 (the only port) should be mirrored on port #2. This is because packets can arrive from either physical port. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e0629cb |
|
11-Sep-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header The source MAC is needed in RoCE when building the QP1 header. Currently, this is obtained from the source net device. However, the net device may not yet exist, or can be destroyed in parallel to this QP1 send operation (e.g through the VPI port change flow) so accessing it may cause a kernel crash. To fix this, we maintain a source MAC cache per port for the net device in struct mlx4_ib_roce. This cached MAC is initialized to be the default MAC address obtained during HCA initialization via QUERY_PORT. This cached MAC is updated via the netdev event notifier handler. Since the cached MAC is held in an atomic64 object, we do not need locking when accessing it. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
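The lock-free MAC cache idea can be illustrated in portable C11 as below. This is only a sketch: the kernel uses its own `atomic64_t` API rather than `<stdatomic.h>`, and the struct and function names here are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One cached source MAC per port, packed into the low 48 bits. */
struct port_mac_cache {
	_Atomic uint64_t mac;
};

static uint64_t mac_to_u64(const uint8_t mac[6])
{
	uint64_t v = 0;

	for (int i = 0; i < 6; i++)
		v = (v << 8) | mac[i];
	return v;
}

static void u64_to_mac(uint64_t v, uint8_t mac[6])
{
	for (int i = 5; i >= 0; i--) {
		mac[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Seed the cache, e.g. with the default MAC read at initialization. */
static void mac_cache_init(struct port_mac_cache *c, const uint8_t mac[6])
{
	assert(c && mac);
	atomic_init(&c->mac, mac_to_u64(mac));
}

/* Called from the (hypothetical) netdev event handler on address change. */
static void mac_cache_update(struct port_mac_cache *c, const uint8_t mac[6])
{
	atomic_store(&c->mac, mac_to_u64(mac));
}

/* Called when building a header; never touches the net device. */
static void mac_cache_read(struct port_mac_cache *c, uint8_t mac[6])
{
	u64_to_mac(atomic_load(&c->mac), mac);
}
```

Because the whole address fits in one 64-bit atomic, a reader always sees either the old or the new MAC in full, which is why no lock is needed.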
|
#
9376932d |
|
31-Jul-2014 |
Matan Barak <matanb@mellanox.com> |
IB/mlx4_ib: Add support for user MR re-registration This enables the user to change the protection domain, access flags and translation (address and length) of the MR. Use basic mlx4_core helper functions to get, update and set MPT and MTT objects according to the required modifications. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
40f2287b |
|
11-May-2014 |
Jiri Kosina <jkosina@suse.cz> |
IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO Modify the various routines used to allocate memory resources which serve QPs in mlx4 to get an input GFP directive. Have the Ethernet driver use GFP_KERNEL in its QP allocations as done prior to this commit, and the IB driver use GFP_NOIO when the IB verbs IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
9433c188 |
|
15-May-2014 |
Matan Barak <matanb@mellanox.com> |
IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes When we receive a netdev event indicating a netdev change and/or a netdev address change, we must change the MAC index used by the proxy QP1 (in the QP context), otherwise RoCE CM packets sent by the VF will not carry the same source MAC address as the non-CM packets. We use the UPDATE_QP command to perform this change. In order to avoid modifying a QP context based on a netdev event while the driver attempts to destroy this QP (e.g. either the mlx4_ib or ib_mad modules are unloaded), we use mutex locking in both flows. Since the relevant mlx4 proxy GSI QP is created indirectly by the mad module when it creates its GSI QP, mlx4 didn't need to keep track of that QP prior to this change. Now that QP modifications are needed to this QP from within the driver, we added a reference to it. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5ea8bbfc |
|
11-Mar-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Implement IP based gids support for RoCE/SRIOV Since there is no connection between the MAC/VLAN and the GID when using IP-based addressing, the proxy QP1 (running on the slave) must pass the source-mac, destination-mac, and vlan_id information separately from the GID. Additionally, the Host must pass the remote source-mac and vlan_id back to the slave. This is achieved as follows: Outgoing MADs: 1. Source MAC: obtained from the CQ completion structure (struct ib_wc, smac field). 2. Destination MAC: obtained from the tunnel header. 3. vlan_id: obtained from the tunnel header. Incoming MADs: 1. The source (i.e., remote) MAC and vlan_id are passed in the tunnel header to the proxy QP1. VST mode support: For outgoing MADs, the vlan_id obtained from the header is discarded, and the vlan_id specified by the Hypervisor is used instead. For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the "invalid" vlan (0xffff) is substituted when forwarding to the slave. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f5bb473 |
|
11-Mar-2014 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Add ref counting to port MAC table for RoCE The IB side of RoCE requires the MAC table index of the MAC address used by its QPs. To obtain the real MAC index, the IB side registers the MAC (increasing its ref count, and also returning the real MAC index) during the modify-qp sequence. This protects against the ETH side deleting or modifying that MAC table entry while the QP is active. Note that until the modify-qp command returns success, the MAC and VLAN information only has "candidate" status. If the modify-qp succeeds, the "candidate" info is promoted to the operational MAC/VLAN info for the qp. If the modify fails, the candidate MAC/VLAN is unregistered, and the old qp info is preserved. The patch is a bit complex, because there are multiple qp transitions where the primary-path information may be modified: INIT-to-RTR, and SQD-to-SQD. Similarly for the alternate path information. Therefore the code must handle cases where path information has already been entered into the QP context by previous qp transitions. For the MAC address, the success logic is as follows: 1. If there was no previous MAC, simply move the candidate MAC information to the operational information, and reset the candidate MAC info. 2. If there was a previous MAC, unregister it. Then move the MAC information from candidate to operational, and reset the candidate info (as in 1. above). The MAC address failure logic is the same for all cases: - Unregister the candidate MAC, and reset the candidate MAC info. For Vlan registration, the logic is similar. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
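The candidate/operational promotion logic in this entry can be modelled with a small reference-counted table. The sketch below is a toy under stated assumptions: the table size, `struct qp_mac`, and all function names are hypothetical, and the real code additionally handles VLANs and the alternate path.

```c
#include <assert.h>
#include <stdint.h>

#define MAC_TABLE_SIZE 4    /* hypothetical table size */
#define NO_MAC         (-1)

struct mac_table {
	uint64_t mac[MAC_TABLE_SIZE];
	int refs[MAC_TABLE_SIZE];   /* 0 means the slot is free */
};

/* Take a reference on mac; returns its table index, or NO_MAC if full. */
static int mac_register(struct mac_table *t, uint64_t mac)
{
	int free_idx = NO_MAC;

	for (int i = 0; i < MAC_TABLE_SIZE; i++) {
		if (t->refs[i] > 0 && t->mac[i] == mac) {
			t->refs[i]++;
			return i;
		}
		if (t->refs[i] == 0 && free_idx == NO_MAC)
			free_idx = i;
	}
	if (free_idx != NO_MAC) {
		t->mac[free_idx] = mac;
		t->refs[free_idx] = 1;
	}
	return free_idx;
}

static void mac_unregister(struct mac_table *t, int idx)
{
	assert(idx >= 0 && idx < MAC_TABLE_SIZE && t->refs[idx] > 0);
	t->refs[idx]--;
}

struct qp_mac {
	int oper;   /* index in use by the QP, or NO_MAC */
	int cand;   /* index staged for the pending modify-qp, or NO_MAC */
};

/* Before issuing modify-qp: register the new MAC as a candidate. */
static int qp_stage_mac(struct mac_table *t, struct qp_mac *q, uint64_t mac)
{
	q->cand = mac_register(t, mac);
	return q->cand;
}

/* After modify-qp returns: promote on success, roll back on failure. */
static void qp_modify_done(struct mac_table *t, struct qp_mac *q, int success)
{
	if (q->cand == NO_MAC)
		return;
	if (success) {
		if (q->oper != NO_MAC)
			mac_unregister(t, q->oper);  /* drop the previous MAC */
		q->oper = q->cand;
	} else {
		mac_unregister(t, q->cand);          /* old info is preserved */
	}
	q->cand = NO_MAC;
}
```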
|
#
297e0dad |
|
12-Dec-2013 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN. Therefore, we need to extract them from the CQE and place them in struct ib_wc (to be used for cases where they were taken from the gid). Also, when modifying a QP or building address handle, instead of parsing the dgid to get the MAC and VLAN, take them from the address handle attributes. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
d487ee77 |
|
12-Dec-2013 |
Moni Shoua <monis@mellanox.com> |
IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table Currently, the mlx4 driver sets IBoE (RoCE) gids to encode the related Ethernet netdevice interface MAC address and possibly VLAN id. Change this scheme such that gids encode interface IP addresses (both IPv4 and IPv6). This requires learning the IP addresses which are in use by a netdevice associated with the HCA port, formatting them to gids and adding them to the port gid table. Furthermore, events of add and delete address are caught to maintain the gid table accordingly. Associated IP addresses may belong to a master of an Ethernet netdevice on top of that port so this should be considered when building and maintaining the gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c1c98501 |
|
07-Nov-2013 |
Matan Barak <matanb@mellanox.com> |
IB/mlx4: Add support for steerable IB UD QPs This patch adds support for steerable (NETIF) QP creation. When we create the device, we allocate a range of steerable QPs. Afterward when a QP is created with the NETIF flag, it's allocated from this range. Allocation is managed by bitmap allocator. Internal steering rules for those QPs is automatically generated on their creation. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
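A bitmap allocator over a reserved QP number range, as this entry describes, could be sketched like this. The 64-entry limit, the struct, and the function names are assumptions for the illustration; the kernel has its own bitmap helpers.

```c
#include <assert.h>
#include <stdint.h>

struct qp_range {
	uint32_t base;    /* first QP number of the reserved range */
	unsigned count;   /* number of QPs in the range, at most 64 here */
	uint64_t used;    /* bit i set means base + i is allocated */
};

/* Returns an unused QP number from the range, or -1 if it is exhausted. */
static int64_t netif_qp_alloc(struct qp_range *r)
{
	assert(r->count <= 64);
	for (unsigned i = 0; i < r->count; i++) {
		if (!(r->used & (UINT64_C(1) << i))) {
			r->used |= UINT64_C(1) << i;
			return (int64_t)r->base + i;
		}
	}
	return -1;
}

/* Returns a QP number to the range. */
static void netif_qp_free(struct qp_range *r, uint32_t qpn)
{
	assert(qpn >= r->base && qpn - r->base < r->count);
	r->used &= ~(UINT64_C(1) << (qpn - r->base));
}
```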
|
#
0a9b7d59 |
|
07-Nov-2013 |
Matan Barak <matanb@mellanox.com> |
IB/mlx4: Enable device-managed steering support for IB ports too Up until now, flow steering wasn't supported when using IB ports. This patch enables support for flow steering if all hardware ports support that, for example the new MLX4_DEV_CAP_FLAG2_DMFS_IPOIB mlx4 device capability. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
f77c0162 |
|
14-Aug-2013 |
Hadar Hen Zion <hadarh@mellanox.com> |
IB/mlx4: Add receive flow steering support Implement ib_create_flow() and ib_destroy_flow(). Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides a 64-bit registration ID, which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is returned to the caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
6ff63e19 |
|
06-Feb-2013 |
Shani Michaeli <shanim@mellanox.com> |
IB/mlx4: Support memory window binding * Implement memory windows binding in mlx4_ib_post_send. * Implement mlx4_ib_bind_mw by deferring to mlx4_ib_post_send. * Rename MLX4_WQE_FMR_PERM_* flags to MLX4_WQE_FMR_AND_BIND_PERM_*, indicating that they are used both for fast registration work requests, and for memory window bind work requests. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
804d6a89 |
|
06-Feb-2013 |
Shani Michaeli <shanim@mellanox.com> |
mlx4: Implement memory windows allocation and deallocation Implement MW allocation and deallocation in mlx4_core and mlx4_ib. Pass down the enable bind flag when registering memory regions. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
a29bec12 |
|
25-Feb-2013 |
Roland Dreier <roland@purestorage.com> |
IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool Matches the way they're used, and actually lets at least x86-64 generate better code: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-38 (-38) function old new delta mlx4_ib_post_send 4416 4378 -38 Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
08ff3235 |
|
21-Oct-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
mlx4: 64-byte CQE/EQE support ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQEs the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g. as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows: 1. the PPF does the detection of the new capabilities using the QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware of the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when **needed** i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be kept in sync with user space.
This practice allows working with unmodified libmlx4 on older devices (e.g. A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
afa8fd1d |
|
03-Aug-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Paravirtualize Node Guids for slaves This is necessary in order to support > 1 VF/PF in a VM for software that uses the node guid as a discriminator, such as librdmacm. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
c1e7e466 |
|
03-Aug-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Add iov directory in sysfs under the ib device This directory is added only for the master -- slaves do not have it. The sysfs iov directory is used to manage and examine the port P_Key and guid paravirtualization. Under iov/ports, the administrator may examine the gid and P_Key tables as they are present in the device (and as are seen in the "network view" presented to the SM). Under the iov/<pci slot number> directories, the admin may map the index numbers in the physical tables (as under iov/ports) to the paravirtualized index numbers that guests see. For example, if the administrator, for port 1 on guest 2 maps physical pkey index 10 to virtual index 1, then that guest, whenever it uses its pkey index 1, will actually be using the real pkey index 10. Based on patch from Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
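The virtual-to-physical P_Key index mapping in the example above amounts to a per-guest, per-port lookup table. A minimal sketch, with a hypothetical table size and hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define VIRT_PKEYS 4   /* hypothetical number of P_Key slots a guest sees */

/* Mapping for one guest on one port: virtual index -> physical index. */
struct guest_pkey_map {
	uint16_t virt2phys[VIRT_PKEYS];
};

/* Translate the index a guest uses into the real table index. */
static uint16_t pkey_phys_index(const struct guest_pkey_map *m,
				unsigned virt_idx)
{
	assert(virt_idx < VIRT_PKEYS);
	return m->virt2phys[virt_idx];
}
```

With `virt2phys[1]` set to 10, a guest using its pkey index 1 ends up on physical index 10, as in the example from the message.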
|
#
a0c64a17 |
|
03-Aug-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Add alias_guid mechanism For IB ports, we paravirtualize the GUID at index 0 on slaves. The GUID at index 0 seen by a slave is the actual GUID occupying the GUID table at the slave-id index. The driver, by default, requests at startup time that subnet manager populate its entire guid table with GUIDs. These guids are then mapped (paravirtualized) to the slaves, and appear for each slave as its GUID at index 0. Until each slave has such a guid, its port status is DOWN. The guid table is cached to support special QP paravirtualization, and event propagation to slaves on guid change (we test to see if the guid really changed before propagating an event to the slave). To support this caching, add capability to __mlx4_ib_query_gid() to obtain the network view (i.e., physical view) gid at index X, not just the host (paravirtualized) view. Based on a patch from Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
3cf69cc8 |
|
03-Aug-2012 |
Amir Vadai <amirv@mellanox.co.il> |
IB/mlx4: Add CM paravirtualization In CM para-virtualization: 1. Incoming requests are steered to the correct vHCA according to the embedded GID. 2. Communication IDs on outgoing requests are replaced by a globally unique ID, generated by the PPF, since there is no synchronization of ID generation between guests (and so these IDs are not guaranteed to be globally unique). The guest's comm ID is stored, and is returned to the response MAD when it arrives. Signed-off-by: Amir Vadai <amirv@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b9c5d6a6 |
|
03-Aug-2012 |
Oren Duer <oren@mellanox.co.il> |
IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV MCG paravirtualization support includes: - Creating multicast groups by VFs, and keeping accounting of them - Leaving multicast groups by VFs - Updating SM only with real changes in the overall picture of MCGs status - Creation of MGID=0 groups (let SM choose MGID) Note that the MCG module maintains its own internal MCG object reference counts. The reason for this is that the IB core is used to track only the multicast groups joins generated by the PF it runs over. The PF IB core layer is unaware of slaves, so it cannot be used to keep track of MCG joins they generate. Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
0a9a0188 |
|
03-Aug-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: MAD_IFC paravirtualization The MAD_IFC firmware command fulfills two functions. First, it is used in the QP0/QP1 MAD-handling flow to obtain information from the FW (for answering queries), and for setting variables in the HCA (MAD SET packets). For this, MAD_IFC should provide the FW (physical) view of the data. This is the view that OpenSM needs. We call this the "network view". In the second case, MAD_IFC is used by various verbs to obtain data regarding the local HCA (e.g., ib_query_device()). We call this the "host view". This data needs to be paravirtualized. MAD_IFC therefore needs a wrapper function, and also needs another flag indicating whether it should provide the network view (when it is called by ib_process_mad in special-qp packet handling), or the host view (when it is called while implementing a verb). There are currently 2 flag parameters in mlx4_MAD_IFC already: ignore_bkey and ignore_mkey. These two parameters are replaced by a single "mad_ifc_flags" parameter, with different bits set for each flag. A third flag is added: "network-view/host-view". Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
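Folding the two boolean parameters and the new view selector into one flags word could look like the sketch below. The `EX_` constant names and bit positions are illustrative assumptions, not the driver's actual definitions.

```c
#include <assert.h>

/* Illustrative flag bits; names and values are assumptions. */
enum {
	EX_MAD_IFC_IGNORE_MKEY = 1 << 0,
	EX_MAD_IFC_IGNORE_BKEY = 1 << 1,
	EX_MAD_IFC_NET_VIEW    = 1 << 2,  /* unset means host view */
	EX_MAD_IFC_ALL         = (1 << 3) - 1,
};

/* Pack the former ignore_mkey/ignore_bkey arguments plus the view bit. */
static unsigned mad_ifc_flags(int ignore_mkey, int ignore_bkey, int net_view)
{
	unsigned flags = 0;

	if (ignore_mkey)
		flags |= EX_MAD_IFC_IGNORE_MKEY;
	if (ignore_bkey)
		flags |= EX_MAD_IFC_IGNORE_BKEY;
	if (net_view)
		flags |= EX_MAD_IFC_NET_VIEW;
	return flags;
}

/* Nonzero if the caller asked for the network (physical) view. */
static int wants_network_view(unsigned flags)
{
	assert((flags & ~EX_MAD_IFC_ALL) == 0);
	return (flags & EX_MAD_IFC_NET_VIEW) != 0;
}
```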
|
#
fc06573d |
|
03-Aug-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Initialize SR-IOV IB support for slaves in master context Allocate SR-IOV paravirtualization resources and MAD demuxing contexts on the master. This has two parts. The first part is to initialize the structures to contain the contexts. This is done at master startup time in mlx4_ib_init_sriov(). The second part is to actually create the tunneling resources required on the master to support a slave. This is performed when the master detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event generated when a slave initializes its comm channel). For the master, there is no such startup event, so it creates its own tunneling resources when it starts up. In addition, the master also creates the real special QPs. The ib_core layer on the master causes creation of proxy special QPs, since the master is also paravirtualized at the ib_core layer. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
1ffeb2eb |
|
03-Aug-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support 1. Introduce the basic SR-IOV paravirtualization context objects for multiplexing and demultiplexing MADs. 2. Introduce support for the new proxy and tunnel QP types. This patch introduces the objects required by the master for managing QP paravirtualization for guests. struct mlx4_ib_sriov is created by the master only. It is a container for the following: 1. All the info required by the PPF to multiplex and de-multiplex MADs (including those from the PF). (struct mlx4_ib_demux_ctx demux) 2. All the info required to manage alias GUIDs (i.e., the GUID at index 0 that each guest perceives. In fact, this is not the GUID which is actually at index 0, but the GUID which is at index[<VF number>] in the physical table). 3. Structures which are used to manage CM paravirtualization. 4. Structures for managing the real special QPs when running in SR-IOV mode. The real SQPs are controlled by the PPF in this case. All SQPs created and controlled by the ib core layer are proxy SQPs. struct mlx4_ib_demux_ctx contains the information per port needed to manage paravirtualization: 1. All multicast paravirt info. 2. All tunnel-qp paravirt info for the port. 3. GUID-table and GUID-prefix for the port. 4. Work queues. struct mlx4_ib_demux_pv_ctx contains all the info for managing the paravirtualized QPs for one slave/port. struct mlx4_ib_demux_pv_qp contains the info needed to run an individual QP (either tunnel qp or real SQP). Note: We made use of the 2 most significant bits in enum mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h). We need these bits in the low-level driver for internal purposes. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
00f5ce99 |
|
19-Jun-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Use port management change event instead of smp_snoop The port management change event can replace smp_snoop. If the capability bit for this event is set in dev-caps, the event is used (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event mask in the MAP_EQ fw command). In this case, when the driver passes incoming SMP PORT_INFO SET mads to the FW, the FW generates port management change events to signal any changes to the driver. If the FW generates these events, smp_snoop shouldn't be invoked in ib_process_mad(), or duplicate events will occur (once from the FW-generated event, and once from smp_snoop). In the case where the FW does not generate port management change events smp_snoop needs to be invoked to create these events. The flow in smp_snoop has been modified to make use of the same procedures as in the fw-generated-event case to generate the port management events (LID change, Client-rereg, Pkey change, and/or GID change). Port management change event handling required changing the mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument (last argument) had to be changed to unsigned long in order to accommodate passing the EQE pointer. We also needed to move the definition of struct mlx4_eqe from net/mlx4.h to file device.h -- to make it available to the IB driver, to handle port management change events. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
b1d8eb5a |
|
19-Jun-2012 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Add debug prints Define pr_fmt and add some pr_debug prints. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
0ff1fb65 |
|
04-Jul-2012 |
Hadar Hen Zion <hadarh@mellanox.co.il> |
{NET, IB}/mlx4: Add device managed flow steering firmware API The driver is modified to support three operation modes. If supported by firmware, use the device managed flow steering API, which we call device managed steering mode. Else, if the firmware supports the B0 steering mode use it, and finally, if none of the above, use the A0 steering mode. When the steering mode is device managed, the code is modified such that L2 based rules set by the mlx4_en driver for Ethernet unicast and multicast, and the IB stack multicast attach calls done through the mlx4_ib driver are all routed to use the device managed API. When attaching a rule using the device managed flow steering API, the firmware returns a 64 bit registration id, which is to be provided during detach. Currently the firmware is always programmed during HCA initialization to use standard L2 hashing. Future work should be done to allow configuring the flow-steering hash function with common, non proprietary means. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
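The three-way fallback between steering modes reads naturally as a priority check. The following is a sketch only; the enum, the capability struct, and the function name are hypothetical stand-ins for whatever the driver derives from the firmware capabilities.

```c
#include <assert.h>

enum steering_mode {
	STEER_A0,              /* last-resort mode */
	STEER_B0,
	STEER_DEVICE_MANAGED,  /* preferred when the firmware supports it */
};

/* Hypothetical summary of what the firmware reported. */
struct fw_caps {
	int has_device_managed;
	int has_b0;
};

/* Pick the best mode available: device managed, then B0, then A0. */
static enum steering_mode choose_steering_mode(const struct fw_caps *c)
{
	assert(c);
	if (c->has_device_managed)
		return STEER_DEVICE_MANAGED;
	if (c->has_b0)
		return STEER_B0;
	return STEER_A0;
}
```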
|
#
fc2d0044 |
|
24-May-2012 |
Sagi Grimberg <sagig@mellanox.co.il> |
IB/mlx4: Fix max_wqe capacity reported from query device 1. Limit the max number of WQEs per QP reported when querying the device, so that ib_create_qp() will not fail for a QP size that the device claimed to support due to additional headroom WQEs being allocated. 2. Limit qp resources accepted for ib_create_qp() to the limits reported in ib_query_device(). In kernel space, make sure that the limits returned to the caller following qp creation also lie within the reported device limits. For userspace, report as before, and do adjustment in libmlx4 (so as not to break ABI). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
e605b743 |
|
29-Apr-2012 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/mlx4: Increase the number of vectors (EQs) available for ULPs Enable IB ULPs to use a larger portion of the device EQs (which map to IRQs). The mlx4_ib driver follows the mlx4_core framework of the EQs to be divided among the device ports. In this scheme, for each IB port, the number of allocated EQs follows the number of cores, subject to other system constraints, such as the number of available MSI-X vectors. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
0a1405da |
|
02-Jun-2011 |
Sean Hefty <sean.hefty@intel.com> |
IB/mlx4: Add support for XRC QPs Support the creation of XRC INI and TGT QPs. To handle the case where a CQ or PD is not provided, we allocate them internally with the xrcd. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
012a8ff5 |
|
02-Jun-2011 |
Sean Hefty <sean.hefty@intel.com> |
IB/mlx4: Add support for XRC domains Support creating and destroying XRC domains. Any sharing of the XRCD is managed above the low-level driver. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
cfcde11c |
|
15-Jun-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/mlx4: Use flow counters on IBoE ports Allocate flow counter per Ethernet/IBoE port, and attach this counter to all the QPs created on that port. Based on patch by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
#
fa417f7b |
|
24-Oct-2010 |
Eli Cohen <eli@mellanox.co.il> |
IB/mlx4: Add support for IBoE Add support for IBoE to mlx4_ib. The bulk of the code is handling the new address vector fields; mlx4 needs the MAC address of a remote node to include it in a WQE (for datagrams) or in the QP context (for connected QPs). Address resolution is done by assuming all unicast GIDs are link-local IPv6 addresses. Multicast group attach/detach needs to update the NIC's multicast filters; but since attaching a QP to a multicast group can be done before the QP is bound to a port, for IBoE we need to keep track of all multicast groups that a QP is attached to before it transitions from INIT to RTR (since it does not have a port in the INIT state). Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Many things cleaned up and otherwise monkeyed with; hope I didn't introduce too many bugs. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
3b4a8cd5 |
|
05-Sep-2009 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Don't allow userspace open while recovering from catastrophic error Userspace apps are supposed to release all ib device resources if they receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the app has no way of knowing when the device has come back up, except to repeatedly attempt ibv_open_device() until it succeeds. However, currently there is no protection against the open succeeding while the device is being removed following the fatal event. In this case, the open will succeed, but as a result the device waits in the middle of its removal until the new app releases its resources -- and the new app will not do so, since the open succeeded at a point following the fatal event generation. This patch adds an "active" flag to the device. The active flag is set to false (in the fatal event flow) before the "fatal" event is generated, so any subsequent ibv_open_device() call to the device will fail until the device comes back up, thus preventing the above deadlock. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
2b6b7d4b |
|
07-May-2009 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Don't overwrite fast registration page list when posting work request The low-level mlx4 driver modified the page-list addresses for fast register work requests post send to big-endian, and set a "present" bit. This caused problems later when the consumer attempted to unmap the pages using the page-list (using the list addresses which were assumed to be still in CPU-endian order). Fix the mlx4 driver to allocate two buffers and use a private buffer for the hardware-format bus addresses. This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1571>, an NFS/RDMA server crash. The cause of the crash was found by Vu Pham of Mellanox. The fix is along the lines suggested by Steve Wise in comment #21 in bug 1571. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
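The two-buffer fix described in this entry boils down to converting into a private buffer instead of in place. A rough sketch, assuming a "present" flag in bit 0 and using invented names; the byte-wise stores stand in for a CPU-to-big-endian conversion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_PRESENT UINT64_C(1)  /* assumed position of the "present" bit */

/*
 * Fill hw_list (8 bytes per page, big-endian, present bit set) from
 * cpu_list. The consumer's cpu_list is read-only and stays in CPU order,
 * so it can still be used later to unmap the pages.
 */
static void build_hw_page_list(const uint64_t *cpu_list, uint8_t *hw_list,
			       size_t n)
{
	assert(cpu_list && hw_list);
	for (size_t i = 0; i < n; i++) {
		uint64_t v = cpu_list[i] | EX_PAGE_PRESENT;

		for (int b = 0; b < 8; b++)
			hw_list[i * 8 + b] = (uint8_t)(v >> (56 - 8 * b));
	}
}
```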
|
#
7ff93f8b |
|
22-Oct-2008 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_core: Multiple port type support Multi-protocol adapters support different port types. Each consumer of mlx4_core queries for supported port types; in particular mlx4_ib can no longer assume that all physical ports belong to it. Port type is configured through a sysfs interface. When the type of a port is changed, all mlx4 interfaces are unregistered, and then registered again with the new port types. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
51a379d0 |
|
25-Jul-2008 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files Update existing Mellanox copyright lines to 2008, and add such lines to files where they are missing. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
95d04f07 |
|
23-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Add support for memory management extensions and local DMA L_Key Add support for the following operations to mlx4 when device firmware supports them: - Send with invalidate and local invalidate send queue work requests; - Allocate/free fast register MRs; - Allocate/free fast register MR page lists; - Fast register MR send queue work requests; - Local DMA L_Key. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
521e575b |
|
15-Jul-2008 |
Ron Livne <ronli@voltaire.com> |
IB/mlx4: Add support for blocking multicast loopback packets Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK flag by using the per-multicast group loopback blocking feature of mlx4 hardware. Signed-off-by: Ron Livne <ronli@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
6296883c |
|
23-Apr-2008 |
Yevgeny Petrilin <yevgenyp@mellanox.co.il> |
mlx4_core: Move kernel doorbell management into core In addition to mlx4_ib, there will be ethernet and FC consumers of mlx4_core, so move the code for managing kernel doorbells into the core module to avoid having to duplicate this multiple times. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
bbf8eed1 |
|
16-Apr-2008 |
Vladimir Sokolovsky <vlad@dev.mellanox.co.il> |
IB/mlx4: Add support for resizing CQs Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
3fdcb97f |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx4: Add support for modifying CQ moderation parameters Signed-off-by: Eli Cohen <eli@mellnaox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
b832be1e |
|
16-Apr-2008 |
Eli Cohen <eli@dev.mellanox.co.il> |
IB/mlx4: Add IPoIB LSO support Add TSO support to the mlx4_ib driver. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
ea54b10c |
|
28-Jan-2008 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Use multiple WQ blocks to post smaller send WQEs ConnectX HCA supports shrinking WQEs, so that a single work request can be made of multiple units of wqe_shift. This way, WRs can differ in size, and do not have to be a power of 2 in size, saving memory and speeding up send WR posting. Unfortunately, if we do this then the wqe_index field in CQEs can't be used to look up the WR ID anymore, so our implementation does this only if selective signaling is off. Further, on 32-bit platforms, we can't use vmap() to make the QP buffer virtually contiguous. Thus we have to use constant-sized WRs to make sure a WR is always fully within a single page-sized chunk. Finally, we use WRs with the NOP opcode to avoid wrapping around the queue buffer in the middle of posting a WR, and we set the NoErrorCompletion bit to avoid getting completions with error for NOP WRs. However, NEC is only supported starting with firmware 2.2.232, so we use constant-sized WRs for older firmware. And, since MLX QPs only support SEND, we use constant-sized WRs in this case. When stamping during NOP posting, do stamping following setting of the NOP WQE valid bit. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
8ad11fb6 |
|
31-Jul-2007 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Implement FMRs Implement FMRs for mlx4. This is an adaptation of code from mthca. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
65541cb7 |
|
21-Jun-2007 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Implement query SRQ Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
6a775e2b |
|
20-Jun-2007 |
Jack Morgenstein <jackm@dev.mellanox.co.il> |
IB/mlx4: Implement query QP Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
63019d93 |
|
10-Jun-2007 |
Michael S. Tsirkin <mst@dev.mellanox.co.il> |
IB/mlx4: Include linux/mutex.h from mlx4_ib.h mlx4_ib.h uses struct mutex, so although <linux/mutex.h> seems to be pulled in indirectly by one of the headers it includes, the right thing is to include <linux/mutex.h> directly. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
0e6e7416 |
|
18-Jun-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Handle new FW requirement for send request prefetching New ConnectX firmware introduces FW command interface revision 2, which requires that for each QP, a chunk of send queue entries (the "headroom") is kept marked as invalid, so that the HCA doesn't get confused if it prefetches entries that haven't been posted yet. Add code to the driver to do this, and also update the user ABI so that userspace can request that the prefetcher be turned off for userspace QPs (we just leave the prefetcher on for all kernel QPs). Unfortunately, marking send queue entries this way confuses older firmware, so we change the driver to allow only FW command interface revision 2. This means that users will have to update their firmware to work with the new driver, but the firmware is changing quickly and the old firmware has lots of other bugs anyway, so this shouldn't be too big a deal. Based on a patch from Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
#
225c7b1f |
|
08-May-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/mlx4: Add a driver for Mellanox ConnectX InfiniBand adapters Add an InfiniBand driver for Mellanox ConnectX adapters. Because these adapters can also be used as ethernet NICs and Fibre Channel HBAs, the driver is split into two modules: mlx4_core: Handles low-level things like device initialization and processing firmware commands. Also controls resource allocation so that the InfiniBand, ethernet and FC functions can share a device without stepping on each other. mlx4_ib: Handles InfiniBand-specific things; plugs into the InfiniBand midlayer. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|