History log of /freebsd-11-stable/sys/dev/mlx5/mlx5_ib/mlx5_ib_main.c
Revision Date Author Comments
# 369102 22-Jan-2021 hselasky

MFC 9a47ae044b48:
Bump driver versions for mlx5en(4) and mlx4en(4).

Sponsored by: Mellanox Technologies // NVIDIA Networking

Git Hash: e87e3e82f3a062856118ed42751b498277eb09a5
Git Author: hselasky@FreeBSD.org


# 363151 13-Jul-2020 hselasky

MFC r362953:
Infiniband clients must be attached and detached in a specific order in ibcore.

Currently the linking order of the infiniband, IB, modules decides in which
order the clients are attached and detached. For example one IB client may
use resources from another IB client. This can lead to a potential deadlock
at shutdown. For example if the ipoib is unregistered after the ib_multicast
client is detached, then if ipoib is using multicast addresses a deadlock may
happen, because ib_multicast will wait for all its resources to be freed before
returning from the remove method.

Fix this by using module_xxx_order() instead of module_xxx().
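
As a rough illustration of why an explicit order helps, the following C sketch sorts clients by an order value so that attach can walk the list forward and detach can walk it backward. The struct, the function names and the order values are hypothetical and are not taken from ibcore; the forward/backward convention is an assumption of this sketch.

```c
#include <stdlib.h>

/* Hypothetical client record: lower "order" values must attach first. */
struct sketch_client {
	const char *name;
	int order;
};

/* qsort(3) comparator: ascending by order value. */
static int
sketch_by_order(const void *a, const void *b)
{
	const struct sketch_client *ca = a;
	const struct sketch_client *cb = b;

	return ((ca->order > cb->order) - (ca->order < cb->order));
}

/*
 * Sort the clients so that attach can iterate from index 0 upward and
 * detach can iterate from the last index downward. A client that uses
 * another client's resources is then always detached before it.
 */
static void
sketch_sort_clients(struct sketch_client *clients, size_t count)
{
	qsort(clients, count, sizeof(*clients), sketch_by_order);
}
```

With illustrative order values where ipoib sorts after ib_multicast, ipoib is detached first and releases its multicast resources before ib_multicast waits for them.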

Differential Revision: https://reviews.freebsd.org/D23973
Sponsored by: Mellanox Technologies


# 353268 07-Oct-2019 hselasky

MFC r352998:
Bump driver version for mlx5core, mlx5en(4) and mlx5ib(4).

Sponsored by: Mellanox Technologies


# 353185 07-Oct-2019 hselasky

MFC r352956:
Fix reported max SGE calculation in mlx5ib.

Add the 512-byte limit of RDMA READ and the size of the remote address to the
max SGE calculation.
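
A minimal sketch of such a calculation in C follows. The segment sizes, the helper names and the exact formula are assumptions made for illustration; only the 512-byte RDMA READ limit and the presence of a remote address term come from the commit message.

```c
#include <assert.h>

/* All sizes are assumed values in bytes, chosen for illustration only. */
enum {
	SKETCH_CTRL_SEG_SIZE  = 16,	/* control segment at the start of a send WQE */
	SKETCH_RADDR_SEG_SIZE = 16,	/* remote address segment */
	SKETCH_DATA_SEG_SIZE  = 16,	/* one scatter/gather entry */
	SKETCH_RDMA_READ_MAX  = 512	/* RDMA READ descriptor limit */
};

static int
sketch_min(int a, int b)
{
	return (a < b ? a : b);
}

/* Largest SGE count that both the receive side and the send side can honor. */
static int
sketch_max_sge(int max_wqe_sz_rq, int max_wqe_sz_sq)
{
	int max_rq_sg, max_sq_desc, max_sq_sg;

	assert(max_wqe_sz_sq >= SKETCH_CTRL_SEG_SIZE + SKETCH_RADDR_SEG_SIZE);

	max_rq_sg = max_wqe_sz_rq / SKETCH_DATA_SEG_SIZE;
	/* Cap the send descriptor at the RDMA READ limit. */
	max_sq_desc = sketch_min(max_wqe_sz_sq, SKETCH_RDMA_READ_MAX);
	/* Remove the segments that precede the data segments. */
	max_sq_sg = (max_sq_desc - SKETCH_CTRL_SEG_SIZE -
	    SKETCH_RADDR_SEG_SIZE) / SKETCH_DATA_SEG_SIZE;

	return (sketch_min(max_rq_sg, max_sq_sg));
}
```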

Submitted by: slavash@
Linux commit: 288c01b746aa
Sponsored by: Mellanox Technologies


# 347883 16-May-2019 hselasky

MFC r347325:
Bump the Mellanox driver version numbers and the FreeBSD version number.

Sponsored by: Mellanox Technologies


# 347860 16-May-2019 hselasky

MFC r347304:
Always return success for RoCE modify port in mlx5ib.

CM layer calls ib_modify_port() regardless of the link layer.

For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.
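
The behaviour can be sketched in C as below. The enum and function names are hypothetical, and the ib_result parameter merely stands in for whatever the InfiniBand code path would return.

```c
/* Hypothetical link layer tags for this sketch. */
enum sketch_link_layer {
	SKETCH_LL_INFINIBAND,
	SKETCH_LL_ETHERNET
};

/*
 * On Ethernet (RoCE) ports there is nothing meaningful to modify, so
 * report success. On InfiniBand ports pass through the result of the
 * real operation, represented here by ib_result.
 */
static int
sketch_modify_port(enum sketch_link_layer ll, int ib_result)
{
	if (ll == SKETCH_LL_ETHERNET)
		return (0);
	return (ib_result);
}
```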

Linux Commit:
ec2558796d25e6024071b6bcb8e11392538d57bf

Submitted by: slavash@
Sponsored by: Mellanox Technologies


# 347859 16-May-2019 hselasky

MFC r347303:
Add support for new rates to mlx5ib.

Submitted by: slavash@
Sponsored by: Mellanox Technologies


# 347855 16-May-2019 hselasky

MFC r347299:
Add support for 200Gb ethernet speeds to mlx5core.

Submitted by: slavash@
Sponsored by: Mellanox Technologies


# 347809 16-May-2019 hselasky

MFC r347259:
Remove unused module parameter in mlx5ib.

Sponsored by: Mellanox Technologies


# 347801 16-May-2019 hselasky

MFC r347251:
Import Linux code to implement mlx5_ib_disassociate_ucontext() in mlx5ib.

Submitted by: kib@
Sponsored by: Mellanox Technologies


# 341987 12-Dec-2018 hselasky

MFC r341587:
mlx4/mlx5: Updated driver version to 3.5.0

Sponsored by: Mellanox Technologies


# 341956 12-Dec-2018 hselasky

MFC r341571:
mlx5ib: Set default active width and speed when querying port.

Make sure the active width and speed are set in case the
translate_eth_proto_oper() function doesn't recognize the
current port operation mask.
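
The pattern can be sketched in C as follows: assign defaults first, then let the translation overwrite them only for masks it recognizes. All names, mask bits, and the chosen default and mapped values are illustrative assumptions, not the driver's definitions.

```c
#include <stdint.h>

/* Illustrative width and speed codes, assumed for this sketch. */
enum { SKETCH_WIDTH_1X = 1, SKETCH_WIDTH_4X = 2 };
enum { SKETCH_SPEED_QDR = 4, SKETCH_SPEED_EDR = 32 };

/* Illustrative operational protocol mask bits. */
#define	SKETCH_PROTO_10G	(1u << 0)
#define	SKETCH_PROTO_100G	(1u << 1)

struct sketch_port_attr {
	uint8_t active_width;
	uint8_t active_speed;
};

/* Returns 0 and fills attr for known masks; returns -1 and leaves attr alone otherwise. */
static int
sketch_translate(uint32_t oper, struct sketch_port_attr *attr)
{
	switch (oper) {
	case SKETCH_PROTO_10G:
		attr->active_width = SKETCH_WIDTH_1X;
		attr->active_speed = SKETCH_SPEED_QDR;
		return (0);
	case SKETCH_PROTO_100G:
		attr->active_width = SKETCH_WIDTH_4X;
		attr->active_speed = SKETCH_SPEED_EDR;
		return (0);
	default:
		return (-1);
	}
}

/* Set defaults before translating, so the fields are never left unset. */
static void
sketch_query_port(uint32_t oper, struct sketch_port_attr *attr)
{
	attr->active_width = SKETCH_WIDTH_4X;
	attr->active_speed = SKETCH_SPEED_QDR;
	(void)sketch_translate(oper, attr);
}
```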

Linux commit:
7672ed33c4c15dbe9d56880683baaba4227cf940

Sponsored by: Mellanox Technologies


# 341950 12-Dec-2018 hselasky

MFC r341568:
mlx5ib: Fix sign extension in mlx5_ib_query_device

"fw_rev_min(dev->mdev)" with type "unsigned short" (16 bits, unsigned) is
promoted in "fw_rev_min(dev->mdev) << 16" to type "int" (32 bits, signed), then
sign-extended to type "unsigned long" (64 bits, unsigned). If
"fw_rev_min(dev->mdev) << 16" is greater than 0x7FFFFFFF, the upper bits of the
result will all be 1.
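
The effect can be reproduced with a small C sketch. The function names are hypothetical. The buggy variant emulates the promotion to a signed 32-bit int through explicit casts, because shifting a signed int into its sign bit is undefined behaviour in C; the cast to int32_t assumes the usual two's complement conversion.

```c
#include <stdint.h>

/*
 * Emulates the buggy expression: the 16-bit value is shifted as a
 * 32-bit quantity, treated as a signed int, and only then widened to
 * 64 bits, which sign-extends it.
 */
static uint64_t
sketch_pack_buggy(uint16_t fw_minor)
{
	int32_t shifted = (int32_t)((uint32_t)fw_minor << 16);

	return ((uint64_t)shifted);
}

/* Fixed form: widen to 64 bits before shifting, so no sign bit is involved. */
static uint64_t
sketch_pack_fixed(uint16_t fw_minor)
{
	return ((uint64_t)fw_minor << 16);
}
```

For a minor revision of 0x8001 the buggy form yields 0xFFFFFFFF80010000, while the fixed form yields 0x0000000080010000. Values up to 0x7FFF give the same result in both forms.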

Sponsored by: Mellanox Technologies


# 341948 12-Dec-2018 hselasky

MFC r341567:
mlx5: Fix driver version location

Driver description should be set by core and not by the Ethernet driver.

Sponsored by: Mellanox Technologies


# 341922 12-Dec-2018 hselasky

MFC r341554:
mlx5: Raise fatal IB event when sys error occurs

All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.
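
A simplified C sketch of the port check, assuming a hypothetical event handler that receives the port in an event parameter and accepts ports in the range 1 to num_ports:

```c
#include <stdint.h>

/*
 * Returns 1 when an event for the given parameter would be forwarded to
 * registered clients, and 0 when it would be dropped as an invalid port.
 */
static int
sketch_event_delivered(unsigned long param, uint8_t num_ports)
{
	uint8_t port = (uint8_t)param;	/* set once, at declaration */

	/* Port numbers are 1-based, so 0 is never a valid port. */
	if (port < 1 || port > num_ports)
		return (0);
	return (1);
}
```

In this sketch a system error reported with port 0 is silently dropped, while the same event reported with port 1 reaches the clients.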

Linux commit:
aba462134634b502d720e15b23154f21cfa277e5

Sponsored by: Mellanox Technologies


# 337101 02-Aug-2018 hselasky

MFC r336395:
Update version information for the mlx5ib module.

Sponsored by: Mellanox Technologies


# 337100 02-Aug-2018 hselasky

MFC r336394:
Don't pass unsupported events to ibcore from mlx5ib.

Sponsored by: Mellanox Technologies


# 337099 02-Aug-2018 hselasky

MFC r336393:
Use static device naming instead of dynamic one in mlx5ib.

When resetting mlx5core instances it can happen that the order of attach and
detach for mlx5ib instances is changed. Take the unit number for mlx5_%d from
the parent PCI device, similarly to what is done in mlx5en(4), so that there
is a direct relationship between mce<N> and mlx5_<N>.
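
As a sketch of the naming scheme in C, assuming the parent unit number is already known (the helper name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the IB device name from the parent PCI device's unit number,
 * so that unit N always maps to "mlx5_N" regardless of attach order.
 * Returns the snprintf(3) result.
 */
static int
sketch_ib_name(char *buf, size_t len, int parent_unit)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "mlx5_%d", parent_unit));
}
```

With a parent unit of 3, the resulting name is mlx5_3, matching mce3.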

Sponsored by: Mellanox Technologies


# 337098 02-Aug-2018 hselasky

MFC r336392:
Implement support for Differentiated Service Code Point, DSCP, in mlx5en(4).

The DSCP feature is controlled using a set of sysctl(8) fields under
the qos sysctl directory entry for mlx5en(4).

For Routable RoCE QPs, the DSCP should be set in the QP's address path.
The DSCP's value is derived from the traffic class.
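
A small C sketch of the derivation, relying on the standard layout of the 8-bit traffic class field in which the upper six bits carry the DSCP and the lower two bits carry ECN. The function names are hypothetical.

```c
#include <stdint.h>

/* The DSCP occupies the upper six bits of the traffic class. */
static uint8_t
sketch_tclass_to_dscp(uint8_t tclass)
{
	return (tclass >> 2);
}

/* The remaining two low bits are the ECN field. */
static uint8_t
sketch_tclass_to_ecn(uint8_t tclass)
{
	return (tclass & 0x3);
}
```

For example, a traffic class of 0xB8 corresponds to DSCP 46 with an ECN field of 0.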

Linux commit:
ed88451e1f2d400fd6a743d0a481631cf9f97550

Sponsored by: Mellanox Technologies


# 337078 02-Aug-2018 hselasky

MFC r336372:
Add support for prio-tagged traffic for RDMA in ibcore.

When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.

The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.

Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.

The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().
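
The lookup described above might be sketched in C as follows. The struct, its field names and the 0xffff "no VLAN" sentinel are assumptions of this sketch and do not reproduce the actual rdma_vlan_dev_vlan_id() implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_NO_VLAN	0xffff	/* assumed sentinel for "no VLAN" */

/* Hypothetical, reduced view of a network device. */
struct sketch_netdev {
	bool is_vlan;		/* device is a VLAN interface */
	uint16_t vlan_tag;	/* valid only when is_vlan is true */
	bool prio_tagging;	/* prio tagging is enabled on a real device */
};

static uint16_t
sketch_vlan_id(const struct sketch_netdev *dev)
{
	/* A real device with prio tagging is presented as VLAN ID zero. */
	if (!dev->is_vlan && dev->prio_tagging)
		return (0);
	if (!dev->is_vlan)
		return (SKETCH_NO_VLAN);
	return (dev->vlan_tag);
}
```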

Sponsored by: Mellanox Technologies


# 331808 30-Mar-2018 hselasky

MFC r330648:
Add support for explicit congestion notification, ECN, to mlx5ib(4).

ECN configuration and statistics are available through a set of sysctl(8)
nodes under sys.class.infiniband.mlx5_X.cong . The ECN configuration
nodes can also be used as loader tunables.

Sponsored by: Mellanox Technologies


# 331805 30-Mar-2018 hselasky

MFC r330606:
Implement missing query for current port rate in mlx5ib(4).

This is a direct commit.

Sponsored by: Mellanox Technologies


# 331795 30-Mar-2018 hselasky

MFC r330597:
Disable unsupported disassociate ucontext functionality in mlx5ib(4).

Sponsored by: Mellanox Technologies


# 331784 30-Mar-2018 hselasky

MFC r330508:
Optimize ibcore RoCE address handle creation from user-space.

Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

Sponsored by: Mellanox Technologies


# 331769 30-Mar-2018 hselasky

MFC r303505, r303506, r303512, r303513, r303646, r320418, r323082,
r326169, r326563, r326649, r326716, r326764, r326765 and r329222:

RoCE/infiniband/iWarp upgrade to Linux 4.9 for kernel and userspace.
This commit merges projects/bsd_rdma_4_9 to 11-stable.

Compatibility wrappers have been made for existing 11-stable ibcore
APIs, including ib_reg_phys_mr().
Refer to "sys/ofed/include/rdma/ib_verbs_compat.h" for more information.

The iw_cxgb driver has not been updated and has been disconnected from
the build.

Sponsored by: Mellanox Technologies

MFC r326169 and r326563:
RoCE/infiniband upgrade to Linux v4.9 for kernel and userspace.

List of kernel sources used:
============================

1) kernel sources were cloned from git://github.com/torvalds/linux.git
Top commit 69973b830859bc6529a7a0468ba0d80ee5117826 - tag: v4.9, linux-4.9

2) krping was cloned from https://github.com/larrystevenwise/krping
Top commit 292a2f1abf0348285e678a82264740d52e4dcfe4

List of userspace sources used:
===============================

1) rdma-core was cloned from https://github.com/linux-rdma/rdma-core.git
Top commit d65138ef93af30b3ea249f3a84aa6a24ba7f8a75

2) OpenSM was cloned from git://git.openfabrics.org/~halr/opensm.git
Top commit 85f841cf209f791c89a075048a907020e924528d

3) libibmad was cloned from git://git.openfabrics.org/~iraweiny/libibmad.git
Tag 1.3.13 with some additional patches from Mellanox.

4) infiniband-diags was cloned from git://git.openfabrics.org/~iraweiny/infiniband-diags.git
Tag 1.6.7 with some additional patches from Mellanox.

NOTES:
======

1) The mthca driver has been removed from userspace.
2) All GPLv2 only sources have been removed and where applicable
rewritten from scratch under a BSD license.
3) List of fully supported drivers in userspace and kernel:
a) iw_cxgbe (Chelsio)
b) mlx4ib (Mellanox)
c) mlx5ib (Mellanox)
4) WITH_OFED=YES is still required by make in order to build
OFED userspace and kernel code.
5) Full support has been added for routable RoCE, RoCE v2.

MFC r326649:
Disconnect OFED after r326169 broke all DIRDEPS support for it.

MFC r326716:
Correctly define the unordered_map namespace in ofed/libibnetdisc .

This should fix ofed/libibnetdisc compilation with C-compilers
different from clang and GCC v4.2.1.

Submitted by: kib
Sponsored by: Mellanox Technologies

MFC r326764:
ofed: Remove duplicated symbols from the version file.

ld.bfd accepts multiple listing of the same symbol in the version script.
lld is stricter and errors out. Since arm64 and sometimes amd64 use lld,
we should correct this cosmetic issue.

Sponsored by: Mellanox Technologies
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D13329

MFC r326765:
ofed: Define barriers for mips and arm.

I used the strongest barriers available on the architectures, so if
future analysis shows that this is excessive, the barriers could be
relaxed. Still, it is unlikely that it is meaningful to run IB on 32bit
ARM or current MIPS machines, so the change is to make WITH_OFED to pass
tinderbox.

Sponsored by: Mellanox Technologies
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D13329

MFC r303505:
sdp: Use an mbufq for received control packets.

This is simpler than the hand-rolled queue, and fixes a use-after-free.

Sponsored by: EMC / Isilon Storage Division

MFC r303506:
sdp: Destroy the PCB lock before freeing to the zone.

Sponsored by: EMC / Isilon Storage Division

MFC r303512:
sdp: Use malloc(9) instead of the Linux compat layer.

SDP transmit and receive rings are always created in a sleepable context,
so we can use M_WAITOK and remove error checks.

Sponsored by: EMC / Isilon Storage Division

MFC r303513:
sdp: Destroy the RDMA ID after destroying the connection's queue pair.

This is the ordering documented by rdma_destroy_qp(). Also add a useful
KASSERT to sdp_pcbfree().

Sponsored by: EMC / Isilon Storage Division

MFC r303646:
ipoib: Bound the number of egress mbufs buffered during pathrec lookups.

In pathological situations where the master subnet manager becomes
unresponsive for an extended period, we may otherwise end up queuing all
of the system's mbufs while waiting for a response to a path record lookup.
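
The bounding idea can be sketched in C with a counter-based queue. The limit value and all names are hypothetical, and real code would queue and free mbufs rather than count them.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_MAX_PENDING	3	/* assumed per-path limit */

/* Hypothetical per-path state: packets waiting for the lookup to finish. */
struct sketch_queue {
	size_t len;	/* packets currently buffered */
	size_t dropped;	/* packets rejected because the queue was full */
};

/* Returns 1 if the packet was buffered, 0 if it was dropped. */
static int
sketch_enqueue(struct sketch_queue *q)
{
	assert(q != NULL);

	if (q->len >= SKETCH_MAX_PENDING) {
		q->dropped++;
		return (0);
	}
	q->len++;
	return (1);
}
```

However long the lookup stalls, at most SKETCH_MAX_PENDING packets stay buffered per path in this sketch.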

This addresses the same issue as commit 1e85b806f9 in Linux.

Reviewed by: cem, ngie
Sponsored by: EMC / Isilon Storage Division

MFC r329222:
Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infiniband upgrade.

Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826

Sponsored by: Mellanox Technologies

MFC r320418. Note that the socket lock _is_ the same as so_rcv's lock
in 11 and this is a no-op in this branch.

Sponsored by: Chelsio Communications

MFC r323082:
cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that
t4_tom picks it up right away. This is less work than waiting for
the connection to be established before applying the setting.

Sponsored by: Chelsio Communications


# 331576 26-Mar-2018 hselasky

MFC r330606:
Implement support for querying the current port rate in mlx5core.
The mlx5ib(4) part will be merged separately.

- Factor out port speed definitions into new port.h header file,
similarly as done in Linux upstream.
- Correct two existing port speed definitions in mlx5en according to
Linux upstream.

Sponsored by: Mellanox Technologies


# 325604 09-Nov-2017 hselasky

MFC r324792:
The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device is present. This resolves
interoperability issues between iWarp and RoCE in ibcore.

Reviewed by: np @
Differential Revision: https://reviews.freebsd.org/D12563
Sponsored by: Mellanox Technologies


# 325603 09-Nov-2017 hselasky

MFC r324491:
Use common rdma_ip2gid() function instead of custom mlx5_ip2gid() one.

Sponsored by: Mellanox Technologies


# 323218 06-Sep-2017 hselasky

MFC r322810 and r322830:
Add new mlx5ib(4) driver to the kernel source tree which supports
Remote DMA over Converged Ethernet, RoCE, for the ConnectX-4 series of
PCI express network cards.

There is currently no user-space support and this driver only supports
kernel side non-routable RoCE V1. The krping kernel module can be used
to test this driver. Full user-space support including RoCE V2 will be
added as part of the ongoing upgrade to ibcore from Linux 4.9. Otherwise
this driver is feature equivalent to mlx4ib(4). The mlx5ib(4) kernel
module will only be built when WITH_OFED=YES is specified.

Sponsored by: Mellanox Technologies