History log of /freebsd-11-stable/sys/ofed/drivers/infiniband/core/ib_addr.c
Revision Date Author Comments
# 341882 12-Dec-2018 hselasky

MFC r341534:
ibcore: Fix clearing of bound device interface.

Binding to a loopback device is not allowed. Make sure the destination
device address is global by clearing the bound device interface.
Only do this conditionally; otherwise link-local addresses won't work.

Sponsored by: Mellanox Technologies


# 341880 12-Dec-2018 hselasky

MFC r341533:
ibcore: ip6_dev_find() needs to know the scope ID.

Else the wrong network device can be returned for link-local addresses.

Sponsored by: Mellanox Technologies


# 337096 02-Aug-2018 hselasky

MFC r336391:
Use __FBSDID() for RCS tags in ibcore.

Sponsored by: Mellanox Technologies


# 337089 02-Aug-2018 hselasky

MFC r336384:
Fix for loopback detection in address resolve logic in ibcore.

When a loopback address is detected use the network interface which
has the loopback flag set to trigger loopback logic in address resolve.

Sponsored by: Mellanox Technologies


# 337085 02-Aug-2018 hselasky

MFC r336380:
Check the address family prior to resolving the address and introduce safer rdma_addr_size() variants in ibcore.

Garbage supplied by the user causes the UCMA module to pass a zero
size to memcpy(). Because the size was not checked, this produces
unpredictable results in rdma_resolve_addr().

There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB. When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.

Fix this by introducing new variants:
int rdma_addr_size_in6(struct sockaddr_in6 *addr);
int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);

that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in. We can use
these new variants to check what size userspace has passed in before
copying any addresses.
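
The bounds check described in this commit message can be sketched in
plain userland C. This is an illustration only, not the ibcore code:
EX_AF_IB, struct ex_sockaddr_ib and the ex_addr_size*() helpers are
hypothetical stand-ins, their values and sizes are assumptions, and
struct sockaddr_storage stands in for struct __kernel_sockaddr_storage.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-ins; the value and the size are illustrative only. */
#define EX_AF_IB 27
struct ex_sockaddr_ib { unsigned char bytes[48]; };

/* Size implied by the address family alone; 0 means "unknown family". */
static size_t
ex_addr_size(int family)
{
	switch (family) {
	case AF_INET:
		return (sizeof(struct sockaddr_in));
	case AF_INET6:
		return (sizeof(struct sockaddr_in6));
	case EX_AF_IB:
		return (sizeof(struct ex_sockaddr_ib));
	default:
		return (0);
	}
}

/* Type-safe variant: never report more bytes than the buffer can hold. */
static size_t
ex_addr_size_in6(const struct sockaddr_in6 *addr)
{
	size_t len;

	assert(addr != NULL);
	len = ex_addr_size(addr->sin6_family);
	return (len <= sizeof(*addr) ? len : 0);
}

/* Same check for the larger, storage-sized buffer. */
static size_t
ex_addr_size_ss(const struct sockaddr_storage *addr)
{
	size_t len;

	assert(addr != NULL);
	len = ex_addr_size(addr->ss_family);
	return (len <= sizeof(*addr) ? len : 0);
}
```

A caller would treat a return value of 0 as "reject the request" and
skip the copy entirely.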

Linux commit:
2975d5de6428ff6d9317e9948f0968f7d42e5d74
09abfe7b5b2f442a85f4c4d59ecf582ad76088d7
84652aefb347297aa08e91e283adf7b18f77c2d5

Sponsored by: Mellanox Technologies


# 337074 02-Aug-2018 hselasky

MFC r336368:
Fix for RDMA loopback over VLAN in ibcore.

Implement a more generic solution for detecting loopback.
The problem was that the default netdevice was resolved
for loopback even when a VLAN was used. Use the real network
device instead of the loopback device for the bound device
interface.

How to test:
ucmatose -b 127.0.0.1 -p 20090
ucmatose -s 5.6.5.1 -p 20090

Note that RDMA treats the IPv4 and IPv6 loopback
addresses like any address.

Sponsored by: Mellanox Technologies


# 337073 02-Aug-2018 hselasky

MFC r336367:
Add native FreeBSD support for multicast in ibcore.

This change adds support for registering multicast addresses,
both IPv4 and IPv6.

Sponsored by: Mellanox Technologies


# 337070 02-Aug-2018 hselasky

MFC r336364:
Only update source address when resolving is successful in ibcore.

When resolving an IP address in ibcore, only update the source address
upon normal completion. The ibcore address resolve function does not
care about the scope ID value of the IPv6 link-local addresses and expects
this information has already been extracted into the bound_dev_if field.
Because the same IPv6 link-local address can exist on multiple interfaces
the ibcore address resolver gets confused and returns ENETUNREACH.

Instead of updating both source address and bound_dev_if just keep the
address set to any address until resolving completes. For the sake of code
symmetry a similar change has been applied to the IPv4 address resolve path.

Sponsored by: Mellanox Technologies


# 337069 02-Aug-2018 hselasky

MFC r336363:
Process address resolve requests at least one time per second in ibcore.

When setting a large address resolve timeout it was observed that the
address resolving would succeed at the timeout and not when the address
was available. Make sure the address resolving requests are processed
at least once every second.

While at it use "int" for jiffies instead of "unsigned long" to match
FreeBSD ticks.
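
One way to picture the change is a helper that caps the delay until the
next processing pass at one second. This is a sketch under stated
assumptions, not the committed code: ex_next_delay() and its parameters
are hypothetical names, and hz is assumed to be the number of ticks per
second.

```c
#include <assert.h>

/*
 * Return how many ticks to wait before processing the request queue
 * again: never negative, and never more than one second (hz ticks).
 */
static int
ex_next_delay(int now, int deadline, int hz)
{
	int remaining;

	assert(hz > 0);
	/* Subtract as unsigned so a wrapped tick counter does not overflow. */
	remaining = (int)((unsigned int)deadline - (unsigned int)now);
	if (remaining < 0)
		remaining = 0;
	if (remaining > hz)
		remaining = hz;
	return (remaining);
}
```

With a deadline far in the future the helper returns hz, so a pending
request is rechecked every second instead of only when the timeout
expires.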

Sponsored by: Mellanox Technologies


# 331790 30-Mar-2018 hselasky

MFC r330585:
Define named values instead of hardcoding them.

Sponsored by: Mellanox Technologies


# 331788 30-Mar-2018 hselasky

MFC r330583:
Embed the IPv6 scope ID before calling rtalloc1() in ibcore.
Else rtalloc1() will resolve to the loopback interface.
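
As a rough illustration of what "embedding" means here, the sketch below
writes a zone ID into the second 16-bit word of a link-local address.
It assumes the KAME-style in-kernel convention for embedded scopes;
ex_embed_scope() is a hypothetical helper and not the function used by
the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Store the zone ID in bytes 2 and 3 (network byte order) of a
 * link-local address. Other addresses are left untouched.
 * Returns 1 if the address was modified, 0 otherwise.
 */
static int
ex_embed_scope(struct in6_addr *addr, unsigned short zone)
{
	assert(addr != NULL);
	if (!IN6_IS_ADDR_LINKLOCAL(addr))
		return (0);
	addr->s6_addr[2] = (unsigned char)(zone >> 8);
	addr->s6_addr[3] = (unsigned char)(zone & 0xff);
	return (1);
}
```

Without the embedded zone, two interfaces carrying the same fe80::
address are indistinguishable to a route lookup.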

Sponsored by: Mellanox Technologies


# 331783 30-Mar-2018 hselasky

MFC r330507:
Get correct network device when accepting incoming RDMA connections in ibcore.

This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.

Background:
On a per infiniband port basis, the GID is not a unique identifier.
The assumption that it is unique falls apart when the VLAN ID, IPv6 scope
ID and RoCE type, as supported by RoCE v2, are taken into account. This
additional information is stored in the so-called GID attributes and is
needed to correctly identify the destination network interface for an
incoming connection.

Different VLANs are allowed to define the same IPv4 addresses. This is
especially common for the default IPv6 link-local addresses and when
using so-called containers or jails.

The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.

Consequently, old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid(), are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.
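
The ambiguity can be shown with a toy GID table. The structure and
helper names below (struct ex_gid_attr, ex_count_by_gid(),
ex_lookup_by_index()) are hypothetical and do not reflect the ibcore
data structures; they only show why a lookup by value is not well
defined while a lookup by index is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_GID_LEN 16

/* Hypothetical GID table entry: the GID value plus its attributes. */
struct ex_gid_attr {
	unsigned char gid[EX_GID_LEN];
	unsigned short vlan_id;
	const char *ifname;
};

/* Count the entries sharing one GID value; more than one is ambiguous. */
static size_t
ex_count_by_gid(const struct ex_gid_attr *tbl, size_t n,
    const unsigned char *gid)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++) {
		if (memcmp(tbl[i].gid, gid, EX_GID_LEN) == 0)
			hits++;
	}
	return (hits);
}

/* A lookup by index always yields at most one entry. */
static const struct ex_gid_attr *
ex_lookup_by_index(const struct ex_gid_attr *tbl, size_t n, size_t idx)
{
	return (idx < n ? &tbl[idx] : NULL);
}
```

When two VLAN interfaces share the same link-local GID, the count is 2,
whereas each index still maps to exactly one interface.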

Sponsored by: Mellanox Technologies


# 331781 30-Mar-2018 hselasky

MFC r330504:
Add support for loopback in ibcore.

Implement the missing pieces in addr_resolve() to support loopback
addresses. IB core will test for the IFF_LOOPBACK flag in the network
interface and treat these devices in a special way.

Sponsored by: Mellanox Technologies


# 331772 30-Mar-2018 hselasky

MFC r330490:
Add missing FreeBSD tags and SVN properties to ibcore.

Sponsored by: Mellanox Technologies


# 331769 30-Mar-2018 hselasky

MFC r303505, r303506, r303512, r303513, r303646, r320418, r323082,
r326169, r326563, r326649, r326716, r326764, r326765 and r329222:

RoCE/infiniband/iWarp upgrade to Linux 4.9 for kernel and userspace.
This commit merges projects/bsd_rdma_4_9 to 11-stable.

Compatibility wrappers have been made for existing 11-stable ibcore
APIs, including ib_reg_phys_mr().
Refer to "sys/ofed/include/rdma/ib_verbs_compat.h" for more information.

The iw_cxgb driver has not been updated and has been disconnected from
the build.

Sponsored by: Mellanox Technologies

MFC r326169 and r326563:
RoCE/infiniband upgrade to Linux v4.9 for kernel and userspace.

List of kernel sources used:
============================

1) kernel sources were cloned from git://github.com/torvalds/linux.git
Top commit 69973b830859bc6529a7a0468ba0d80ee5117826 - tag: v4.9, linux-4.9

2) krping was cloned from https://github.com/larrystevenwise/krping
Top commit 292a2f1abf0348285e678a82264740d52e4dcfe4

List of userspace sources used:
===============================

1) rdma-core was cloned from https://github.com/linux-rdma/rdma-core.git
Top commit d65138ef93af30b3ea249f3a84aa6a24ba7f8a75

2) OpenSM was cloned from git://git.openfabrics.org/~halr/opensm.git
Top commit 85f841cf209f791c89a075048a907020e924528d

3) libibmad was cloned from git://git.openfabrics.org/~iraweiny/libibmad.git
Tag 1.3.13 with some additional patches from Mellanox.

4) infiniband-diags was cloned from git://git.openfabrics.org/~iraweiny/infiniband-diags.git
Tag 1.6.7 with some additional patches from Mellanox.

NOTES:
======

1) The mthca driver has been removed from userspace.
2) All GPLv2 only sources have been removed and where applicable
rewritten from scratch under a BSD license.
3) List of fully supported drivers in userspace and kernel:
a) iw_cxgbe (Chelsio)
b) mlx4ib (Mellanox)
c) mlx5ib (Mellanox)
4) WITH_OFED=YES is still required by make in order to build
OFED userspace and kernel code.
5) Full support has been added for routable RoCE, RoCE v2.

MFC r326649:
Disconnect OFED after r326169 broke all DIRDEPS support for it.

MFC r326716:
Correctly define the unordered_map namespace in ofed/libibnetdisc.

This should fix ofed/libibnetdisc compilation with C-compilers
different from clang and GCC v4.2.1.

Submitted by: kib
Sponsored by: Mellanox Technologies

MFC r326764:
ofed: Remove duplicated symbols from the version file.

ld.bfd accepts multiple listing of the same symbol in the version script.
lld is stricter and errors out. Since arm64 and sometimes amd64 use lld,
we should correct this cosmetic issue.

Sponsored by: Mellanox Technologies
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D13329

MFC r326765:
ofed: Define barriers for mips and arm.

I used the strongest barriers available on the architectures, so if
future analysis shows that they are excessive, the barriers could be
relaxed. Still, it is unlikely that it is meaningful to run IB on 32bit
ARM or current MIPS machines, so the purpose of the change is to make
WITH_OFED pass tinderbox.

Sponsored by: Mellanox Technologies
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D13329

MFC r303505:
sdp: Use an mbufq for received control packets.

This is simpler than the hand-rolled queue, and fixes a use-after-free.

Sponsored by: EMC / Isilon Storage Division

MFC r303506:
sdp: Destroy the PCB lock before freeing to the zone.

Sponsored by: EMC / Isilon Storage Division

MFC r303512:
sdp: Use malloc(9) instead of the Linux compat layer.

SDP transmit and receive rings are always created in a sleepable context,
so we can use M_WAITOK and remove error checks.

Sponsored by: EMC / Isilon Storage Division

MFC r303513:
sdp: Destroy the RDMA ID after destroying the connection's queue pair.

This is the ordering documented by rdma_destroy_qp(). Also add a useful
KASSERT to sdp_pcbfree().

Sponsored by: EMC / Isilon Storage Division

MFC r303646:
ipoib: Bound the number of egress mbufs buffered during pathrec lookups.

In pathological situations where the master subnet manager becomes
unresponsive for an extended period, we may otherwise end up queuing all
of the system's mbufs while waiting for a response to a path record lookup.

This addresses the same issue as commit 1e85b806f9 in Linux.
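
The idea of a bounded queue can be sketched in a few lines. The types
and functions below (struct ex_pkt, struct ex_pktq, ex_pktq_init(),
ex_pktq_enqueue()) are hypothetical and stand in for the driver's mbuf
queue; the limit used by the actual commit is not stated here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical packet and queue types. */
struct ex_pkt {
	struct ex_pkt *next;
};

struct ex_pktq {
	struct ex_pkt *head;
	struct ex_pkt *tail;
	int len;
	int maxlen;
};

static void
ex_pktq_init(struct ex_pktq *q, int maxlen)
{
	assert(q != NULL && maxlen > 0);
	q->head = q->tail = NULL;
	q->len = 0;
	q->maxlen = maxlen;
}

/* Append a packet, or refuse with ENOBUFS once the limit is reached. */
static int
ex_pktq_enqueue(struct ex_pktq *q, struct ex_pkt *p)
{
	if (q->len >= q->maxlen)
		return (ENOBUFS);
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
	return (0);
}
```

A caller that receives ENOBUFS would drop the packet instead of holding
on to it while the lookup is pending.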

Reviewed by: cem, ngie
Sponsored by: EMC / Isilon Storage Division

MFC r329222:
Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infiniband upgrade.

Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826

Sponsored by: Mellanox Technologies

MFC r320418. Note that the socket lock _is_ the same as so_rcv's lock
in 11 and this is a no-op in this branch.

Sponsored by: Chelsio Communications

MFC r323082:
cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that
t4_tom picks it up right away. This is less work than waiting for
the connection to be established before applying the setting.

Sponsored by: Chelsio Communications