History log of /freebsd-current/sys/ofed/drivers/infiniband/core/ib_verbs.c
Revision Date Author Comments
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# b633e08c 16-Jun-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

ibcore: Kernel space update based on Linux 5.7-rc1.

Overview:

This is the first stage of a RDMA stack upgrade introducing kernel
changes only based on Linux 5.7-rc1.

This patch is based on about four main areas of work:
- Update of the IB uobjects system:
- The memory holding so-called AH, CQ, PD, SRQ and UCONTEXT objects
is now managed by ibcore. This also require some changes in the
kernel verbs API. The updated verbs changes are typically about
initialize and deinitialize objects, and remove allocation and
free of memory.

- Update of the uverbs IOCTL framework:
- The parsing and handling of user-space commands has been
completely refactored to integrate with the updated IB uobjects
system.

- Various changes and updates to the generic uverbs interfaces in
device drivers including the new uAPI surface.

- The mlx5_ib_devx.c in mlx5ib and related mlx5 core changes.

Dependencies:

- The mlx4ib driver code has been updated with the minimum changes
needed.

- The mlx5ib driver code has been updated with the minimum changes
needed including DV support.

Compatibility:

- All user-space facing APIs are backwards compatible after this
change.

- All kernel-space facing RDMA APIs are backwards compatible after
this change, with exception of ib_create_ah() and ib_destroy_ah()
which takes a new flag.

- The "ib_device_ops" structure exist, but only contains the driver ID
and some structure sizes.

Differences from Linux:

- Infiniband drivers must use the INIT_IB_DEVICE_OPS() macro to set
the sizes needed for allocating various IB objects, when adding
IB device instances.

Security:

- PRIV_NET_RAW is needed to use raw ethernet transmit features.
- PRIV_DRIVER is needed to use other privileged operations.

Based on upstream Linux, Torvalds (5.7-rc1):
8632e9b5645bbc2331d21d892b0d6961c1a08429

MFC after: 1 week
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31149
Sponsored by: NVIDIA Networking


# c3987b8e 16-Jun-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

ibcore: Declare ib_post_send() and ib_post_recv() arguments const

Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

Linux commit:
f696bf6d64b195b83ca1bdb7cd33c999c9dcf514
7bb1fafc2f163ad03a2007295bb2f57cfdbfb630
d34ac5cd3a73aacd11009c4fc3ba15d7ea62c411

MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking


# d92a9e56 16-Jun-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

ibcore: Simplify ib_modify_qp_is_ok().

All callers to ib_modify_qp_is_ok() provides enum ib_qp_state makes the
checks of out-of-scope redundant. Let's remove them together with updating
function signature to return boolean result.

While at it remove unused "ll" parameter from ib_modify_qp_is_ok().

Linux commit:
19b1f54099b6ee334acbfbcfbdffd1d1f057216d
d31131bba5a1630304c55ea775c48cc84912ab59

MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking


# 0c13880c 16-Jun-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

ibcore: Support rate limit for packet pacing

Add new member rate_limit to ib_qp_attr which holds the packet pacing
rate in kbps, 0 means unlimited.

IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW
QPs when changing QP state from RTR to RTS, RTS to RTS.

Linux commit:
528e5a1bd3f0e9b760cb3a1062fce7513712a15d

MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking


# 3d2fb36a 16-Jun-2021 Hans Petter Selasky <hselasky@FreeBSD.org>

ibcore: Add new IB rates.

Add the new rates that were added to Infiniband spec as part of
HDR and 2x support.

Linux commit:
a5a5d1993696419e7d5357fc3128e53d219d382e

MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking


# bb43866c 04-Jun-2019 Slava Shwartsman <slavash@FreeBSD.org>

Fix prio vs. nonprio tagged traffic in RDMACM

In current RDMACM implementation RDMACM server will not find a GID
index when the request was prio-tagged and the sever is non
prio-tagged and vise-versa.
According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should
be considered as untagged. Treat RDMACM request the same.

Reviewed by: hselasky, kib
MFC after: 3 Days
Sponsored by: Mellanox Technologies


# 013f1e14 08-May-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Add new rates to ibcore.

Add the new rates that were added to the Infiniband specification as part of
HDR and 2x support.

Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies


# f628150b 05-Dec-2018 Slava Shwartsman <slavash@FreeBSD.org>

ibcore: Make sure GID index variable gets initialized.

Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies


# cda1e10c 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Use __FBSDID() for RCS tags in ibcore.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# c68cb3a6 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Add support for RoCEv2 multicast in ibcore.

When creating address handle from multicast GID, set MAC according to
the appropriate formula instead of searching for it in the GID table:
- For IPv4 multicast GID use ip_eth_mc_map().
- For IPv6 multicast GID use ipv6_eth_mc_map().

Linux commit:
9636a56fa864464896bf7d1272c701f2b9a57737

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 56326b47 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Depend on IPv6 stack to resolve link local address for RoCEv2 in ibcore.

RoCEv1 does not use the IPv6 stack to resolve the link local DGID since it
uses GID address. It forms the DMAC directly from the DGID.

Linux commit:
56d0a7d9a0f045ee27a001762deac28c7d28e2e4

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 91f0e6e1 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Avoid that ib_drain_qp() triggers an out-of-bounds stack access in ibcore.

Linux commit:
a1ae7d0345edd593d6725d3218434d903a0af95d

MFC after: 1 week
Sponsored by: Mellanox Technologies


# f4546fa3 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Add support for prio-tagged traffic for RDMA in ibcore.

When receiving a PCP change all GID entries are reloaded.
This ensures the relevant GID entries use prio tagging,
by setting VLAN present and VLAN ID to zero.

The priority for prio tagged traffic is set using the regular
rdma_set_service_type() function.

Fake the real network device to have a VLAN ID of zero
when prio tagging is enabled. This is logic is hidden inside
the rdma_vlan_dev_vlan_id() function which must always be used
to retrieve the VLAN ID throughout all of ibcore and the
infiniband network drivers.

The VLAN presence information then propagates through all
of ibcore and so incoming connections will have the VLAN
bit set. The incoming VLAN ID is then checked against the
return value of rdma_vlan_dev_vlan_id().

MFC after: 1 week
Sponsored by: Mellanox Technologies


# b70db327 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Set RoCEv2 MGID according to spec in ibcore.

RoCEv2 Annex states that for RoCEv2 over IPv4, the corresponding
IPv4 address is encoded into the GID according to the following rule:
GID= :ffff:<IPv4 address>

Remove the 0xff0e prefix for RoCEv2 packets with IPv4 and leave it
zeroed and change rdma_is_multicast_addr() to consider the new logic.

Linux commit:
be1d325a335840a86c133a56c6a911c368bac0fd
1c3aea2bc8f0b2e5b57375ead40457ff75a3a2ec

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 71698382 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

For multicast functions in ibcore, verify that LIDs are multicast LIDs.

The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).

Add check to verify that the MLID value is in the correct address
range.

RoCE Annex (A16.9.10/11) declares that during attach (detach) QP to a
multicast group, if the QP is associated with a RoCE port, the
multicast group MLID is unused and is ignored.

During attach or detach multicast, when the QP is associated with a
port, it is enough to check the port's link layer and validate the
LID only if it is Infiniband. Otherwise, avoid validating the
multicast LID.

Linux commit:
8561eae60ff9417a50fa1fb2b83ae950dc5c1e21
5236333592244557a19694a51337df6ac018f0a7

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 1456d97c 05-Mar-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Optimize ibcore RoCE address handle creation from user-space.

Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.

It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.

This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.

This commit combines the following Linux upstream commits:

IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah

MFC after: 1 week
Sponsored by: Mellanox Technologies


# bf8641fe 05-Mar-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Get correct network device when accepting incoming RDMA connections in ibcore.

This patch ensures the GID index is always used as a basis of resolving
incoming RDMA connections, as compared to the GID value itself.

Background:
On a per infiniband port basis, the GID identifier is not a unique identifier!
This assumption falls apart when VLAN ID, IPv6 scope ID and RoCE type,
as supported by RoCE v2, is taken into account. This additional
information is stored in the so-called GID attributes and is needed to
correctly identify the destination network interface for an incoming
connection.

Different VLANs are allowed to define the same IPv4 addresses and especially
for the default IPv6 link-local addresses or when using so-called containers
or jails, this is true.

The VNET information for the destination network interface is needed in
order to perform the L2 address lookup in the right Virtual Network Stack
context.

Consequently old functions previously used by RoCE v1, like
rdma_addr_find_smac_by_sgid() are impossible to support, because
there can be multiple identical GIDs associated with the same
infiniband port, and the answer to such a request becomes undefined.
This function has been removed.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 676833da 05-Mar-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Pass valid if_index to rdma_addr_find_l2_eth_by_grh() in ibcore when possible.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 09938b21 05-Mar-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Add missing FreeBSD tags and SVN properties to ibcore.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# e5806bf4 23-Nov-2017 Hans Petter Selasky <hselasky@FreeBSD.org>

Compile fix for LINT-NOIP kernel target.

Sponsored by: Mellanox Technologies


# 478d3005 14-Jun-2017 Hans Petter Selasky <hselasky@FreeBSD.org>

Initial RoCE/infiniband kernel update to Linux v4.9.

This patch currently supports:
- ibcore as a kernel module only
- krping as a kernel module only
- ipoib as a kernel module only

Sponsored by: Mellanox Technologies