354996 |
22-Nov-2019 |
hselasky |
MFC r354728: Prevent potential underflow in ibcore.
Linux commit: a9018adfde809d44e71189b984fa61cc89682b5e
Sponsored by: Mellanox Technologies |
354994 |
22-Nov-2019 |
hselasky |
MFC r354727: Correct MR length field to be 64-bit in ibcore.
Linux commit: edd31551148c09608feee6b8756ad148d550ee3b
Sponsored by: Mellanox Technologies |
347857 |
16-May-2019 |
hselasky |
MFC r347301: Add new rates to ibcore.
Add the new rates that were added to the Infiniband specification as part of HDR and 2x support.
Submitted by: slavash@ Sponsored by: Mellanox Technologies |
338612 |
12-Sep-2018 |
hselasky |
MFC r338491: ibcore: Fix endless loop in searching for matching VLAN device
In r337943 ifnet's if_pcp was set to the PCP value in use instead of IFNET_PCP_NONE. Current ibcore code assumes that if_pcp is IFNET_PCP_NONE with VLAN interfaces so it can identify prio-tagged traffic. Fix that by explicitly verifying that the if_type is IFT_ETHER and not IFT_L2VLAN.
Approved by: re (Marius), hselasky (mentor), kib (mentor) Sponsored by: Mellanox Technologies |
338557 |
10-Sep-2018 |
hselasky |
MFC r338541: Introduce and use sgid_index in CM requests in ibcore.
For RoCE, when CM requests are received for RC and UD connections, netdevice of the incoming request is unavailable. Because of that CM requests are always forwarded to init_net namespace.
Now that we have the GID index available, introduce SGID index in incoming CM requests and refer to the netdevice of it.
While at it fix some incorrect uses of init_net and make sure the rdma_create_id() function stores the VNET it is passed.
Based on linux commit: cee104334c98dd04e9dd4d9a4fa4784f7f6aada9
Sponsored by: Mellanox Technologies |
337097 |
02-Aug-2018 |
hselasky |
MFC r336964: Only NULL check the VNET pointer when VIMAGE is enabled in ibcore. Else a NULL VNET pointer should be ignored. This fixes address resolving when VIMAGE is disabled.
Sponsored by: Mellanox Technologies |
337088 |
02-Aug-2018 |
hselasky |
MFC r336383: Check port number supplied by user verbs cmds in ibcore.
The ib_uverbs_create_ah() and ib_uverbs_modify_qp() calls receive the port number from user input as part of their attributes and assume it is valid. Further down the stack, that parameter is used to access kernel data structures. If the value is invalid, the kernel accesses memory it should not. To prevent this, verify the port number before using it.
Linux commit: 5ecce4c9b17bed4dc9cb58bfb10447307569b77b a62ab66b13a0f9bcb17b7b761f6670941ed5cd62 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3
Sponsored by: Mellanox Technologies |
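The port check described in r336383 can be pictured with a minimal sketch. The structure and function names below are hypothetical stand-ins, not the ibcore API; the sketch only assumes that a device exposes a contiguous range of valid port numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not the real struct ib_device. */
struct demo_device {
    uint8_t first_port;   /* lowest valid port number */
    uint8_t last_port;    /* highest valid port number */
};

/* Return true only if the user-supplied port lies in the device's range. */
static bool
demo_port_is_valid(const struct demo_device *dev, uint8_t port)
{
    assert(dev != NULL);
    return (port >= dev->first_port && port <= dev->last_port);
}
```

A caller would reject the request when the check fails, before the port number is used to index any per-port data.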
337085 |
02-Aug-2018 |
hselasky |
MFC r336380: Check AF family prior to resolving address and introduce safer rdma_addr_size() variants in ibcore.
Garbage supplied by the user causes the UCMA module to pass a zero memory size to memcpy(). Because the size was not checked, this produces unpredictable results in rdma_resolve_addr().
There are several places in the ucma ABI where userspace can pass in a sockaddr but set the address family to AF_IB. When that happens, rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6, and the ucma kernel code might end up copying past the end of a buffer not sized for a struct sockaddr_ib.
Fix this by introducing new variants
    int rdma_addr_size_in6(struct sockaddr_in6 *addr);
    int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);
that are type-safe for the types used in the ucma ABI and return 0 if the size computed is bigger than the size of the type passed in. We can use these new variants to check what size userspace has passed in before copying any addresses.
Linux commit: 2975d5de6428ff6d9317e9948f0968f7d42e5d74 09abfe7b5b2f442a85f4c4d59ecf582ad76088d7 84652aefb347297aa08e91e283adf7b18f77c2d5
Sponsored by: Mellanox Technologies |
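The idea behind the type-safe size variants in r336380 can be sketched in userspace C. All `demo_` names, the `DEMO_AF_IB` value and the 48-byte size are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical values for illustration; not the real AF_IB or sockaddr_ib. */
#define DEMO_AF_IB             250
#define DEMO_SOCKADDR_IB_SIZE  48

/* Size implied by the address family, or 0 if the family is unknown. */
static size_t
demo_addr_size(const struct sockaddr *addr)
{
    assert(addr != NULL);
    switch (addr->sa_family) {
    case AF_INET:
        return (sizeof(struct sockaddr_in));
    case AF_INET6:
        return (sizeof(struct sockaddr_in6));
    case DEMO_AF_IB:
        return (DEMO_SOCKADDR_IB_SIZE);
    default:
        return (0);
    }
}

/* Type-safe variant: 0 if the implied size would not fit in *addr. */
static size_t
demo_addr_size_in6(const struct sockaddr_in6 *addr)
{
    size_t len = demo_addr_size((const struct sockaddr *)addr);

    return (len <= sizeof(*addr) ? len : 0);
}
```

A caller that treats a zero return as an error never copies more bytes than the destination type can hold.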
337078 |
02-Aug-2018 |
hselasky |
MFC r336372: Add support for prio-tagged traffic for RDMA in ibcore.
When receiving a PCP change all GID entries are reloaded. This ensures the relevant GID entries use prio tagging, by setting VLAN present and VLAN ID to zero.
The priority for prio tagged traffic is set using the regular rdma_set_service_type() function.
Fake the real network device to have a VLAN ID of zero when prio tagging is enabled. This logic is hidden inside the rdma_vlan_dev_vlan_id() function, which must always be used to retrieve the VLAN ID throughout all of ibcore and the infiniband network drivers.
The VLAN presence information then propagates through all of ibcore and so incoming connections will have the VLAN bit set. The incoming VLAN ID is then checked against the return value of rdma_vlan_dev_vlan_id().
Sponsored by: Mellanox Technologies |
337076 |
02-Aug-2018 |
hselasky |
MFC r336370: Set RoCEv2 MGID according to spec in ibcore.
RoCEv2 Annex states that for RoCEv2 over IPv4, the corresponding IPv4 address is encoded into the GID according to the following rule: GID= :ffff:<IPv4 address>
Remove the 0xff0e prefix for RoCEv2 packets with IPv4 and leave it zeroed and change rdma_is_multicast_addr() to consider the new logic.
Linux commit: be1d325a335840a86c133a56c6a911c368bac0fd 1c3aea2bc8f0b2e5b57375ead40457ff75a3a2ec
Sponsored by: Mellanox Technologies |
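The GID encoding mentioned in r336370 can be illustrated as follows. This is a sketch under the assumption that the rule quoted above denotes the usual IPv4-mapped layout (ten zero bytes, two 0xff bytes, then the four IPv4 address bytes). The `demo_` names are hypothetical and do not reflect the ibcore implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte GID container for illustration. */
struct demo_gid {
    uint8_t raw[16];
};

/* ipv4 holds the four address bytes in network order, e.g. {224, 0, 0, 1}. */
static void
demo_ipv4_to_gid(const uint8_t ipv4[4], struct demo_gid *gid)
{
    assert(ipv4 != NULL && gid != NULL);
    memset(gid->raw, 0, sizeof(gid->raw));   /* bytes 0-9 stay zero */
    gid->raw[10] = 0xff;
    gid->raw[11] = 0xff;
    memcpy(&gid->raw[12], ipv4, 4);
}

/* With a zeroed prefix, multicast shows only in the IPv4 part (224.0.0.0/4). */
static int
demo_gid_is_ipv4_multicast(const struct demo_gid *gid)
{
    static const uint8_t prefix[12] =
        { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

    return (memcmp(gid->raw, prefix, sizeof(prefix)) == 0 &&
        (gid->raw[12] & 0xf0) == 0xe0);
}
```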
331787 |
30-Mar-2018 |
hselasky |
MFC r330581: Add IB_SPEED_HDR definition in ibcore.
Sponsored by: Mellanox Technologies |
331784 |
30-Mar-2018 |
hselasky |
MFC r330508: Optimize ibcore RoCE address handle creation from user-space.
Creating a UD address handle from user-space or from kernel-space, when the link layer is ethernet, requires resolving the remote L3 address into an L2 address. Doing this from the kernel is easy because the required ARP (IPv4) and ND6 (IPv6) address resolving APIs are readily available. In userspace such an interface does not exist and kernel help is required.
It should be noted that in an IP-based GID environment, the GID itself does not contain all the information needed to resolve the destination IP address. For example information like VLAN ID and SCOPE ID, is not part of the GID and must be fetched from the GID attributes. Therefore a source GID should always be referred to as a GID index. Instead of going through various racy steps to obtain information about the GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing create address handle uverbs interface, retrieving back the L2 address as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
Sponsored by: Mellanox Technologies |
331783 |
30-Mar-2018 |
hselasky |
MFC r330507: Get correct network device when accepting incoming RDMA connections in ibcore.
This patch ensures the GID index is always used as a basis of resolving incoming RDMA connections, as compared to the GID value itself.
Background: On a per infiniband port basis, the GID identifier is not a unique identifier! The assumption of uniqueness falls apart when VLAN ID, IPv6 scope ID and RoCE type, as supported by RoCE v2, are taken into account. This additional information is stored in the so-called GID attributes and is needed to correctly identify the destination network interface for an incoming connection.
Different VLANs are allowed to define the same IPv4 addresses. This is especially true for the default IPv6 link-local addresses, or when using so-called containers or jails.
The VNET information for the destination network interface is needed in order to perform the L2 address lookup in the right Virtual Network Stack context.
Consequently old functions previously used by RoCE v1, like rdma_addr_find_smac_by_sgid() are impossible to support, because there can be multiple identical GIDs associated with the same infiniband port, and the answer to such a request becomes undefined. This function has been removed.
Sponsored by: Mellanox Technologies |
331772 |
30-Mar-2018 |
hselasky |
MFC r330490: Add missing FreeBSD tags and SVN properties to ibcore.
Sponsored by: Mellanox Technologies |
331769 |
30-Mar-2018 |
hselasky |
MFC r303505, r303506, r303512, r303513, r303646, r320418, r323082, r326169, r326563, r326649, r326716, r326764, r326765 and r329222:
RoCE/infiniband/iWarp upgrade to Linux 4.9 for kernel and userspace. This commit merges projects/bsd_rdma_4_9 to 11-stable.
Compatibility wrappers have been made for existing 11-stable ibcore APIs, including ib_reg_phys_mr(). Refer to "sys/ofed/include/rdma/ib_verbs_compat.h" for more information.
The iw_cxgb driver has not been updated and has been disconnected from the build.
Sponsored by: Mellanox Technologies
MFC r326169 and r326563: RoCE/infiniband upgrade to Linux v4.9 for kernel and userspace.
List of kernel sources used:
============================
1) kernel sources were cloned from git://github.com/torvalds/linux.git Top commit 69973b830859bc6529a7a0468ba0d80ee5117826 - tag: v4.9, linux-4.9
2) krping was cloned from https://github.com/larrystevenwise/krping Top commit 292a2f1abf0348285e678a82264740d52e4dcfe4
List of userspace sources used:
===============================
1) rdma-core was cloned from https://github.com/linux-rdma/rdma-core.git Top commit d65138ef93af30b3ea249f3a84aa6a24ba7f8a75
2) OpenSM was cloned from git://git.openfabrics.org/~halr/opensm.git Top commit 85f841cf209f791c89a075048a907020e924528d
3) libibmad was cloned from git://git.openfabrics.org/~iraweiny/libibmad.git Tag 1.3.13 with some additional patches from Mellanox.
4) infiniband-diags was cloned from git://git.openfabrics.org/~iraweiny/infiniband-diags.git Tag 1.6.7 with some additional patches from Mellanox.
NOTES:
======
1) The mthca driver has been removed from userspace.
2) All GPLv2 only sources have been removed and where applicable rewritten from scratch under a BSD license.
3) List of fully supported drivers in userspace and kernel:
   a) iw_cxgbe (Chelsio)
   b) mlx4ib (Mellanox)
   c) mlx5ib (Mellanox)
4) WITH_OFED=YES is still required by make in order to build OFED userspace and kernel code.
5) Full support has been added for routable RoCE, RoCE v2.
MFC r326649: Disconnect OFED after r326169 broke all DIRDEPS support for it.
MFC r326716: Correctly define the unordered_map namespace in ofed/libibnetdisc.
This should fix ofed/libibnetdisc compilation with C-compilers different from clang and GCC v4.2.1.
Submitted by: kib Sponsored by: Mellanox Technologies
MFC r326764: ofed: Remove duplicated symbols from the version file.
ld.bfd accepts multiple listing of the same symbol in the version script. lld is stricter and errors out. Since arm64 and sometimes amd64 use lld, we should correct this cosmetic issue.
Sponsored by: Mellanox Technologies Reviewed by: hselasky Differential revision: https://reviews.freebsd.org/D13329
MFC r326765: ofed: Define barriers for mips and arm.
I used the strongest barriers available on the architectures, so if future analysis shows that they are excessive, the barriers could be relaxed. Still, it is unlikely to be meaningful to run IB on 32-bit ARM or current MIPS machines, so the change is there to make WITH_OFED pass tinderbox.
Sponsored by: Mellanox Technologies Reviewed by: hselasky Differential revision: https://reviews.freebsd.org/D13329
MFC r303505: sdp: Use an mbufq for received control packets.
This is simpler than the hand-rolled queue, and fixes a use-after-free.
Sponsored by: EMC / Isilon Storage Division
MFC r303506: sdp: Destroy the PCB lock before freeing to the zone.
Sponsored by: EMC / Isilon Storage Division
MFC r303512: sdp: Use malloc(9) instead of the Linux compat layer.
SDP transmit and receive rings are always created in a sleepable context, so we can use M_WAITOK and remove error checks.
Sponsored by: EMC / Isilon Storage Division
MFC r303513: sdp: Destroy the RDMA ID after destroying the connection's queue pair.
This is the ordering documented by rdma_destroy_qp(). Also add a useful KASSERT to sdp_pcbfree().
Sponsored by: EMC / Isilon Storage Division
MFC r303646: ipoib: Bound the number of egress mbufs buffered during pathrec lookups.
In pathological situations where the master subnet manager becomes unresponsive for an extended period, we may otherwise end up queuing all of the system's mbufs while waiting for a response to a path record lookup.
This addresses the same issue as commit 1e85b806f9 in Linux.
Reviewed by: cem, ngie Sponsored by: EMC / Isilon Storage Division
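The bounding described in r303646 amounts to a capped queue: once a fixed number of packets is waiting on a path record lookup, further packets are dropped rather than buffered. A minimal sketch follows, with a hypothetical limit and hypothetical names; the actual ipoib limit and data structures are not given in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap; the real ipoib limit is not stated in this log. */
#define DEMO_MAX_PENDING 3

/* Hypothetical per-path bookkeeping while a lookup is outstanding. */
struct demo_path {
    unsigned int pending;   /* packets buffered for this path */
    unsigned int dropped;   /* packets refused because the cap was hit */
};

/* Account for one egress packet; false means the caller must drop it. */
static bool
demo_path_enqueue(struct demo_path *path)
{
    assert(path != NULL);
    if (path->pending >= DEMO_MAX_PENDING) {
        path->dropped++;
        return (false);
    }
    path->pending++;
    return (true);
}
```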
MFC r329222: Import the mthca kernel side infiniband driver from Linux 4.9 and fix compilation under FreeBSD. The mthca driver was temporarily removed as part of the Linux 4.9 RoCE/infiniband upgrade.
Top commit in Linux source tree: 69973b830859bc6529a7a0468ba0d80ee5117826
Sponsored by: Mellanox Technologies
MFC r320418. Note that the socket lock _is_ the same as so_rcv's lock in 11 and this is a no-op in this branch.
Sponsored by: Chelsio Communications
MFC r323082: cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that t4_tom picks it up right away. This is less work than waiting for the connection to be established before applying the setting.
Sponsored by: Chelsio Communications |
329306 |
15-Feb-2018 |
hselasky |
MFC r325807: Make sure the ib_wr_opcode enum is signed by adding a negative dummy element. Different compilers may optimise the enum type in different ways. This ensures coherency when range checking the value of enums in ibcore.
Sponsored by: Mellanox Technologies |
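The enum trick from r325807 can be shown in isolation. The enum below is a hypothetical example, not ib_wr_opcode. Because one enumerator is negative, the compiler must pick a signed underlying type, so a lower-bound check behaves the same with every compiler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode enum; the names are illustrative only. */
enum demo_opcode {
    DEMO_OP_SEND,
    DEMO_OP_READ,
    DEMO_OP_WRITE,
    DEMO_OP_MAX,
    DEMO_OP_DUMMY = -1,   /* forces a signed underlying type */
};

/* Range check that relies on the enum being signed. */
static bool
demo_opcode_in_range(enum demo_opcode op)
{
    /* Holds because DEMO_OP_DUMMY makes negative values representable. */
    assert((enum demo_opcode)-1 < 0);
    return (op >= 0 && op < DEMO_OP_MAX);
}
```

Without the negative enumerator, a compiler may choose an unsigned type, in which case `op >= 0` is always true and an out-of-range value cast from a negative integer is only caught by the upper bound.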
325939 |
17-Nov-2017 |
hselasky |
MFC r325614: Multiple fixes for using IPv6 link-local addresses with RDMA in ibcore.
1) Fail to resolve RDMA address if rtalloc1() returns the loopback device, lo0, as the gateway interface. Currently RDMA loopback is not supported.
2) Use ip_dev_find() and ip6_dev_find() to look up network interfaces with matching IPv4 and IPv6 addresses, respectively.
3) In addr_resolve() make sure the "ifa" pointer is always set, also when the "ifp" is NULL. Else a NULL pointer access might happen trying to read from the "ifa" pointer later on.
4) In rdma_addr_find_dmac_by_grh() make sure the "bound_dev_if" field gets set properly instead of passing the scope ID through the IPv6 socket address structure. This is more in line with upstream OFED in Linux.
5) In rdma_addr_find_smac_by_sgid() there is no need to pass the scope ID for IPv6. Either it is stored in the "bound_dev_if" field or ip6_dev_find() will find the correct network device regardless of the scope ID.
Sponsored by: Mellanox Technologies |
325604 |
09-Nov-2017 |
hselasky |
MFC r324792: The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP cannot be used to indicate iWarp protocol use. Backport the proper IB device capabilities from Linux upstream to distinguish between iWarp and RoCE. Only allocate the additional socket required for iWarp for RDMA IDs when at least one iWarp device is present. This resolves interoperability issues between iWarp and RoCE in ibcore.
Reviewed by: np @ Differential Revision: https://reviews.freebsd.org/D12563 Sponsored by: Mellanox Technologies |
325600 |
09-Nov-2017 |
hselasky |
MFC r324492: Make sure the IPv6 scope ID gets zeroed inside the GID. Else searching for a valid GID entry based on IPv6 addresses can fail.
Sponsored by: Mellanox Technologies |
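The scope ID clearing in r324492 can be sketched as below. The sketch assumes the KAME-style convention in which the kernel embeds the scope ID in bytes 2 and 3 of an IPv6 link-local address; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Clear an embedded scope ID from a 16-byte GID holding an IPv6
 * link-local address (fe80::/10). Other addresses are left untouched.
 */
static void
demo_gid_clear_scope(uint8_t gid[16])
{
    assert(gid != NULL);
    if (gid[0] == 0xfe && (gid[1] & 0xc0) == 0x80) {
        gid[2] = 0;
        gid[3] = 0;
    }
}
```

After clearing, two GIDs derived from the same link-local address compare equal byte for byte, regardless of which interface scope was embedded.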
302408 |
08-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
298486 |
22-Apr-2016 |
hselasky |
More fixes for using IPv6 addresses with RDMA:
- Added check that the SCOPE ID is only restored for IPv6 linklocal addresses.
- Changes made by r237263 in the "cma_bind_addr()" function did not check if the socket address was of type IPv6 and used the IPv4 socket address for IPv6 addresses. This caused the function to fail. Fixed this.
- In the "rdma_gid2ip()" function and some other places the "sin6_len" and "sin6_scope_id" fields were not set for IPv6 socket addresses. Fixed this.
- The scope ID is not stored as part of the GID entries and must be passed as an argument to "rdma_gid2ip()".
- Added new method to "struct ib_device" which returns a pointer to the network interface which belongs to the given infiniband device. This is needed to be able to get the scope ID for IPv6 addresses via the associated ethernet interface.
- Added convenience function, "rdma_get_ipv6_scope_id()", to get the scope ID for IPv6 addresses.
- Implemented new "get_netdev" method for mlx4ib. Other IB controller drivers which want to support IPv6 addresses need to implement this as well.
- Bumped the FreeBSD version due to changing "struct ib_device".
Sponsored by: Mellanox Technologies MFC after: 1 week
|
294610 |
22-Jan-2016 |
np |
Fix for iWARP servers that listen on INADDR_ANY.
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to represent an iWARP endpoint when the connection is over TCP. For servers the current approach is to invoke create_listen callback for each iWARP RNIC registered with the CM. This doesn't work too well for INADDR_ANY because a listen on any TCP socket already notifies all hardware TOEs/RNICs of the new listener. This patch fixes the server side of things for FreeBSD. We've tried to keep all these modifications in the iWARP/TCP specific parts of the OFED infrastructure as much as possible.
Submitted by: Krishnamraju Eraparaju @ Chelsio (with design inputs from Steve Wise) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4801
|
293310 |
07-Jan-2016 |
hselasky |
Remove unused file.
|
291249 |
24-Nov-2015 |
hselasky |
Add some defines needed by the coming mlx5 infiniband support.
Sponsored by: Mellanox Technologies MFC after: 1 week
|
278886 |
17-Feb-2015 |
hselasky |
Update the infiniband stack to Mellanox's OFED version 2.1.
Highlights:
- Multiple verbs API updates
- Support for RoCE, RDMA over ethernet
All hardware drivers depending on the common infiniband stack have been updated as well.
Discussed with: np @ Sponsored by: Mellanox Technologies MFC after: 1 month
|
273135 |
15-Oct-2014 |
hselasky |
Update the OFED Linux compatibility layer and Mellanox hardware driver(s):
- Properly name an inclusion guard
- Fix compile warnings regarding unsigned enums
- Add two new sysctl nodes
- Remove all empty linux header files
- Make an error printout more verbose
- Use "mod_delayed_work()" instead of cancelling and starting a timeout.
- Implement more Linux scatterlist functions.
MFC after: 3 days Sponsored by: Mellanox Technologies
|
270710 |
27-Aug-2014 |
hselasky |
- Update the OFED Linux Emulation layer as a preparation for a hardware driver update from Mellanox Technologies.
- Remove empty files from the OFED Linux Emulation layer.
- Fix compile warnings related to printf() and the "%lld" and "%llx" format specifiers.
- Add some missing 2-clause BSD copyrights.
- Add "Mellanox Technologies, Ltd." to list of copyright holders.
- Add some new compatibility files.
- Fix order of uninit in the mlx4ib module to avoid crash at unload using the new module_exit_order() function.
MFC after: 1 week Sponsored by: Mellanox Technologies
|
263102 |
13-Mar-2014 |
glebius |
Since a 32-bit if_baudrate isn't enough to describe the baud rate of a 10 Gbit interface, a crutch was provided in r241616. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data a fixed, machine-independent size. The data it carries (packet counters, etc.) is by no means MD. It is a bug that on amd64 we have 64-bit counters while on i386 they are 32-bit, which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
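The motivation for r263102 is plain arithmetic: 10 Gbit/s is 10,000,000,000 bits per second, while a 32-bit unsigned field tops out at 4,294,967,295. The sketch below uses hypothetical helper names to show what happens when such a rate is squeezed into 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a rate in bits per second can be stored in a 32-bit field. */
static bool
demo_rate_fits_u32(uint64_t bps)
{
    return (bps <= UINT32_MAX);
}

/* What a 32-bit field would hold: the value modulo 2^32. */
static uint32_t
demo_rate_truncate(uint64_t bps)
{
    return ((uint32_t)bps);
}
```

For a 10 Gbit/s link the truncated value is 1,410,065,408, so a 32-bit field would report roughly 1.4 Gbit/s.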
256116 |
07-Oct-2013 |
dim |
Give an unnamed union in sys/ofed/include/rdma/ib_verbs.h a name, to silence a gcc warning.
Approved by: re (gjb) MFC after: 3 days
|
255972 |
01-Oct-2013 |
alfred |
Enable ib_dev.mmap function
Removed the ifdef linux from this function. Added stub function for contiguous pages to avoid compilation errors.
Submitted by: Orit Moskovich (oritm mellanox.com) Approved by: re
|
255932 |
29-Sep-2013 |
alfred |
Update OFED to Linux 3.7 and update Mellanox drivers.
Update the OFED Infiniband core to the version supplied in Linux version 3.7.
The update to OFED is nearly all additional defines and functions with the exception of the addition of additional parameters to ib_register_device() and the reg_user_mr callback.
In addition the ibcore (Infiniband core) and ipoib (IP over Infiniband) have both been made into completely loadable modules to facilitate testing of the OFED stack in FreeBSD.
Finally the Mellanox Infiniband drivers are now updated to the latest version shipping with Linux 3.7.
Submitted by: Mellanox FreeBSD driver team: Oded Shanoon (odeds mellanox.com), Meny Yossefi (menyy mellanox.com), Orit Moskovich (oritm mellanox.com)
Approved by: re
|
254122 |
09-Aug-2013 |
jeff |
- Reserve a special AF for SDP. The one we were incorrectly using before was taken by another AF.
Sponsored by: EMC / Isilon Storage Division
|
241697 |
18-Oct-2012 |
jhb |
Take advantage of if_baudrate_pf and calculate an effective baud rate on all platforms (not just amd64) to compute an equivalent IB rate.
|
237263 |
19-Jun-2012 |
np |
- Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
|
219846 |
21-Mar-2011 |
kib |
Allow the ofed modules to be compiled on i386.
Reviewed by: jeff
|
219820 |
21-Mar-2011 |
jeff |
- Merge in OFED 1.5.3 from projects/ofed/head
|