#
367047 |
|
25-Oct-2020 |
rpokala |
MFC r366686:
Allow IP over IB to work with multiple FIBs.
Call M_SETFIB() to make sure the IPoIB packet is directed to the correct interface-specific FIB.
This was sufficient to allow general-purpose routing using the default FIB, and a separate FIB for routing between IPoIB on ib0 and IPoEthernet on mce0.
Reviewed by: hselasky Obtained from: Anmol Kumar <anmolk at panasas dot com> Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D26733
|
#
363151 |
|
13-Jul-2020 |
hselasky |
MFC r362953: Infiniband clients must be attached and detached in a specific order in ibcore.
Currently the linking order of the infiniband, IB, modules decide in which order the clients are attached and detached. For example one IB client may use resources from another IB client. This can lead to a potential deadlock at shutdown. For example if the ipoib is unregistered after the ib_multicast client is detached, then if ipoib is using multicast addresses a deadlock may happen, because ib_multicast will wait for all its resources to be freed before returning from the remove method.
Fix this by using module_xxx_order() instead of module_xxx().
Differential Revision: https://reviews.freebsd.org/D23973 Sponsored by: Mellanox Technologies
|
#
359122 |
|
19-Mar-2020 |
hselasky |
MFC r359014: Fix for double unlock in ipoib.
The ipoib_unicast_send() function is not supposed to unlock the priv lock.
Sponsored by: Mellanox Technologies
|
#
358932 |
|
13-Mar-2020 |
hselasky |
MFC r358694: Fix some whitespace issues in ipoib.
Sponsored by: Mellanox Technologies
|
#
353183 |
|
07-Oct-2019 |
hselasky |
MFC r352955: Make sure the transmit loop doesn't get starved in ipoib.
When the software send queue gets filled up, callbacks to if_transmit will stop. Make sure the transmit callback routine checks the send queue and outputs any remaining mbufs. Else the remaining mbufs may simply sit in the output queue blocking the transmit path.
Sponsored by: Mellanox Technologies
|
#
341887 |
|
12-Dec-2018 |
hselasky |
MFC r341536: ipoib: Don't do a light flush when MTU is unchanged.
When changing the MTU of ibX network interfaces, check that the MTU was really changed before requesting an update of the multicast rules. Else we might go into an infinite loop joining and leaving ibX multicast groups towards the opensm master interface.
Sponsored by: Mellanox Technologies
|
#
341885 |
|
12-Dec-2018 |
hselasky |
MFC r341535: ipoib: correct setting MTU from inside ipoib(4).
It is not enough to set ifnet->if_mtu to change the interface MTU. System saves the MTU for route in the radix tree, and route cache keeps the interface MTU as well. Since addition of the multicast group causes recalculation of MTU, even bringing the interface up changes MTU from 4042 to 1500, which makes the system configuration inconsistent. Worse, ip_output() prefers route MTU over interface MTU, so large packets are not fragmented and dropped on floor.
Fix it for ipoib(4) using the same approach (or hack) as was applied for it_tun/if_tap in r339012. Thanks to bz@ for giving the hint.
Submitted by: kib@ Sponsored by: Mellanox Technologies
|
#
338556 |
|
10-Sep-2018 |
hselasky |
MFC r338526: Implement get network interface by params function in ipoib.
Also fix the validate_ipv4_net_dev() and validate_ipv6_net_dev() functions which had source and destination addresses swapped, and didn't set the scope ID for IPv6 link-local addresses.
This allows applications like krping to work using IPoIB devices.
Sponsored by: Mellanox Technologies
|
#
337096 |
|
02-Aug-2018 |
hselasky |
MFC r336391: Use __FBSDID() for RCS tags in ibcore.
Sponsored by: Mellanox Technologies
|
#
332159 |
|
06-Apr-2018 |
brooks |
MFC r331648:
Improve copy-and-pasted versions of SIOCGIFADDR.
The original implementation used a reference to ifr_data and a cast to do the equivalent of accessing ifr_addr. This was copied multiple times since 1996.
Approved by: kib Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14873
|
#
331769 |
|
30-Mar-2018 |
hselasky |
MFC r303505, r303506, r303512, r303513, r303646, r320418, r323082, r326169, r326563, r326649, r326716, r326764, r326765 and r329222:
RoCE/infiniband/iWarp upgrade to Linux 4.9 for kernel and userspace. This commit merges projects/bsd_rdma_4_9 to 11-stable.
Compatibility wrappers have been made for existing 11-stable ibcore APIs, including ib_reg_phys_mr(). Refer to "sys/ofed/include/rdma/ib_verbs_compat.h" for more information.
The iw_cxgb driver has not been updated and has been disconnected from the build.
Sponsored by: Mellanox Technologies
MFC r326169 and r326563: RoCE/infiniband upgrade to Linux v4.9 for kernel and userspace.
List of kernel sources used: ============================
1) kernel sources were cloned from git://github.com/torvalds/linux.git Top commit 69973b830859bc6529a7a0468ba0d80ee5117826 - tag: v4.9, linux-4.9
2) krping was cloned from https://github.com/larrystevenwise/krping Top commit 292a2f1abf0348285e678a82264740d52e4dcfe4
List of userspace sources used: ===============================
1) rdma-core was cloned from https://github.com/linux-rdma/rdma-core.git Top commit d65138ef93af30b3ea249f3a84aa6a24ba7f8a75
2) OpenSM was cloned from git://git.openfabrics.org/~halr/opensm.git Top commit 85f841cf209f791c89a075048a907020e924528d
3) libibmad was cloned from git://git.openfabrics.org/~iraweiny/libibmad.git Tag 1.3.13 with some additional patches from Mellanox.
4) infiniband-diags was cloned from git://git.openfabrics.org/~iraweiny/infiniband-diags.git Tag 1.6.7 with some additional patches from Mellanox.
NOTES: ======
1) The mthca driver has been removed from userspace. 2) All GPLv2 only sources have been removed and where applicable rewritten from scratch under a BSD license. 3) List of fully supported drivers in userspace and kernel: a) iw_cxgbe (Chelsio) b) mlx4ib (Mellanox) c) mlx5ib (Mellanox) 4) WITH_OFED=YES is still required by make in order to build OFED userspace and kernel code. 5) Full support has been added for routable RoCE, RoCE v2.
MFC r326649: Disconnect OFED after r326169 broke all DIRDEPS support for it.
MFC r326716: Correctly define the unordered_map namespace in ofed/libibnetdisc .
This should fix ofed/libibnetdisc compilation with C-compilers different from clang and GCC v4.2.1.
Submitted by: kib Sponsored by: Mellanox Technologies
MFC r326764: ofed: Remove duplicated symbols from the version file.
ld.bfd accepts multiple listing of the same symbol in the version script. lld is stricter and errors out. Since arm64 and sometimes amd64 use lld, we should correct this cosmetic issue.
Sponsored by: Mellanox Technologies Reviewed by: hselasky Differential revision: https://reviews.freebsd.org/D13329
MFC r326765: ofed: Define barriers for mips and arm.
I used the strongest barriers available on the architectures, so if the future analysis show that it is excessive, the barriers could be relaxed. Still, it is unlikely that it is meaningful to run IB on 32bit ARM or current MIPS machines, so the change is to make WITH_OFED to pass tinderbox.
Sponsored by: Mellanox Technologies Reviewed by: hselasky Differential revision: https://reviews.freebsd.org/D13329
MFC r303505: sdp: Use an mbufq for received control packets.
This is simpler than the hand-rolled queue, and fixes a use-after-free.
Sponsored by: EMC / Isilon Storage Division
MFC r303506: sdp: Destroy the PCB lock before freeing to the zone.
Sponsored by: EMC / Isilon Storage Division
MFC r303512: sdp: Use malloc(9) instead of the Linux compat layer.
SDP transmit and receive rings are always created in a sleepable context, so we can use M_WAITOK and remove error checks.
Sponsored by: EMC / Isilon Storage Division
MFC r303513: sdp: Destroy the RDMA ID after destroying the connection's queue pair.
This is the ordering documented by rdma_destroy_qp(). Also add a useful KASSERT to sdp_pcbfree().
Sponsored by: EMC / Isilon Storage Division
MFC r303646: ipoib: Bound the number of egress mbufs buffered during pathrec lookups.
In pathological situations where the master subnet manager becomes unresponsive for an extended period, we may otherwise end up queuing all of the system's mbufs while waiting for a response to a path record lookup.
This addresses the same issue as commit 1e85b806f9 in Linux.
Reviewed by: cem, ngie Sponsored by: EMC / Isilon Storage Division
MFC r329222: Import the mthca kernel side infiniband driver from Linux 4.9 and fix compilation under FreeBSD. The mthca driver was temporarily removed as part of the Linux 4.9 RoCE/infinband upgrade.
Top commit in Linux source tree: 69973b830859bc6529a7a0468ba0d80ee5117826
Sponsored by: Mellanox Technologies
MFC r320418. Note that the socket lock _is_ the same as so_rcv's lock in 11 and this is a no-op in this branch.
Sponsored by: Chelsio Communications
MFC r323082: cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that t4_tom picks it up right away. This is less work than waiting for the connection to be established before applying the setting.
Sponsored by: Chelsio Communications
|
#
302408 |
|
07-Jul-2016 |
gjb |
Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle. Prune svn:mergeinfo from the new branch, as nothing has been merged here.
Additional commits post-branch will follow.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
301229 |
|
02-Jun-2016 |
gnn |
Fix up the Infiniband code to handle the new arpresolve.
|
#
298046 |
|
15-Apr-2016 |
pfg |
ofed: for pointers replace 0 with NULL.
These are mostly cosmetical, no functional change.
Found with devel/coccinelle. Reviewed by: hselasky
|
#
296909 |
|
15-Mar-2016 |
hselasky |
Fix witness panic in the ipoib_ioctl() function when unloading the ipoib module.
The bpfdetach() function is trying to turn off promiscious mode on the network interface it is attached to while holding a mutex. The fix consists of ignoring any further calls to the ipoib_ioctl() function when the network interface is going to be detached. The ipoib_ioctl() function might sleep.
Sponsored by: Mellanox Technologies MFC after: 1 week
|
#
296688 |
|
11-Mar-2016 |
jhb |
Use SI_SUB_LAST instead of SI_SUB_SMP as the "catch-all" subsystem.
Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D5515
|
#
293544 |
|
09-Jan-2016 |
melifaro |
Finish r275196: do not dereference rtentry in if_output() routines.
The only piece of information that is required is rt_flags subset.
In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags to check if this particular mbuf needs to be dropped (and what error should be returned). Note that if_loop() will always return EHOSTUNREACH for "reject" routes regardless of RTF_HOST flag existence. This is due to upcoming routing changes where RTF_HOST value won't be available as lookup result.
All other functions require RTF_GATEWAY flag to check if they need to return EHOSTUNREACH instead of EHOSTDOWN error.
There are 11 places where non-zero 'struct route' is passed to if_output(). For most of the callers (forwarding, bpf, arp) does not care about exact error value. In fact, the only place where this result is propagated is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).
Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary rte flags to ro_flags. Call this function in ip_output() after looking up/ verifying rte.
Reviewed by: ae
|
#
292978 |
|
31-Dec-2015 |
melifaro |
Implement interface link header precomputation API.
Add if_requestencap() interface method which is capable of calculating various link headers for given interface. Right now there is support for INET/INET6/ARP llheader calculation (IFENCAP_LL type request). Other types are planned to support more complex calculation (L2 multipath lagg nexthops, tunnel encap nexthops, etc..).
Reshape 'struct route' to be able to pass additional data (with is length) to prepend to mbuf.
These two changes permits routing code to pass pre-calculated nexthop data (like L2 header for route w/gateway) down to the stack eliminating the need for other lookups. It also brings us closer to more complex scenarios like transparently handling MPLS nexthops and tunnel interfaces. Last, but not least, it removes layering violation introduced by flowtable code (ro_lle) and simplifies handling of existing if_output consumers.
ARP/ND changes: Make arp/ndp stack pre-calculate link header upon installing/updating lle record. Interface link address change are handled by re-calculating headers for all lles based on if_lladdr event. After these changes, arpresolve()/nd6_resolve() returns full pre-calculated header for supported interfaces thus simplifying if_output(). Move these lookups to separate ether_resolve_addr() function which ether returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr() compat versions to return link addresses instead of pre-calculated data.
BPF changes: Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT. Despite the naming, both of there have ther header "complete". The only difference is that interface source mac has to be filled by OS for AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside BPF and not pollute if_output() routines. Convert BPF to pass prepend data via new 'struct route' mechanism. Note that it does not change non-optimized if_output(): ro_prepend handling is purely optional. Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI. It is not needed for ethernet anymore. The only remaining FDDI user is dev/pdq mostly untouched since 2007. FDDI support was eliminated from OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).
Flowtable changes: Flowtable violates layering by saving (and not correctly managing) rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated header data from that lle.
Differential Revision: https://reviews.freebsd.org/D4102
|
#
289749 |
|
22-Oct-2015 |
hselasky |
Rename linuxapi[.ko] into linuxkpi[.ko], to reflect that it is a kernel programming interface module, KPI, to avoid confusion with the existing Linux userspace binary compatibility shims. Bump the FreeBSD_version number.
Reviewed by: np @ Suggested by: dumbbell @ Sponsored by: Mellanox Technologies
|
#
287861 |
|
16-Sep-2015 |
melifaro |
Simplify the way of attaching IPv6 link-layer header.
Problem description: How do we currently perform layer 2 resolution and header imposition:
For IPv4 we have the following chain: ip_output() -> (ether|atm|whatever)_output() -> arpresolve()
Lookup is done in proper place (link-layer output routine) and it is possible to provide cached lle data.
For IPv6 situation is more complex: ip6_output() -> nd6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_storelladdr()
We have ip6_ouput() which calls nd6_output() instead of link output routine. nd6_output() does the following: * checks if lle exists, creates it if needed (similar to arpresolve()) * performes lle state transitions (similar to arpresolve()) * calls nd6_output_ifp() which pushes packets to link output routine along with running SeND/MAC hooks regardless of lle state (e.g. works as run-hooks placeholder).
After that, iface output routine like ether_output() calls nd6_storelladdr() which performs lle lookup once again.
As a result, we perform lookup twice for each outgoing packet for most types of interfaces. We also need to maintain runtime-checked table of 'nd6-free' interfaces (see nd6_need_cache()).
Fix this behavior by eliminating first ND lookup. To be more specific: * make all nd6_output() consumers use nd6_output_ifp() instead * rename nd6_output[_slow]() to nd6_resolve_[slow]() * convert nd6_resolve() and nd6_resolve_slow() to arpresolve() semantics, e.g. copy L2 address to buffer instead of pushing packet towards lower layers * Make all nd6_storelladdr() users use nd6_resolve() * eliminate nd6_storelladdr()
The resulting callchain is the following: ip6_output() -> nd6_output_ifp() -> (whatever)_output() -> nd6_resolve()
Error handling: Currently sending packet to non-existing la results in ip6_<output|forward> -> nd6_output() -> nd6_output _lle() which returns 0. In new scenario packet is propagated to <ether|whatever>_output() -> nd6_resolve() which will return EWOULDBLOCK, and that result will be converted to 0.
(And EWOULDBLOCK is actually used by IB/TOE code).
Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D1469
|
#
278886 |
|
17-Feb-2015 |
hselasky |
Update the infiniband stack to Mellanox's OFED version 2.1.
Highlights: - Multiple verbs API updates - Support for RoCE, RDMA over ethernet
All hardware drivers depending on the common infiniband stack has been updated aswell.
Discussed with: np @ Sponsored by: Mellanox Technologies MFC after: 1 month
|
#
277402 |
|
19-Jan-2015 |
hselasky |
Add missing linuxapi module dependencies and always use the FreeBSD "MODULE_VERSION" macro definition. Remove the redefinition of the "MODULE_VERSION" macro from the Linux kernel compatibility API.
MFC after: 1 month Reported by: np@ Sponsored by: Mellanox Technologies
|
#
277302 |
|
17-Jan-2015 |
hselasky |
Start importing the basic OFED linux compatibility layer changes made by dumbbell@ to be able to compile this layer as a dependency module. Clean up some Makefiles and remove the no longer used OFED define. Currently only i386 and amd64 targets are supported.
MFC after: 1 month Sponsored by: Mellanox Technologies
|
#
275196 |
|
27-Nov-2014 |
melifaro |
Do not return unlocked/unreferenced lle in arpresolve/nd6_storelladdr - return lle flags IFF needed. Do not pass rte to arpresolve - pass is_gateway flag instead.
|
#
272225 |
|
27-Sep-2014 |
glebius |
Mechanically convert to if_inc_counter().
|
#
272027 |
|
23-Sep-2014 |
hselasky |
Hardware driver update from Mellanox Technologies, including: - improved performance - better stability - new features - bugfixes
Supported HCAs: - ConnectX-2 - ConnectX-3 - ConnectX-3 Pro
Sponsored by: Mellanox Technologies MFC after: 1 week
|
#
270710 |
|
27-Aug-2014 |
hselasky |
- Update the OFED Linux Emulation layer as a preparation for a hardware driver update from Mellanox Technologies. - Remove empty files from the OFED Linux Emulation layer. - Fix compile warnings related to printf() and the "%lld" and "%llx" format specifiers. - Add some missing 2-clause BSD copyrights. - Add "Mellanox Technologies, Ltd." to list of copyright holders. - Add some new compatibility files. - Fix order of uninit in the mlx4ib module to avoid crash at unload using the new module_exit_order() function.
MFC after: 1 week Sponsored by: Mellanox Technologies
|
#
263102 |
|
13-Mar-2014 |
glebius |
Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit interface, in the r241616 a crutch was provided. It didn't work well, and finally we decided that it is time to break ABI and simply make if_baudrate a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data fixed machine independent size. The notion of data (packet counters, etc) are by no means MD. And it is a bug that on amd64 we've got a 64-bit counters, while on i386 32-bit, which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bit for the ifi_datalen field. This field was provided to make future changes to if_data less ABI breaking. Unfortunately the 8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric. o Give 64 bits to the rest of fields, since they are counters.
__FreeBSD_version bumped.
Discussed with: emax Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
260870 |
|
18-Jan-2014 |
melifaro |
Simplify filling sockaddr_dl structure for if_resolvemulti() callback providers. link_init_sdl() function can be used to fill most of the parameters. Use caller stack instead of allocation / freing memory for each request. Do not drop support for extra-long (probably non-existing) link-layer protocols by introducing link_alloc_sdl() (used by if_resolvemulti() callback) and link_free_sdl() (used by caller). Since this change breaks KBI, MFC requires slightly different approach (link_init_sdl() auto-allocating buffer if necessary to handle cases with unmodified if_resolvemulti() callers).
MFC after: 2 weeks
|
#
255932 |
|
28-Sep-2013 |
alfred |
Update OFED to Linux 3.7 and update Mellanox drivers.
Update the OFED Infiniband core to the version supplied in Linux version 3.7.
The update to OFED is nearly all additional defines and functions with the exception of the addition of additional parameters to ib_register_device() and the reg_user_mr callback.
In addition the ibcore (Infiniband core) and ipoib (IP over Infiniband) have both been made into completely loadable modules to facilitate testing of the OFED stack in FreeBSD.
Finally the Mellanox Infiniband drivers are now updated to the latest version shipping with Linux 3.7.
Submitted by: Mellanox FreeBSD driver team: Oded Shanoon (odeds mellanox.com), Meny Yossefi (menyy mellanox.com), Orit Moskovich (oritm mellanox.com)
Approved by: re
|
#
254576 |
|
20-Aug-2013 |
jhb |
Stop an ipoib interface before detaching it.
PR: kern/181225 Submitted by: Shahar Klein Obtained from: Mellanox MFC after: 1 week
|
#
254523 |
|
19-Aug-2013 |
andre |
Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings.
Consistently use it within IP, IPv6 and ethernet protocols.
Discussed with: trociny, glebius
|
#
249976 |
|
27-Apr-2013 |
glebius |
Add const qualifier to the dst parameter of the ifnet if_output method.
|
#
243882 |
|
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
#
241696 |
|
18-Oct-2012 |
jhb |
Use if_initbaudrate().
|
#
233870 |
|
04-Apr-2012 |
jhb |
Fix build on i386.
|
#
220555 |
|
11-Apr-2011 |
bz |
Even though this block is not compiled currently, properly assign CSUM_TSO to if_hwassist rather than if_capabilities to avoid future errors.
Reviewed by: jeff
|
#
219820 |
|
21-Mar-2011 |
jeff |
- Merge in OFED 1.5.3 from projects/ofed/head
|