History log of /freebsd-current/sys/dev/mlx5/mlx5_en/mlx5_en_rx.c
Revision Date Author Comments
# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 5dc00f00 19-Sep-2022 Justin Hibbits <jhibbits@FreeBSD.org>

Mechanically convert mlx5en(4) to IfAPI

Reviewed by: zlei
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38595


# caf32b26 14-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

pfil: add pfil_mem_{in,out}() and retire pfil_run_hooks()

Commit 0b70e3e78b0 changed the original design of a single entry point
into pfil(9) chains by providing separate functions for the filtering
points that always provide mbufs and know the direction of a flow.
The motivation was to reduce branching. The logical continuation
would be to do the same for the filtering points that always provide
a memory pointer and retire the single entry point.

o Hooks now provide two functions: one for mbufs and an optional one for
memory pointers.
o pfil_hook_args() has a new member and pfil_add_hook() has a
requirement to zero out uninitialized data. Bump PFIL_VERSION.
o As before, a hook function for a memory pointer may realloc
into an mbuf. Such an mbuf is returned via a pointer that must
be provided as an argument.
o The only hook that supports memory pointers is ipfw:default-link.
It is rewritten to provide two functions.
o All remaining uses of pfil_run_hooks() are converted to
pfil_mem_in().
o Transparent union of pfil_packet_t and tricks to fix pointer
alignment are retired. Internal pfil_realloc() reduces down to
m_devget() and thus is retired, too.

Reviewed by: mjg, ocochard
Differential revision: https://reviews.freebsd.org/D37977


# 7cc3ea9c 20-Sep-2022 Randall Stewart <rrs@FreeBSD.org>

mlx5 M_TSTMP accuracy loses quite a bit of precision, so let's fix it.

With the way the clock is synchronized between the system and the current mlx5 for the purposes of the M_TSTMP
being carried, we lose a lot of precision. Instead, let's change the math that calculates this to separate out
the seconds/nanoseconds and operate on the two values, so we don't get overflow, instead of just
shifting the value down and losing precision.

Reviewed by: kib, hselasky
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36327
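
The idea of operating on seconds and nanoseconds separately can be sketched as
follows. This is only an illustration, not the driver's code: the helper name
ticks_to_ns() and the counter frequency parameter hz are assumptions. Converting
the whole seconds and the sub-second remainder separately keeps each
intermediate product within 64 bits, so no low bits need to be shifted away.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: convert a free-running counter value to nanoseconds.
 * The remainder is always below hz, so rem * NSEC_PER_SEC cannot overflow
 * as long as hz stays below 2^64 / 10^9 (roughly 18 GHz).
 */
static uint64_t
ticks_to_ns(uint64_t ticks, uint64_t hz)
{
    uint64_t sec, rem;

    assert(hz != 0 && hz < UINT64_MAX / NSEC_PER_SEC);
    sec = ticks / hz;   /* whole seconds */
    rem = ticks % hz;   /* sub-second part, in ticks */
    return (sec * NSEC_PER_SEC + rem * NSEC_PER_SEC / hz);
}
```

The naive form, ticks * NSEC_PER_SEC / hz, would overflow as soon as ticks
exceeds about 1.8e10, which a fast counter reaches within minutes.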


# cb276279 25-May-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

mlx5en(4): Set the leaf network interface field in the mbuf packet header.

This will be used for TLS RX.

Submitted by: jhb@
Differential revision: https://reviews.freebsd.org/D32356
Sponsored by: NVIDIA Networking


# bc531a1f 16-Feb-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

mlx5en: Improve CQE error debugging.

MFC after: 1 week
Sponsored by: NVIDIA Networking


# 84d7b8e7 01-Feb-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

mlx5en: Implement TLS RX support.

TLS RX support is modeled after TLS TX support. The basic structures and layouts
are almost identical, except that the send tag created filters RX traffic and
not TX traffic.

The TLS RX tag keeps track of past TLS records up to a certain limit,
approximately 1 Gbyte of TCP data. TLS records of the same length are joined
into a single database record.

The HW is queried regularly for TLS RX progress information. The TCP sequence
number obtained from the HW is then matched against the database of TLS TCP
sequence number records and lengths. If a match is found, a static params WQE
is queued on the IQ and the hardware should immediately resume decrypting TLS
data until the next non-sequential TCP packet arrives.

Offloading TLS RX data is supported for untagged, prio-tagged, and
regular VLAN traffic.

MFC after: 1 week
Sponsored by: NVIDIA Networking
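
A minimal sketch of how same-length TLS records could be joined into one
database record and later matched against a TCP sequence number. All names
here (tls_rec_run, tls_db_add(), tls_db_match(), MAX_RUNS) are hypothetical,
and the real driver's data structures may differ considerably.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RUNS 8  /* hypothetical database limit */

struct tls_rec_run {
    uint32_t start_seq; /* TCP sequence number of the first record */
    uint32_t rec_len;   /* length shared by every record in the run */
    uint32_t count;     /* number of back-to-back records */
};

struct tls_rec_db {
    struct tls_rec_run run[MAX_RUNS];
    size_t n;
};

/* Add a record; extend the last run if it is contiguous and of equal length. */
static int
tls_db_add(struct tls_rec_db *db, uint32_t seq, uint32_t len)
{
    struct tls_rec_run *last;

    assert(len != 0);
    if (db->n > 0) {
        last = &db->run[db->n - 1];
        if (last->rec_len == len &&
            last->start_seq + last->rec_len * last->count == seq) {
            last->count++;
            return (0);
        }
    }
    if (db->n == MAX_RUNS)
        return (-1);    /* database full */
    db->run[db->n++] = (struct tls_rec_run){ seq, len, 1 };
    return (0);
}

/* Return 1 if seq is the first byte of a known TLS record, else 0. */
static int
tls_db_match(const struct tls_rec_db *db, uint32_t seq)
{
    for (size_t i = 0; i < db->n; i++) {
        const struct tls_rec_run *r = &db->run[i];
        uint32_t off = seq - r->start_seq;  /* wraps modulo 2^32 */

        if (off < r->rec_len * r->count && off % r->rec_len == 0)
            return (1);
    }
    return (0);
}
```

Storing runs instead of individual records keeps the database small when a
connection carries many records of the same size.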


# aabca103 01-Feb-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

mlx5en: Properly account for no-checksum on tunneled packets.

MFC after: 1 week
Sponsored by: NVIDIA Networking


# 69426357 01-Feb-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

mlx5en: Implement support for internal queues, IQ.

Internal send queues are regular send queues which are reserved for WQE commands
towards the hardware and firmware. These queues typically carry resync
information for ongoing TLS RX connections and are used when changing schedule
queues for rate limited connections.

The internal queue, IQ, code is more or less a stripped down copy
of the existing SQ managing code, with the exception of:

1) An optional single segment memory buffer, which can be read or
written as a whole by the hardware, may be provided.
2) An optional completion callback for all transmit operations may
be provided.
3) Mbufs are not supported.

MFC after: 1 week
Sponsored by: NVIDIA Networking


# 2d5e5a0d 01-Feb-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

mlx5en: Patch to inhibit transmit doorbell writes during packet reception.

During packet reception the network stack frequently transmits data in
response to TCP window updates. To reduce the number of transmit doorbells
needed, inhibit all transmit doorbells designated for the same channel until
after the reception of packets for the given channel is completed.

While at it slightly refactor the mlx5e_tx_notify_hw() function:

1) The doorbell information is always stored into sq->doorbell.d64 .
No need to pass a separate pointer to this variable.

2) Move checks for skipping doorbell writes inside this function.

MFC after: 1 week
Sponsored by: NVIDIA Networking
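
The inhibit mechanism can be modelled roughly as below. This is a sketch under
assumptions: sq_model, the inhibit and pending flags, and the rx_begin() and
rx_end() helpers are hypothetical names, and a counter stands in for the real
doorbell register write. Only the idea of keeping the latest doorbell value in
the queue and doing the skip check inside the notify function comes from the
commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sq_model {
    uint64_t doorbell_d64;  /* latest doorbell value, kept in the queue */
    bool inhibit;           /* set while the channel processes RX */
    bool pending;           /* a doorbell write was skipped */
    unsigned hw_writes;     /* stands in for the MMIO doorbell write */
};

/* The skip check lives inside the notify function itself. */
static void
sq_notify_hw(struct sq_model *sq)
{
    if (sq->inhibit) {
        sq->pending = true;
        return;
    }
    sq->hw_writes++;
    sq->pending = false;
}

/* Queue one transmit request and try to ring the doorbell. */
static void
sq_post(struct sq_model *sq, uint64_t d64)
{
    sq->doorbell_d64 = d64;
    sq_notify_hw(sq);
}

static void
rx_begin(struct sq_model *sq)
{
    sq->inhibit = true;
}

/* After reception completes, one write covers all skipped doorbells. */
static void
rx_end(struct sq_model *sq)
{
    assert(sq->inhibit);
    sq->inhibit = false;
    if (sq->pending)
        sq_notify_hw(sq);
}
```

Three transmits issued between rx_begin() and rx_end() cost a single doorbell
write in this model instead of three.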


# 89918a23 14-Jun-2021 Konstantin Belousov <konstantinb@nvidia.com>

mlx5en: idiomatic use of preprocessor, in particular paths

MFC after: 1 week
Sponsored by: NVIDIA Networking


# b984b956 14-Jun-2021 Konstantin Belousov <konstantinb@nvidia.com>

mlx5en: normalize use of the opt_*.h files

MFC after: 1 week
Sponsored by: NVIDIA Networking


# 149349e0 05-Apr-2021 Konstantin Belousov <konstantinb@nvidia.com>

mlx5en: handle offloaded Rx checksums calculated for tunneled packets

Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week


# f34f0a65 16-Nov-2020 Hans Petter Selasky <hselasky@FreeBSD.org>

Report EQE data upon CQ completion in mlx5core.

Report EQE data upon CQ completion to let upper layers use this data.

Linux commit:
4e0e2ea1886afe8c001971ff767f6670312a9b04

MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking


# 4d0e6d84 08-May-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Remove non-functional MLX5E_MAX_RX_SEGS macro in mlx5en(4).

MFC after: 3 days
Sponsored by: Mellanox Technologies


# 8b825a18 08-May-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Fix for compilation warning in mlx5en(4).

Function 'mlx5e_alloc_rx_wqe' can never be inlined because it uses alloca
(override using the always_inline attribute)

MFC after: 3 days
Sponsored by: Mellanox Technologies


# 945f3984 08-May-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Correct check for the calibration generation in mlx5en(4).

If the generation is cleared due to a hardware clock failure, check for it before
the divisor is used. Actually clear the generation when a failure occurs.

While there, stop doing the calculations inside the generation loop. Since
all members of mlx5e_clbr_point are used for calculations, take a
local copy of the structure and use it after the generation has stabilized.

Submitted by: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies


# a005c157 08-May-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Configure firmware to use RX hash format in mini CQE in mlx5en(4).

When using CQE zipping, one can choose between RX hash and Checksum.
This will indicate the parameter on which a zipping session should be
stopped.

While porting the Linux code, Checksum was chosen. However, the value
of Checksum is not being used anywhere.
For the FreeBSD driver, we prefer to use the RX hash format which will
guarantee the RX hash value for all the mini CQEs.
While at it, make sure to initialize the Checksum value in the
decompressed CQE.

Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies


# 423530be 08-May-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Add support for Dynamic Interrupt Moderation, DIM, in mlx5en(4).

Add support for DIM based on Linux,
with some minor adaptations specific to FreeBSD.

Linux commit
f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33

MFC after: 3 days
Sponsored by: Mellanox Technologies


# 50575ce1 25-Apr-2019 Andrew Gallatin <gallatin@FreeBSD.org>

Track TCP connection's NUMA domain in the inpcb

Drivers can now pass up NUMA domain information via the
mbuf NUMA domain field. This information is then used
by TCP syncache_socket() to associate that information
with the inpcb. The domain information is then fed back
into transmitted mbufs in ip{6}_output(). This mechanism
is nearly identical to what is done to track RSS hash values
in the inp_flowid.

Follow-on changes will use this information for LACP egress
port selection, binding TCP pacers to the appropriate NUMA
domain, etc.

Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20028


# 538ff57b 15-Apr-2019 Andrew Gallatin <gallatin@FreeBSD.org>

mlx5en: Enable new pfil(9) KPI ethernet filtering hooks

This allows efficient filtering at packet ingress on mlx5en.

Note that the packets are filtered (and potentially dropped) *before*
the driver has committed to (re)allocating an mbuf for the
packet. Dropped packets are treated essentially the same as an
error. Nothing is allocated, and the existing buffer is recycled. This
allows us to drop malicious packets at close to line rate with very
little CPU use.

Reviewed by: hselasky, slavash, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19063
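
The ordering described above, filter first and allocate only for packets that
pass, can be sketched as follows. This does not use the real pfil(9) API: the
filter_fn type, the verdict enum, the drop_runt() example filter, and the
counters are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Hypothetical filter hook operating directly on the receive buffer. */
typedef enum verdict (*filter_fn)(const uint8_t *buf, size_t len);

struct rx_stats {
    unsigned delivered;
    unsigned dropped;
    unsigned allocs;    /* stands in for mbuf (re)allocations */
};

/* Example filter: drop frames shorter than an Ethernet header. */
static enum verdict
drop_runt(const uint8_t *buf, size_t len)
{
    (void)buf;
    return (len < 14 ? VERDICT_DROP : VERDICT_PASS);
}

static void
rx_one(const uint8_t *buf, size_t len, filter_fn filter, struct rx_stats *st)
{
    assert(buf != NULL && st != NULL);
    /* Filter before committing to any allocation. */
    if (filter != NULL && filter(buf, len) == VERDICT_DROP) {
        st->dropped++;  /* buffer is recycled as-is */
        return;
    }
    st->allocs++;       /* only now pay for a new buffer */
    st->delivered++;
}
```

A dropped packet costs one filter call and no allocation in this model, which
is why dropping can be cheap.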


# 01f02abf 05-Dec-2018 Slava Shwartsman <slavash@FreeBSD.org>

mlx5en: Count all transmitted and received bytes.

Add counters for all transmitted and received bytes. Previously, only
transmitted and received packets were counted. Fix the description of the RX LRO
counters while at it.

Submitted by: hselasky@
Approved by: hselasky (mentor)
MFC after: 1 week
Sponsored by: Mellanox Technologies


# 90c8e441 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Use an mbuf header instead of an mbuf cluster for debugging interrupts in mlx5en(4).

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 2f17f76a 17-Jul-2018 Hans Petter Selasky <hselasky@FreeBSD.org>

Handle jumbo frames without requiring big clusters in mlx5en(4).

The scatter list is formed from chunks of MCLBYTES each, and
larger-than-default packets are returned to the stack as an mbuf chain.

Submitted by: kib@
MFC after: 1 week
Sponsored by: Mellanox Technologies
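
As a rough illustration of the chunking, the sketch below splits a frame into
cluster-sized segments. CHUNK_BYTES assumes the common MCLBYTES value of 2048,
and MAX_SEGS and split_frame() are hypothetical names, not driver symbols.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_BYTES 2048    /* assumed value of MCLBYTES */
#define MAX_SEGS    8       /* hypothetical scatter list limit */

/* Fill seg_len[] with per-segment byte counts; return the segment count. */
static size_t
split_frame(size_t frame_len, size_t seg_len[MAX_SEGS])
{
    size_t n = 0;

    assert(frame_len > 0 && frame_len <= (size_t)MAX_SEGS * CHUNK_BYTES);
    while (frame_len > 0) {
        size_t take = frame_len < CHUNK_BYTES ? frame_len : CHUNK_BYTES;

        seg_len[n++] = take;
        frame_len -= take;
    }
    return (n);
}
```

With these assumptions a 9000-byte jumbo frame occupies five segments, four
full ones and a final one of 808 bytes, while a 1500-byte frame still fits in
a single segment.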


# e44f4f35 19-Dec-2017 Konstantin Belousov <kib@FreeBSD.org>

mlx5en: Avoid SFENCE on x86

The IA32 memory model guarantees that all writes are seen in program
order. Also, any access to uncacheable memory flushes the store
buffers. As a consequence, the SFENCE instruction is (almost) never needed;
in particular, it is not needed to ensure the correct order of updates as
seen by a PCIe device.

Use atomic_thread_fence_rel() instead of wb() so that only a compiler barrier
is emitted on x86. Other architectures still get the right barrier
instruction.

Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week


# ef23f141 29-Nov-2017 Konstantin Belousov <kib@FreeBSD.org>

Implement hardware mlx5(4) rx timestamps.

Driver support is only provided for ConnectX4/5.

The system-time timestamp is calculated from the free-running counter
timestamp provided by the hardware. The driver periodically samples the
counter to calibrate it against the system clock and uses linear
interpolation to convert. The stability of the crystal which drives the
clock is +-50 ppm at the operational temperature, which makes the
algorithm good enough.

The calculation is somewhat delicate because all values are 64bit and
overflow the naive formula for linear interpolation. The calculation
drops the least significant bits in advance, see the PREC shift in
mlx5_mbuf_tstmp().

Hardware stamps can be turned off by 'ifconfig mceN -hwrxtstmp'. Buggy
firmware might result in small but visible errors in the reported
timestamps, detectable e.g. by nonsensical (negative) RTT values for
LAN pings.

Reviewed by: gallatin, hselasky
Sponsored by: Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D12638
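
The interpolation with dropped low bits can be sketched as below. The
structure layout, the helper name hw_to_sys_ns(), and the shift of 10 bits are
assumptions made for illustration; only the general approach (two calibration
points, linear interpolation, a precision shift to avoid 64-bit overflow)
comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define TSTMP_PREC 10   /* hypothetical number of low bits dropped */

struct clbr_point {
    uint64_t base_prev, base_curr;  /* system time at the samples, ns */
    uint64_t hw_prev, hw_curr;      /* hardware counter at the samples */
};

/*
 * sys = base_prev + (hw - hw_prev) * (base_curr - base_prev)
 *                   / (hw_curr - hw_prev)
 * Computed directly, the product can exceed 64 bits. Shifting every
 * term right by TSTMP_PREC first keeps it in range, at the cost of the
 * low bits of the result.
 */
static uint64_t
hw_to_sys_ns(const struct clbr_point *cp, uint64_t hw)
{
    uint64_t a, b, d;

    a = (hw - cp->hw_prev) >> TSTMP_PREC;
    b = (cp->base_curr - cp->base_prev) >> TSTMP_PREC;
    d = (cp->hw_curr - cp->hw_prev) >> TSTMP_PREC;
    assert(d != 0);     /* calibration points must differ */
    return (cp->base_prev + ((a * b / d) << TSTMP_PREC));
}
```

The result is slightly coarser than the exact value because of the discarded
bits, which is the precision loss that the later commit 7cc3ea9c addresses.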


# e5d6b589 01-Oct-2017 Hans Petter Selasky <hselasky@FreeBSD.org>

Make sure the doorbell lock is valid for the i386 version
of the mlx5en(4) driver.

Tested by: gallatin@
MFC after: 1 week
Sponsored by: Mellanox Technologies


# 8508e4d7 08-Aug-2017 Hans Petter Selasky <hselasky@FreeBSD.org>

Make sure the received IP header gets 32-bit aligned for short packets
in the mlx5en(4) driver.

MFC after: 1 week
Sponsored by: Mellanox Technologies


# 6f4cab6c 07-Nov-2016 Hans Petter Selasky <hselasky@FreeBSD.org>

Add timer to watch the RQ when we are out of mbufs.

The firmware/hardware does not generate additional completion
events unless we post new buffers. Use a timer to try to post
more buffers in case we are temporarily out of mbufs. Otherwise
the receive schedule stops completely.

Sponsored by: Mellanox Technologies
MFC after: 1 week


# 57d5dd79 08-Aug-2016 Hans Petter Selasky <hselasky@FreeBSD.org>

Switch to the new block based LRO input function for the mlx5en
driver. This change significantly increases the overall RX aggregation
ratio for heavily loaded networks handling 10-80 thousand simultaneous
connections.

Remove the turbo LRO code and all references to it, which have now been
superseded by the tcp_lro_queue_mbuf() function.

Tested by: Netflix
Sponsored by: Mellanox Technologies
MFC after: 1 week


# 36ad8372 06-Jun-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

net: Use M_HASHTYPE_OPAQUE_HASH if the mbuf flowid has hash properties

Reviewed by: hps, erj, tuexen
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6688


# 6dd38b87 01-Apr-2016 Sepherosa Ziehau <sephe@FreeBSD.org>

tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication

And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c

Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725


# 636d1fec 19-Jan-2016 Hans Petter Selasky <hselasky@FreeBSD.org>

Add clarifying comment about CQE zipping.

Reviewed by: gnn
Sponsored by: Mellanox Technologies
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D4940


# 1558d49b 19-Jan-2016 Hans Petter Selasky <hselasky@FreeBSD.org>

Declare local variables at top of function.

Reviewed by: gnn
Sponsored by: Mellanox Technologies
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D4939


# 90cc1c77 28-Dec-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Add support for CQE zipping. CQE zipping reduces PCI overhead by
coalescing and zipping multiple CQEs into a single merged CQE. The
feature is enabled by default and can be disabled by a sysctl.

To implement this feature, mlx5_cqwq_pop() has been separated from
mlx5e_get_cqe().

MFC after: 1 week
Submitted by: Mark Bloch <markb@mellanox.com>
Differential Revision: https://reviews.freebsd.org/D4598
Sponsored by: Mellanox Technologies


# 278ce1c9 06-Dec-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Add full support for Receive Side Scaling, RSS, to the mlx5en
driver. This includes binding all interrupt and worker threads
according to the RSS configuration, setting up correct Toeplitz
hashing keys as given by RSS and setting the correct mbuf
hashtype for all received traffic.

MFC after: 1 week
Sponsored by: Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D4410


# bb3853c6 19-Nov-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Style changes, mostly automated.

Differential Revision: https://reviews.freebsd.org/D4179
Submitted by: Daria Genzel <dariaz@mellanox.com>
Sponsored by: Mellanox Technologies
MFC after: 3 days


# dc7e38ac 09-Nov-2015 Hans Petter Selasky <hselasky@FreeBSD.org>

Add mlx5 and mlx5en driver(s) for ConnectX-4 and ConnectX-4LX cards
from Mellanox Technologies. The current driver supports Ethernet
speeds up to and including 100 GBit/s. InfiniBand support will be
added later.

The added code is not compiled by default; that will be enabled in a
separate commit.

Sponsored by: Mellanox Technologies
MFC after: 2 weeks