History log of /freebsd-current/sys/dev/cxgbe/tom/t4_tom.c
Revision Date Author Comments
# 64a00f87 31-Mar-2023 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Consolidate all mk_set_tcb_field_ulp in one place.

MFC after: 1 week
Sponsored by: Chelsio Communications


# c6c6d4af 15-Apr-2024 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Fix the rx channel selection in options2.

This affects TOE operation when multiple rx c-channels are in use for
offload, which is an unusual configuration.

MFC after: 1 week
Sponsored by: Chelsio Communications


# f76effed 28-Mar-2024 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Remove tx_modq lookup table.

The driver always uses the same modulation queue as the channel and the
table is unnecessary.

MFC after: 1 week
Sponsored by: Chelsio Communications


# eba13bbc 20-Mar-2024 John Baldwin <jhb@FreeBSD.org>

cxgbe: Support TCP_USE_DDP on offloaded TOE connections

When this socket option is enabled, relatively large contiguous
buffers are allocated and used to receive data from the remote
connection. When data is received a wrapper M_EXT mbuf is queued to
the socket's receive buffer. This reduces the length of the linked
list of received mbufs and allows consumers to consume receive data in
larger chunks.

To minimize reprogramming the page pods in the adapter, receive
buffers for a given connection are recycled. When a buffer has been
fully consumed by the receiver and freed, the buffer is placed on a
per-connection free buffers list.

The size of the receive buffers defaults to 256k and can be set via
the hw.cxgbe.toe.ddp_rcvbuf_len sysctl. The
hw.cxgbe.toe.ddp_rcvbuf_cache sysctl (defaults to 4) determines the
maximum number of free buffers cached per connection. Note that this
limit does not apply to "in-flight" receive buffers that are
associated with mbufs in the socket's receive buffer.

Co-authored-by: Navdeep Parhar <np@FreeBSD.org>
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44001


# a5a965d7 30-Jan-2024 John Baldwin <jhb@FreeBSD.org>

cxgbe tom: Enable ULP_MODE_TCPDDP on demand

Most ULP modes in cxgbe's TOE are enabled on the fly when a protocol
is needed (e.g. ULP_MODE_ISCSI is enabled by cxgbei when offloading a
connection using iSCSI, and ULP_MODE_TLS is enabled when RX TLS keys
are programmed for a TOE connection). The one exception to this is
ULP_MODE_TCPDDP.

Currently the cxgbe driver enables ULP_MODE_TCPDDP when a TOE
connection is first created. However, since DDP connections cannot be
converted to other connection types, this requires some special
handling in the driver. For example, iSCSI daemons use the SO_NO_DDP
socket option to ensure TOE connections use ULP_MODE_NONE so they can
be converted to ULP_MODE_ISCSI. Similarly, using TLS receive offload
(ULP_MODE_TLS) requires disabling TCP DDP for new connections by
default.

This commit changes cxgbe to instead switch a connection from
ULP_MODE_NONE to ULP_MODE_TCPDDP when a connection first attempts to
use TCP DDP via aio_read(2). This permits connections to always start
as ULP_MODE_NONE and switch to a protocol-specific mode as needed.

Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43670


# 8cb9b68f 09-Jan-2024 John Baldwin <jhb@FreeBSD.org>

sys: Use mbufq_empty instead of comparing mbufq_len against 0

Reviewed by: bz, emaste
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43338


# dc485b96 22-Aug-2023 Marius Strobl <marius@FreeBSD.org>

tcp_info: Add and export more FreeBSD-specific fields

This change adds struct tcp_info fields corresponding to the following
struct tcpcb ones:
- snd_una
- snd_max
- rcv_numsacks
- rcv_adv
- dupacks

Note that while both tcp_fill_info() and fill_tcp_info_from_tcb() are
extended accordingly, no counterpart of rcv_numsacks is available in
the cxgbe(4) TOE PCB, though.

Sponsored by: NetApp, Inc. (originally)


# 8c6104c4 22-Aug-2023 Marius Strobl <marius@FreeBSD.org>

tcp_fill_info(): Change lock assertion on INPCB to locked only

This function actually only ever reads from the TCP PCB. Consequently,
also make the pointer to its TCP PCB parameter const.

Sponsored by: NetApp, Inc. (originally)


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# c255d1a4 23-Jan-2023 Justin Hibbits <jhibbits@FreeBSD.org>

IfAPI: Add if_llsoftc member accessors for TOEDEV

Summary:
Keep TOEDEV() macro for backwards compatibility, and add a SETTOEDEV()
macro to complement with the new accessors.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D38199


# 2ff447ee 15-Nov-2022 John Baldwin <jhb@FreeBSD.org>

cxgbe: Enable TOE TLS RX when an RX key is provided via setsockopt().

Rather than requiring a socket to be created as a TLS socket from the
get go, switch a TOE socket from "plain" TOE to TLS mode when a
receive key is added to the socket.

The firmware is only able to switch a "plain" TOE connection to TLS
mode if the head of the pending socket data is the start of a TLS
record, so the connection is migrated to TLS mode as a multi-step
process.

When TOE TLS RX is enabled, the associated connection's receive side
is frozen via a flag in the TCB. The state of the socket buffer is
then examined to determine if the pending data in the socket buffer
ends on a TLS record boundary. If so, the connection is migrated to
TLS mode and unfrozen. Otherwise, the connection is unfrozen
temporarily until more data arrives. Once more data arrives, the
receive queue is frozen again and rechecked. This continues until the
connection is paused at a record boundary. Any records received
before TLS mode is enabled are decrypted as software records.

Note that this removes the 'rx_tls_ports' sysctl. TOE TLS offload for
receive is now enabled automatically on existing TOE connections when
using a KTLS-aware SSL library just as it was previously enabled
automatically for TLS transmit. This also enables TLS offload for TOE
connections which enable TLS after passing initial data in the clear
(e.g. STARTTLS with SMTP).

Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D37351


# 9eb0e832 08-Nov-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: provide macros to access inpcb and socket from a tcpcb

There should be no functional changes with this commit.

Reviewed by: rscheff
Differential revision: https://reviews.freebsd.org/D37123


# e1401f75 19-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

cxgbe: use standard sototcpcb() accessor macro to get socket's tcpcb

Reviewed by: np
Differential revision: https://reviews.freebsd.org/D37041


# 8d2c1393 28-Sep-2022 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Fix assertions in the code that maintains TCB history.

The tids used for TOE connections start from tid_base, not 0.

MFC after: 1 week
Sponsored by: Chelsio Communications


# e7d02be1 17-Aug-2022 Gleb Smirnoff <glebius@FreeBSD.org>

protosw: refactor protosw and domain static declaration and load

o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d18ef), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860dcd).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee54ea).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c

Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232


# b483b6b2 04-May-2022 John Baldwin <jhb@FreeBSD.org>

cxgbe tom: Force unsigned modulus for queue indices.

The final transmit and receive queue indices need to be positive
values. However, since txq_idx and rxq_idx are signed (to permit
using -1 to as a marker for uninitialized values), using %= with
another integer type (vi->nofld[tr]xq) yielded a sign-extended modulus
value. This resulted in negative queue indices and a buffer underrun
when arc4random() returned a value with the sign bit set. Use a
temporary unsigned variable to hold the "raw" queue index to force
unsigned modulus.

This worked previously because the modulus was previously applied
directly to the return value of arc4random() which is unsigned before
the result was assigned to txq_idx and rxq_idx.

Discussed with: np
Fixes: db28d4a0cd1c cxgbe/t4_tom: Support for round-robin selection of offload queues.
Sponsored by: Chelsio Communications


# db28d4a0 14-Apr-2022 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Support for round-robin selection of offload queues.

A COP (Connection Offload Policy) rule can now specify that the tx
and/or rx queue for a new tid should be selected in a round-robin
manner. There is no change in default behavior.

Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34921


# 1ecbc1d8 14-Sep-2021 John Baldwin <jhb@FreeBSD.org>

cxgbe tom: Don't queue AIO requests on listen sockets.

This is similar to the fixes in 141fe2dceeae. One difference is that
TOE sockets do not change states (listen vs non-listen) once created,
so no lock is needed for SOLISTENING().

Sponsored by: Chelsio Communications


# 2eb0e53a 12-Aug-2021 John Baldwin <jhb@FreeBSD.org>

cxgbei: Wait for the final CPL to be received in icl_cxgbei_conn_close.

A socket in the FIN_WAIT_1 state is marked disconnected by
do_close_con_rpl() even though there might still receive data pending.
This is because the socket at that point has set SBS_CANTRCVMORE which
causes the protocol layer to discard any data received before the FIN.
However, icl_cxgbei_conn_close needs to wait until all the data has
been discarded. Replace the wait for SS_ISDISCONNECTED with instead
waiting for final_cpl_received() to be called.

Reported by: Jithesh Arakkan @ Chelsio
Sponsored by: Chelsio Communications


# ec8004dd 24-Jun-2021 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Do not configure traffic classes automatically on attach.

The driver used to configure all available classes with some default
parameters on attach and the rest of t4_sched.c was written with the
assumption that all traffic classes are always valid in the hardware.
But this resulted in a lot of informational messages being logged in the
firmware's circular log, crowding out other more useful messages.

This change leaves the tx scheduler alone during attach to reduce the
spam in the devlog. The state of every class is now tracked separately
from its flags and there is support for an 'uninitialized' state.

MFC after: 2 weeks
Sponsored by: Chelsio Communications


# 6beb67c7 21-Jun-2021 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Get the number of usable traffic classes from the firmware.

Recent firmwares are able to utilize the traffic classes of tx channels
that were previously unused. This effectively doubles the number of
traffic classes available per port for 2 port cards. Stop using the raw
per-channel value in the driver and ask the firmware for the number of
usable traffic classes instead.

MFC after: 2 weeks
Sponsored by: Chelsio Communications


# 789f2d4b 28-May-2021 John Baldwin <jhb@FreeBSD.org>

cxgbe tom: Remove support for non-KTLS TLS offload.

TOE TLS offload was first supported via a customized OpenSSL developed
by Chelsio with proprietary socket options prior to KTLS being present
either in FreeBSD or upstream OpenSSL. With the addition of KTLS in
both places, cxgbe's TOE driver was extended to support TLS offload
via KTLS as well. This change removes the older interface leaving
only the KTLS bindings for TOE TLS.

Since KTLS was added to TOE TLS second, it was somehat shoe-horned
into the existing code. In addition to removing the non-KTLS TLS
offload, refactor and simplify the code to assume KTLS, e.g. not
copying keys into a helper structure that mimic'ed the non-KTLS mode,
but using the KTLS session object directly when constructing key
contexts.

This also removes some unused code to send TX keys inline in work
requests for TOE TLS. This code was never enabled, and was arguably
sending the wrong thing (it was not sending the raw key context as we
do for NIC TLS when using inline keys).

Sponsored by: Chelsio Communications


# 677cb972 20-May-2021 John Baldwin <jhb@FreeBSD.org>

cxgbe tom: Free pending iSCSI mbufs on connection shutdown.

If an iSCSI connection is shutdown abruptly (e.g. by a RST from the
peer), pending iSCSI PDUs and page pod work requests can be in the
ulp_pduq when the final CPL is received indicating the death of the
connection.

Reported by: Jithesh Arakkan @ Chelsio


# 24b98f28 23-May-2021 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Overhaul CLIP (Compressed Local IPv6) table management.

- Process the list of local IPs once instead of once per adapter. Add
addresses from all VNETs to the driver's list but leave hardware
updates for later when the global VNET/IFADDR list locks have been
released.

- Add address to the hardware table synchronously when a CLIP entry is
requested for an address that's not already in there.

- Provide ioctls that allow userspace tools to manage addresses in the
CLIP table.

- Add a knob (hw.cxgbe.clip_db_auto) that controls whether local IPs are
automatically added to the CLIP table or not.

MFC after: 2 weeks
Sponsored by: Chelsio Communications


# 557c4521 13-Apr-2021 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Implement tod_pmtu_update.

tod_pmtu_update was added to the kernel in 01d74fe1ffc.

Sponsored by: Chelsio Communications


# 53948932 29-Mar-2021 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: restore socket's protosw before entering TIME_WAIT.

This fixes a panic due to stale so->so_proto if t4_tom is unloaded and
one or more connections that were previously offloaded are still around
in TIME_WAIT state.

Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29503


# 7ac8040a 19-Feb-2021 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Use firmware commands to get/set filter configuration.

1. Query the firmware for filter mode, mask, and related ingress config
instead of trying to figure them out from hardware registers. Read
configuration from the registers only when the firmware does not
support this query.

2. Use the firmware to set the filter mode. This is the correct way to
do it and is more flexible as well. The filter mode (and associated
ingress config) can now be changed any time it is safe to do so.

The user can specify a subset of a valid mode and the driver will
enable enough bits to make sure that the mode is maxed out -- that
is, it is not possible to set another bit without exceeding the
total width for optional filter fields. This is a hardware
requirement that was not enforced by the driver previously.

MFC after: 2 weeks
Sponsored by: Chelsio Communications


# 0082e479 03-Dec-2020 John Baldwin <jhb@FreeBSD.org>

Clear TLS offload mode if a TLS socket hangs without receiving data.

By default, if a TOE TLS socket stops receiving data for more than 5
seconds, revert the connection back to plain TOE mode. This provides
a fallback if the userland SSL library does not support KTLS. In
addition, for client TLS 1.3 sockets using connect(), the TOE socket
blocks before the handshake has completed since the socket option is
only invoked for the final handshake.

The timeout defaults to 5 seconds, but can be changed at boot via the
hw.cxgbe.toe.tls_rx_timeout tunable or for an individual interface via
the dev.<nexus>.toe.tls_rx_timeout sysctl.

Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27470


# b3ceca0c 10-Nov-2020 John Baldwin <jhb@FreeBSD.org>

Clear tp->tod in t4_pcb_detach().

Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear.
In particular, if a newly accepted socket falls back to non-TOE due to
an active open failure, the non-TOE socket will still have tp->tod set
even though TF_TOE is clear.

Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27028


# 6b7ecdcd 19-Oct-2020 John Baldwin <jhb@FreeBSD.org>

Re-enable receive flow control for TOE TLS sockets.

Flow control was disabled during initial TOE TLS development to
workaround a hang (and to match the Linux TOE TLS support for T6).
The rest of the TOE TLS code maintained credits as if flow control was
enabled which was inherited from before the workaround was added with
the exception that the receive window was allowed to go negative.
This negative receive window handling (rcv_over) was because I hadn't
realized the full implications of disabling flow control.

To clean this up, re-enable flow control on TOE TLS sockets. The
existing TPF_FORCE_CREDITS workaround is sufficient for the original
hang. Now that flow control is enabled, remove the rcv_over
workaround and instead assert that the receive window never goes
negative matching plain TCP TOE sockets.

Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26799


# 7c228be3 25-Jun-2020 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Add a pointer to the adapter softc in vi_info.

There were quite a few places where port_info was being accessed only to
get to the adapter.

Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25432


# 8cce4145 27-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Add support for KTLS RX over TOE to T6.

This largely reuses the TLS TOE support added in r330884. However,
this uses the KTLS framework in upstream OpenSSL rather than requiring
Chelsio-specific patches to OpenSSL. As with the existing TLS TOE
support, use of RX offload requires setting the tls_rx_ports sysctl.

Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24453


# f1f93475 27-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Initial support for kernel offload of TLS receive.

- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
authentication algorithms and keys as well as the initial sequence
number.

- When reading from a socket using KTLS receive, applications must use
recvmsg(). Each successful call to recvmsg() will return a single
TLS record. A new TCP control message, TLS_GET_RECORD, will contain
the TLS record header of the decrypted record. The regular message
buffer passed to recvmsg() will receive the decrypted payload. This
is similar to the interface used by Linux's KTLS RX except that
Linux does not return the full TLS header in the control message.

- Add plumbing to the TOE KTLS interface to request either transmit
or receive KTLS sessions.

- When a socket is using receive KTLS, redirect reads from
soreceive_stream() into soreceive_generic().

- Note that this interface is currently only defined for TLS 1.1 and
1.2, though I believe we will be able to reuse the same interface
and structures for 1.3.


# f3b6d8ad 15-Apr-2020 John Baldwin <jhb@FreeBSD.org>

Clear CPL_GET_TCB_RPL handler on module unload.

This fixes a panic when unloading and reloading t4_tom.ko since the
old pointer is still stored when t4_tom_load tries to set it.

Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24358


# 7ba6f549 06-Mar-2020 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Do not uninitialize a toepcb that has not been initialized.

This fixes the following panic:
--- trap 0xc, rip = 0xffffffff80c00411, rsp = 0xfffffe0025192840, rbp = 0xfffffe0025192860 ---
vmem_xfree() at vmem_xfree+0xd1/frame 0xfffffe0025192860
tls_uninit_toep() at tls_uninit_toep+0x78/frame 0xfffffe0025192880
free_toepcb() at free_toepcb+0x32/frame 0xfffffe00251928a0
t4_connect() at t4_connect+0x3be/frame 0xfffffe0025192950
tcp_offload_connect() at tcp_offload_connect+0xa4/frame 0xfffffe0025192990
tcp_usr_connect() at tcp_usr_connect+0xec/frame 0xfffffe00251929f0
soconnect() at soconnect+0xae/frame 0xfffffe0025192a30
kern_connectat() at kern_connectat+0xe2/frame 0xfffffe0025192a90
sys_connect() at sys_connect+0x75/frame 0xfffffe0025192ad0
amd64_syscall() at amd64_syscall+0x137/frame 0xfffffe0025192bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0025192bf0
--- syscall (98, FreeBSD ELF64, sys_connect), rip = 0x8008e9d8a, rsp = 0x7fffffffc0f8, rbp = 0x7fffffffc130 ---

Reviewed by: jhb@
MFC after: 3 days
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D23989


# 334fc582 08-Jan-2020 Bjoern A. Zeeb <bz@FreeBSD.org>

vnet: virtualise more network stack sysctls.

Virtualise tcp_always_keepalive, TCP and UDP log_in_vain. All three are
set in the netoptions startup script, which we would love to run for VNETs
as well [1].

While virtualising the log_in_vain sysctls seems pointles at first for as
long as the kernel message buffer is not virtualised, it at least allows
an administrator to debug the base system or an individual jail if needed
without turning the logging on for all jails running on a system.

PR: 243193 [1]
MFC after: 2 weeks


# 866a7f28 22-Oct-2019 John Baldwin <jhb@FreeBSD.org>

Always allocate the atid table during attach.

Previously the table was allocated on first use by TOE and the
ratelimit code. The forthcoming NIC KTLS code also uses this table.
Allocate it unconditionally during attach to simplify consumers.

Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D22028


# 4f13842f 08-Oct-2019 John Baldwin <jhb@FreeBSD.org>

Add support for KTLS in the Chelsio TOE module.

This adds a TOE hook to allocate a KTLS session. It also recognizes
TLS mbufs in the socket buffer and sends those to the NIC using a TLS
work request to encrypt the record before segmenting it.

TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in
addition to enabling KTLS.

Reviewed by: np, gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21891


# c537e887 26-Aug-2019 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Initialize all TOE connection parameters in one place.
Remove now-redundant items from toepcb and synq_entry and the code to
support them.

Let the driver calculate tx_align, rx_coalesce, and sndbuf by default.

Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21387


# 74a155ed 28-Jun-2019 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: the AIO tx job queue must be empty by the time the driver
releases the offload resources associated with the tid.

Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20798


# b7acf27c 27-Jun-2019 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Fix regression in t_maxseg usage within t4_tom.

t_maxseg was changed in r293284 to not have any adjustment for TCP
timestamps. t4_tom inadvertently went back to pre-r293284 semantics
in r332506.

Sponsored by: Chelsio Communications


# 35c0026f 30-May-2019 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Do not attempt to look up entries in the TCB history if
it hasn't been initialized.

This fixes a bug in r346570 that could cause a panic when servicing
TCP_INFO for offloaded connections.

MFC after: 3 days
Sponsored by: Chelsio Communications


# 61e02298 22-Apr-2019 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Add a "TCB history" feature that samples hardware state
for a tid and maintains a running history of some interesting events.

Service TCP_INFO queries from the history when the tid is being tracked
there.


# edb518f4 20-Mar-2019 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Treat the viid as an opaque identifier.

Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.

MFC after: 1 week
Sponsored by: Chelsio Communications


# b156a400 18-Dec-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: fixes for issues on the passive open side.

- Fix PR 227760 by getting the TOE to respond to the SYN after the call
to toe_syncache_add, not during it. The kernel syncache code calls
syncache_respond just before syncache_insert. If the ACK to the
syncache_respond is processed in another thread it may run before the
syncache_insert and won't find the entry. Note that this affects only
t4_tom because it's the only driver trying to insert and expand
syncache entries from different threads.

- Do not leak resources if an embryonic connection terminates at
SYN_RCVD because of L2 lookup failures.

- Retire lctx->synq and associated code because there is never a need to
walk the list of embryonic connections associated with a listener.
The per-tid state is still called a synq entry in the driver even
though the synq itself is now gone.

PR: 227760
MFC after: 2 weeks
Sponsored by: Chelsio Communications


# 78afed13 28-Nov-2018 John Baldwin <jhb@FreeBSD.org>

Move CLIP table handling out of TOM and into the base driver.

- Store the clip table in 'struct adapter' instead of in the TOM softc.
- Init the clip table during attach and teardown during detach.
- While here, add a dev.<nexus>.<unit>.misc.clip sysctl to dump the
CLIP table.

This does mean that we update the clip table even if TOE is not enabled,
but non-TOE things need the CLIP table anyway.

Reviewed by: np, Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18010


# bc13c69b 15-Nov-2018 John Baldwin <jhb@FreeBSD.org>

Move the TLS key map into the adapter softc so non-TOE code can use it.

Sponsored by: Chelsio Communications


# 24bc8671 21-Aug-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Make sure 'matched' is always initialized before use.

Reported by: Coverity (CID 1390894)
MFC after: 1 week
Sponsored by: Chelsio Communications


# 7576fe76 20-Aug-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Provide the hardware tid in tcp_info.

Submitted by: marius@


# 72049e73 17-Aug-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Put the ifnet or VLAN's PCP value in the 802.1Q tag of frames
generated by the TOE. Works with vid 0 (no VLAN, just priority) too.

MFC after: 1 week
Sponsored by: Chelsio Communications


# 5fc0f72f 09-Aug-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Add support for high priority filters on T6+. They have their
own region in the TCAM starting with T6, unlike previous chips where
they were in the same region as normal filters.

These filters "hit" before anything else in the LE's lookup. The exact
order is:
a) High priority filters
b) TOE's active region (TCAM and/or hash)
c) Servers (TOE hw listeners)
d) Normal filters

MFC after: 1 week
Sponsored by: Chelsio Communications


# 1979b511 06-Aug-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Allow user-configured and driver-configured traffic classes to
be used simultaneously. Move sysctl_tc and sysctl_tc_params to
t4_sched.c while here.

MFC after: 3 weeks
Sponsored by: Chelsio Communications


# d7c5a620 18-May-2018 Matt Macy <mmacy@FreeBSD.org>

ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32
4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32
4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32
4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32
4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32
4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32
4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32

After the patch

InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree
5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51
5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51
5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51
5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51
5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52
5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366


# b3daa684 14-May-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Filtering related features and fixes.

- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.

Sponsored by: Chelsio Communications


# 4535e804 30-Apr-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Use opaque cookies or tid range-checks to determine the
intended recipient of a CPL when it can't be determined solely from the
opcode. Retire the per-queue handlers for such CPLs in favor of the new
scheme.

Sponsored by: Chelsio Communications


# 8896672a 26-Apr-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Move release_tid to the base NIC driver for future consumers.

Sponsored by: Chelsio Communications.


# 3747c1ff 26-Apr-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Break up alloc_tid_tabs and move the atid routines to the base
NIC driver. The atid services will be used by new features (hashfilters
and inline TLS) that do not involve TOE.

Sponsored by: Chelsio Communications


# 8aa1c1d8 19-Apr-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Fix bugs in the handling of COP rules that match on VLAN tag.

Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.

Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications


# 1131c927 14-Apr-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Add support for Connection Offload Policy (aka COP).

COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload. t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface. The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so. The general format
of a rule is: [socket-type] expr => settings

Example. See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match. There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by: Chelsio Communications


# f8fea0d9 02-Apr-2018 Navdeep Parhar <np@FreeBSD.org>

cxgbe: Implement tcp_info handler for connections handled by t4_tom.

The TCB is read using a memory window right now. A better alternate to
get self-consistent, uncached information would be to use a GET_TCB
request but waiting for a reply from hw while holding non-sleepable
locks is quite inconvenient.

Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14817


# edf95feb 27-Mar-2018 John Baldwin <jhb@FreeBSD.org>

Use the offload transmit queue to set flags on TLS connections.

Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.

However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue. This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.

Submitted by: Harsh Jain @ Chelsio (original version)
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14871


# 1e9538d2 13-Mar-2018 John Baldwin <jhb@FreeBSD.org>

Support for TLS offload of TOE connections on T6 adapters.

The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections. Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing. Currently
these socket options are implemented as TCP options in the vendor
specific range. A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit. TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface. Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key. This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library). Receive offload can only be used with TOE sockets using the
TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers. Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket. Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value. A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library. Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs. If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request. This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages. The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors. The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host. As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket. In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase. Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS). The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k). To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

Reviewed by: np (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14529


# 9689995d 13-Mar-2018 John Baldwin <jhb@FreeBSD.org>

Simplify error handling in t4_tom.ko module loading.

- Change t4_ddp_mod_load() to return void instead of always returning
success. This avoids having to pretend to have proper support for
unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
The module handling code in the kernel invokes MOD_UNLOAD on a module
whose MOD_LOAD fails with an error already.

Reviewed by: np (part of a larger patch)
MFC after: 1 month
Sponsored by: Chelsio Communications


# 125d42fe 21-Feb-2018 John Baldwin <jhb@FreeBSD.org>

Move DDP PCB state into a helper structure.

This consolidates all of the DDP state in one place. Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections. This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.

MFC after: 1 month
Sponsored by: Chelsio Communications


# f1798531 30-Jan-2018 John Baldwin <jhb@FreeBSD.org>

Export tcp_always_keepalive for use by the Chelsio TOM module.

This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).

Reviewed by: bz, emaste
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D14129


# 718cf2cc 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 0a3bf7fb 31-Aug-2017 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: There may not be a tid to update if the connection isn't
established.

MFC after: 2 weeks
Sponsored by: Chelsio Communications


# 67904997 05-May-2017 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Per-connection rate limiting for TCP sockets handled by
the TOE. For now this capability is always enabled in kernels with
options RATELIMIT. t4_tom will check if_capenable once the base driver
gets code to support rate limiting for any socket (TOE or not).

This was tested with iperf3 and netperf ToT as they already support
SO_MAX_PACING_RATE sockopt. There is a bug in firmwares prior to
1.16.45.0 that affects the BSD driver only and results in rate-limiting
at an incorrect rate. This will resolve by itself as soon as 1.16.45.0
or later firmware shows up in the driver.

Relnotes: Yes
Sponsored by: Chelsio Communications


# eaf56694 06-Feb-2017 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Fix CLIP entry refcounting on the passive side. Every
IPv6 connection being handled by the TOE should have a reference on its
CLIP entry.

Sponsored by: Chelsio Communications


# fbb779cc 27-Jan-2017 John Baldwin <jhb@FreeBSD.org>

Unregister CPL handlers for TOE-related messages when unloading TOM.

MFC after: 1 week
Sponsored by: Chelsio Communications


# b6791501 27-Jan-2017 John Baldwin <jhb@FreeBSD.org>

Don't drop a reference to the TOE PCB in undo_offload_socket().

undo_offload_socket() is only called by t4_connect() during a connection
setup failure, but t4_connect() still owns the TOE PCB and frees ita
after undo_offload_socket() returns. Release a reference in
undo_offload_socket() resulted in a double-free which panicked when
t4_connect() performed the second free. The reference release was
added to undo_offload_socket() incorrectly in r299210.

MFC after: 1 week
Sponsored by: Chelsio Communications


# a342904b 11-Jan-2017 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Add VIMAGE support to the TOE driver.

Active Open:
- Save the socket's vnet at the time of the active open (t4_connect) and
switch to it when processing the reply (do_act_open_rpl or
do_act_establish).

Passive Open:
- Save the listening socket's vnet in the driver's listen_ctx and switch
to it when processing incoming SYNs for the socket.
- Reject SYNs that arrive on an ifnet that's not in the same vnet as the
listening socket.

CLIP (Compressed Local IPv6) table:
- Add only those IPv6 addresses to the CLIP that are in a vnet
associated with one of the card's ifnets.

Misc:
- Set vnet from the toepcb when processing TCP state transitions.
- The kernel sets the vnet when calling the driver's output routine
so t4_push_frames runs in proper vnet context already. One exception
is when incoming credits trigger tx within the driver's ithread. Set
the vnet explicitly in do_fw4_ack for that case.

MFC after: 3 days
Sponsored by: Chelsio Communications


# d663d5ca 07-Jan-2017 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Fix tid accounting. An offloaded IPv6 connection uses 2
tids, not 1, in the hardware.

MFC after: 3 days
Sponsored by: Chelsio Communications


# b0c554c3 17-Sep-2016 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: The SMAC entry for a VI is at a different location in the T6.

Sponsored by: Chelsio Communications


# 968267fd 31-Aug-2016 Navdeep Parhar <np@FreeBSD.org>

cxgbe/t4_tom: Add general purpose routines to deal with page pod regions
and allocations within them. Switch to these routines to manage the TOE
DDP region.

Sponsored by: Chelsio Communications


# 07159830 27-Jul-2016 John Baldwin <jhb@FreeBSD.org>

Add support for zero-copy aio_write() on TOE sockets.

AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer. This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer. The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.

Because these mbufs do not have an associated virtual address, m_data
is not valid. Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().

An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job). The
special mbufs reference this structure via m_ext. Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size). The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer. The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.

Zero-copy aio_write()'s for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.

MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications


# 671bf2b8 04-Jul-2016 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Changes to the CPL-handler registration mechanism and code
related to "shared" CPLs.

a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function. Allow callers to direct the response to any iq. Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.

b) Remove all CPL handler tables from struct adapter. This reduces its
size by around 2KB. All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation. The registration
functions do not need an adapter parameter any more.

c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode. There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL. The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply. Update t4_write_l2e and callers to to support any
wrq/iq combination.

Approved by: re@ (kib@)
Sponsored by: Chelsio Communications


# dc964385 06-May-2016 John Baldwin <jhb@FreeBSD.org>

Use DDP to implement zerocopy TCP receive with aio_read().

Chelsio's TCP offload engine supports direct DMA of received TCP payload
into wired user buffers. This feature is known as Direct-Data Placement.
However, to scale well the adapter needs to prepare buffers for DDP
before data arrives. aio_read() is more amenable to this requirement than
read() as applications often call read() only after data is available in
the socket buffer.

When DDP is enabled, TOE sockets use the recently added pru_aio_queue
protocol hook to claim aio_read(2) requests instead of letting them use
the default AIO socket logic. The DDP feature supports scheduling DMA
to two buffers at a time so that the second buffer is ready for use
after the first buffer is filled. The aio/DDP code optimizes the case
of an application ping-ponging between two buffers (similar to the
zero-copy bpf(4) code) by keeping the two most recently used AIO buffers
wired. If a buffer is reused, the aio/DDP code is able to reuse the
vm_page_t array as well as page pod mappings (a kind of MMU mapping the
Chelsio NIC uses to describe user buffers). The generation of the
vmspace of the calling process is used in conjunction with the user
buffer's address and length to determine if a user buffer matches a
previously used buffer. If an application queues a buffer for AIO that
does not match a previously used buffer then the least recently used
buffer is unwired before the new buffer is wired. This ensures that no
more than two user buffers per socket are ever wired.

Note that this feature is best suited to applications sending a steady
stream of data vs short bursts of traffic.

Discussed with: np
Relnotes: yes
Sponsored by: Chelsio Communications


# b66bb393 22-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Cleanup redundant parenthesis from existing howmany()/roundup() macro uses.


# f353ae1c 26-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

More fixes to the build.


# 9eb533d3 25-Dec-2015 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI
offload driver. These changes come from projects/cxl_iscsi.


# fe2ebb76 02-Dec-2015 John Baldwin <jhb@FreeBSD.org>

Add support for configuring additional virtual interfaces (VIs) on a port.

Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.

Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).

T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).

One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.

The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.

The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.

Reviewed by: np
MFC after: 1 month
Sponsored by: Chelsio


# 0b563d54 12-Nov-2015 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: consider all tx credits returned when the final CPL for
a tid is received and purge ulp_pdu_reclaimq.


# 87c824d3 04-Nov-2015 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: redo the TOM bits that support the iSCSI driver.

- There is no reason to have a special case for iSCSI in t4_rcvd. Either there
is data in the socket buffer (from when the connection was plain TOE, before
being promoted to ulp_mode iSCSI) and sbused _should_ be taken into account,
or sbused is 0 and doesn't affect the calculation of rx_credits.

- write_tx_wr doesn't need special handling for iSCSI either. Its caller should
specify the ulp_submode.

- Replace t4_ulp_push_frames with t4_push_pdus that can deal with PDUs in an
mbufq hanging off the toepcb. This eliminates the "backwards" calls from
t4_tom's tx into the iSCSI driver.

- The iSCSI driver installs a handler for RX_ISCSI_DDP already and the iSCSI
handler for RX_DATA_DDP is identical to the one for RX_ISCSI_DDP. Take
advantage of this to eliminate the last remaining "backwards" call from
do_rx_data_ddp into the iSCSI driver.

- Eliminate the CXGBE_ISCSI_MBUF_TAG abomination.
- For tx, it makes no sense to allocate an mbuf tag just to stash 2 bits worth
of information. Use a spare byte from the mbuf header instead.
- For rx, the per-connection ulpcb is a more natural place to keep information
about the PDU currently being assembled.


# cc0a3c8c 29-Jul-2015 Andrey V. Elsukov <ae@FreeBSD.org>

Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.

Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149


# b3d44a68 08-Feb-2015 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): tidy up some of the interaction between the Upper Layer
Drivers (ULDs) and the base if_cxgbe driver.

Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.

iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD). The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
or iSCSI. Any such attempt will fail with EBUSY.

MFC after: 2 weeks


# 19abdd06 07-Oct-2014 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: don't leak resources tied to an active open request that
cannot be sent to the chip because a prerequisite L2 resolution
failed.

Submitted by: Hariprasad at chelsio dot com (original version)
MFC after: 2 weeks.


# c3322cb9 28-Oct-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# eb227282 08-Sep-2013 Navdeep Parhar <np@FreeBSD.org>

Rework the tx credit mechanism between the cxgbe/tom driver
and the card. This helps smooth out some burstiness in the
exchange.

Approved by: re (glebius)


# 319a31ea 28-Aug-2013 Navdeep Parhar <np@FreeBSD.org>

Change t4_list_lock and t4_uld_list_lock from mutexes to sx'es.

- tom_uninit had to be reworked not to hold the adapter lock (a mutex)
around t4_deactivate_uld, which acquires the uld_list_lock.
- the ifc_match for the interface cloner that creates the tracer ifnet
had to be reworked as the kernel calls ifc_match with the global
if_cloners_mtx held.


# 50ce3d40 04-Jul-2013 Navdeep Parhar <np@FreeBSD.org>

Pay attention to TCP_NODELAY when it's set/unset after the connection
is established.

MFC after: 1 day


# c337fa30 04-Jul-2013 Navdeep Parhar <np@FreeBSD.org>

- Read all TP parameters in one place.
- Read the filter mode, calculate various shifts, and use them
properly during active open (in select_ntuple).

MFC after: 1 day


# dd181b26 18-Apr-2013 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Update the CLIP table on the chip when there are changes
to the list of IPv6 addresses on the system. The table is used for
TOE+IPv6 only.


# d14b0ac1 29-Mar-2013 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Add support for Chelsio's Terminator 5 (aka T5) ASIC. This
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.

The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver. T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.

Sponsored by: Chelsio


# dfd1b3a0 25-Jan-2013 Navdeep Parhar <np@FreeBSD.org>

Add a couple of missing error codes. Treat CPL_ERR_KEEPALV_NEG_ADVICE as
negative advice and not a fatal error.

MFC after: 3 days


# 87aa6825 15-Jan-2013 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Basic CLIP table management.

This is the Compressed Local IPv6 table on the chip. To save space, the
chip uses an index into this table instead of a full IPv6 address in
some of its hardware data structures.

For now the driver fills this table with all the local IPv6 addresses
that it sees at the time the table is initialized. I'll improve this
later so that the table is updated whenever new IPv6 addresses are
configured or existing ones deleted.

MFC after: 1 week


# 7f441ef2 14-Jan-2013 Navdeep Parhar <np@FreeBSD.org>

cxgbe/tom: Miscellaneous updates for TOE+IPv6 support (more to follow).

- Teach find_best_mtu_idx() to deal with IPv6 endpoints.

- Install correct protosw in offloaded TCP/IPv6 sockets when DDP is
enabled.

- Move set_tcp_ddp_ulp_mode to t4_tom.c so that t4_tom.h can be included
without having to drag in t4_msg.h too. This was bothering the iWARP
driver for some reason.

MFC after: 1 week


# c66c36a4 10-Jan-2013 Navdeep Parhar <np@FreeBSD.org>

Overhaul the stid allocator so that it can be used for IPv6 servers
too. The entry for an IPv6 server in the TCAM takes up the equivalent
of two ordinary stids and must be properly aligned too.

MFC after: 1 week


# b174b658 10-Jan-2013 Navdeep Parhar <np@FreeBSD.org>

cxgbe(4): Add functions to help synchronize "slow" operations (those not
on the fast data path) and use them instead of frobbing the adapter lock
and busy flag directly.

Other changes made while reworking all slow operations:
- Wait for the reply to a filter request (add/delete). This guarantees
that the operation is complete by the time the ioctl returns.
- Tidy up the tid_info structure.
- Do not allow the tx queue size to be set to something that's not a
power of 2.

MFC after: 1 week


# 9823d527 10-Oct-2012 Kevin Lo <kevlo@FreeBSD.org>

Revert previous commit...

Pointyhat to: kevlo (myself)


# a10cee30 09-Oct-2012 Kevin Lo <kevlo@FreeBSD.org>

Prefer NULL over 0 for pointers


# c91bcaaa 21-Aug-2012 Navdeep Parhar <np@FreeBSD.org>

Minor cleanup: use bitwise ops instead of pointless wrappers around
setbit/clrbit.


# e682d02e 16-Aug-2012 Navdeep Parhar <np@FreeBSD.org>

Support for TCP DDP (Direct Data Placement) in the T4 TOE module.

Basically, this is automatic rx zero copy when feasible. TCP payload is
DMA'd directly into the userspace buffer described by the uio submitted
in soreceive by an application.

- Works with sockets that are being handled by the TCP offload engine
of a T4 chip (you need t4_tom.ko module loaded after cxgbe, and an
"ifconfig +toe" on the cxgbe interface).
- Does not require any modification to the application.
- Not enabled by default. Use hw.t4nex.<X>.toe.ddp="1" to enable it.


# 09fe6320 19-Jun-2012 Navdeep Parhar <np@FreeBSD.org>

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)