History log of /freebsd-11-stable/sys/dev/cxgbe/tom/t4_tom.c
Revision Date Author Comments
# 355242 30-Nov-2019 np

MFC r349500:

cxgbe/t4_tom: Fix regression in t_maxseg usage within t4_tom.

t_maxseg was changed in r293284 to not have any adjustment for TCP
timestamps. t4_tom inadvertently went back to pre-r293284 semantics
in r332506.
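
A minimal sketch of the post-r293284 semantics described above, assuming the
padded TCP timestamp option occupies 12 bytes. The function name is
hypothetical and is not taken from t4_tom; it only illustrates that a consumer
of t_maxseg has to account for timestamps itself.

```python
# Padded length of the TCP timestamp option (10 bytes of option + 2 of padding).
TSTAMP_OPTION_LEN = 12

def payload_per_segment(t_maxseg, timestamps_enabled):
    """Usable payload per segment when t_maxseg carries no option adjustment."""
    if timestamps_enabled:
        return t_maxseg - TSTAMP_OPTION_LEN
    return t_maxseg
```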

Sponsored by: Chelsio Communications


# 346970 30-Apr-2019 np

MFC r342208:

cxgbe/t4_tom: fixes for issues on the passive open side.

- Fix PR 227760 by getting the TOE to respond to the SYN after the call
to toe_syncache_add, not during it. The kernel syncache code calls
syncache_respond just before syncache_insert. If the ACK to the
syncache_respond is processed in another thread it may run before the
syncache_insert and won't find the entry. Note that this affects only
t4_tom because it's the only driver trying to insert and expand
syncache entries from different threads.

- Do not leak resources if an embryonic connection terminates at
SYN_RCVD because of L2 lookup failures.

- Retire lctx->synq and associated code because there is never a need to
walk the list of embryonic connections associated with a listener.
The per-tid state is still called a synq entry in the driver even
though the synq itself is now gone.
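
The ordering problem in the first item above can be modelled roughly as
follows. This is a single-threaded illustration, not driver code: it assumes
the worst case in which the peer's ACK is processed immediately after the
SYN|ACK is sent, and all names are hypothetical.

```python
def handle_syn(tid, respond_during_add):
    """Return True if the ACK to our SYN|ACK finds the syncache entry."""
    syncache = set()

    def respond_and_process_ack():
        # Worst case: the ACK is handled right after the response goes out.
        return tid in syncache

    if respond_during_add:
        # Old behaviour: respond before the entry has been inserted.
        ack_found_entry = respond_and_process_ack()
        syncache.add(tid)
    else:
        # Fixed behaviour: insert first, respond afterwards.
        syncache.add(tid)
        ack_found_entry = respond_and_process_ack()
    return ack_found_entry
```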

PR: 227760
Sponsored by: Chelsio Communications


# 346967 30-Apr-2019 np

MFC r345334:

cxgbe(4): Treat the viid as an opaque identifier.

Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.

Sponsored by: Chelsio Communications


# 346934 29-Apr-2019 np

MFC r341172, r341270.
t4_clip.c had to be manually adjusted because Concurrency Kit is not
available in stable/11.

r341172:
Move CLIP table handling out of TOM and into the base driver.

- Store the clip table in 'struct adapter' instead of in the TOM softc.
- Init the clip table during attach and teardown during detach.
- While here, add a dev.<nexus>.<unit>.misc.clip sysctl to dump the
CLIP table.

This does mean that we update the clip table even if TOE is not enabled,
but non-TOE things need the CLIP table anyway.

Reviewed by: np, Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18010

r341270:
Make most of the CLIP code conditional on #ifdef INET6.

This fixes builds of kernels without INET6 such as LINT-NOINET6.

Reported by: arybchik
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18384


# 346882 29-Apr-2019 np

MFC r338156, r338158-r338161, r338166.

r338156:
cxgbe(4): Avoid overflow while calculating channel rate.

Reported by: Coverity (CID 1008352)

r338158:
cxgbe(4): Check the RO bit properly before disabling relaxed ordering.

Reported by: Coverity (CID 1384286)

r338159:
cxgbe(4): Make it clear that VI_INIT_DONE implies vi->ntxq > 0, and so
rc will never be returned uninitialized.

Reported by: Coverity (CID 1394884). This is a false positive though.

r338160:
cxgbe(4): Do not leak memory in case of errors during VI initialization.

Reported by: Coverity (CID 1392026)

r338161:
cxgbe/tom: Make sure 'matched' is always initialized before use.

Reported by: Coverity (CID 1390894)

r338166:
cxgbe(4): Be explicit about ignoring the return value of cmpset in some
cases.

Reported by: Coverity (CIDs 1009398, 1009400, 1009401, 1357325, 1394783). All false positives.


# 346874 29-Apr-2019 np

MFC r337538, r337987

r337538:
cxgbe(4): Add support for high priority filters on T6+. They have their
own region in the TCAM starting with T6, unlike previous chips where
they were in the same region as normal filters.

These filters "hit" before anything else in the LE's lookup. The exact
order is:
a) High priority filters
b) TOE's active region (TCAM and/or hash)
c) Servers (TOE hw listeners)
d) Normal filters
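
As an illustration of the order listed above, the sketch below returns the
first region that would hit for a packet. The region names are made up for
this example; only their relative order comes from the commit message.

```python
# Lookup order from the commit message: earlier regions win.
LOOKUP_ORDER = ("hipri_filter", "toe_active", "server", "normal_filter")

def first_hit(matches):
    """Return the first region, in lookup order, present in `matches`.

    `matches` is the set of regions that would match the packet.
    Returns None when nothing matches.
    """
    for region in LOOKUP_ORDER:
        if region in matches:
            return region
    return None
```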

Sponsored by: Chelsio Communications

r337987:
cxgbe(4): Adjust ntids to account for nhptids in the TOE case too.
This should have been part of r337538.


# 346871 29-Apr-2019 np

MFC r336718, r336720, r336734-r336735, r337398, r337439, and r337540.
These are all related to tx rate limiting in cxgbe.

r336718:
cxgbe(4): Validate only those parameters that are relevant to the
type of rate limiter being programmed. Skip the ones that are not
applicable.

Sponsored by: Chelsio Communications

r336720:
cxgbe(4): Remove useless code that crept in with r336718.

X-MFC With: 336718

r336734:
cxgbe(4): Better defaults for all cl-rl rate limiters.

Start in "class" instead of "flow" mode. This eliminates the need to
specify an MTU, which is not available that early anyway. It also
allows the user to manually configure ch-rl rate limiting after attach.
This used to fail because ch-rl isn't supported if cl-rl "flow" mode is
configured.

Set all traffic classes to 1Gbps during initialization. The goal is to
start off with _any_ valid configuration and 1Gbps works even for
gigabit cards.

Sponsored by: Chelsio Communications

r336735:
cxgbe(4): Consider rateunit before ratemode when displaying information
about a traffic class. This matches the order in which the firmware
evaluates unit and mode internally.

Sponsored by: Chelsio Communications

r337398:
cxgbe(4): Allow user-configured and driver-configured traffic classes to
be used simultaneously. Move sysctl_tc and sysctl_tc_params to
t4_sched.c while here.

Sponsored by: Chelsio Communications

r337439:
cxgbe(4): Allow the driver to specify a burst size when configuring a
traffic class for rate limiting.

Add experimental knobs that allow the user to specify a default pktsize
and burstsize for traffic classes associated with a port:

dev.<ifname>.<instance>.tc.pktsize
dev.<ifname>.<instance>.tc.burstsize

Sponsored by: Chelsio Communications

r337540:
cxgbe(4): Display pkt-size and burst-size in traffic class parameters.


# 346855 28-Apr-2019 np

MFC r333153, r333394, r333442, r333472, r333620, r334058, r334447,
r334452, and r335684. These revisions added hashfilters, NAT offload,
and SMAC/DMAC swapping filters to cxgbe.

r333153:
cxgbe(4): Move all TCAM filter code into a separate file.

Sponsored by: Chelsio Communications

r333394:
cxgbe(4): Add support for hash filters.

These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool. Hash and
normal TCAM filters can be used together. The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters. Any T5/T6 card with memory can support at
least half a million hash filters. The sample config file with the
driver configures 512K of these; it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file. The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto). It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.
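
The difference between the two kinds of match can be sketched as below. This
is an illustration only: the field names and the dict-based table are
assumptions made for the example, not the hardware's or the driver's data
layout. The key fields follow the example filterMask above (5-tuple plus
physical port and VLAN tag).

```python
from collections import namedtuple

# Fields that must be given exactly for a hash filter under the example
# filterMask. Field names are illustrative.
HashKey = namedtuple("HashKey", "sip dip sport dport ipproto port vlan")

def hash_lookup(table, pkt):
    """Exact match: every field must be equal, so a dict lookup suffices."""
    return table.get(pkt)

def tcam_match(value, mask, field):
    """Masked match on one integer field: only bits set in mask are compared."""
    return (field & mask) == (value & mask)
```

With a masked match a single entry can cover a whole prefix, whereas a hash
filter needs one entry per exact combination of field values.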

Hash filters support all actions supported by TCAM filters. A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by: Chelsio Communications

r333442:
cxgbe(4): Determine whether the firmware supports the FILTER2 work
request, which can be used to configure hardware NAT and swapmac.

All firmwares released after Jan 2017 support this work request.

Sponsored by: Chelsio Communications

r333472:
cxgbe(4): Add fields to support configuration of hardware NAT and
swapmac (SMAC/DMAC switcheroo) from userspace.

Sponsored by: Chelsio Communications

r333620:
cxgbe(4): Filtering related features and fixes.

- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.

Sponsored by: Chelsio Communications

r334058:
cxgbe(4): Only valid filters are expected to have a valid tid.

r334447:
cxgbe(4): Add code to deal with the chip's source MAC table (aka SMT).

Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications

r334452:
cxgbe(4): Add support for SMAC-rewriting filters.

Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications

r335684:
cxgbe(4): Do not leak the filters in the hashfilter table on module
unload.

Sponsored by: Chelsio Communications

Relnotes: Yes


# 346852 28-Apr-2019 np

MFC r333114:

cxgbe(4): Use opaque cookies or tid range-checks to determine the
intended recipient of a CPL when it can't be determined solely from the
opcode. Retire the per-queue handlers for such CPLs in favor of the new
scheme.

Sponsored by: Chelsio Communications


# 346850 28-Apr-2019 np

MFC r333043:

cxgbe(4): Move release_tid to the base NIC driver for future consumers.

Sponsored by: Chelsio Communications.


# 346849 28-Apr-2019 np

MFC r333030:

cxgbe(4): Break up alloc_tid_tabs and move the atid routines to the base
NIC driver. The atid services will be used by new features (hashfilters
and inline TLS) that do not involve TOE.

Sponsored by: Chelsio Communications


# 346848 28-Apr-2019 np

MFC r331902:

r331902: cxgbe: Implement tcp_info handler for connections handled by t4_tom.


# 346805 28-Apr-2019 np

MFC r317849 (partial), r332506, and r332787.

r317849 (partial, required by r332506):
cxgbe/t4_tom: Per-connection rate limiting for TCP sockets handled by
the TOE.

Sponsored by: Chelsio Communications

r332506:
cxgbe(4): Add support for Connection Offload Policy (aka COP).

COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload. t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface. The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the interface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so. The general format
of a rule is: [socket-type] expr => settings

Example. See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match. There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload
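
The first-match evaluation described above can be sketched roughly as
follows. This is not the driver's implementation: plain Python predicates
stand in for pcap-filter(7) expressions, and the rule tuples and settings
dicts are hypothetical representations chosen for the example.

```python
def evaluate_policy(rules, sock_type, conn):
    """Return the settings of the first matching rule.

    Each rule is (socket_type, predicate, settings). Socket type "D" matches
    any socket type. The predicate stands in for a pcap-filter expression.
    """
    for rule_type, predicate, settings in rules:
        if rule_type in ("D", sock_type) and predicate(conn):
            return settings
    # Implicit final rule: [D] all => !offload
    return {"offload": False}

# A few rules modelled on the example policy above.
rules = [
    ("L", lambda c: c["port"] == 443, {"offload": False}),
    ("L", lambda c: c["port"] == 22, {"offload": True}),
    ("A", lambda c: c["dport"] == 443, {"offload": True, "tls": True}),
]
```

Because evaluation stops at the first match, more specific rules have to be
listed before more general ones for the same socket type.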

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by: Chelsio Communications

r332787:
cxgbe(4): Fix bugs in the handling of COP rules that match on VLAN tag.

Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.

Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications


# 345664 28-Mar-2019 jhb

MFC 330040,330041,330079,330884,330946,330947,331649,333068,333810,337722,
340466,340468,340469,340473: Add TOE-based TLS offload.

Note that this requires a modified OpenSSL library.

330040:
Fetch TLS key parameters from the firmware.

The parameters describe how much of the adapter's memory is reserved for
storing TLS keys. The 'meminfo' sysctl now lists this region of adapter
memory as 'TLS keys' if present.

330041:
Move ccr_aes_getdeckey() from ccr(4) to the cxgbe(4) driver.

This routine will also be used by the TOE module to manage TLS keys.

330079:
Move #include for rijndael.h out of x86-specific region.

The #include was added inside of the conditional by accident and the lack
of it broke non-x86 builds.

330884:
Support for TLS offload of TOE connections on T6 adapters.

The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections. Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing. Currently
these socket options are implemented as TCP options in the vendor
specific range. A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit. TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface. Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key. This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library). Receive offload can only be used with TOE sockets using the
TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers. Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket. Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value. A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library. Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.
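
The tls_rx_ports rule described above amounts to a membership test on either
end of the connection. A minimal sketch, with a hypothetical function name
(the commit describes select_ulp_mode() as the place where the sub-mode is
actually chosen):

```python
def wants_tls_mode(tls_rx_ports, local_port, remote_port):
    """True if either port of the connection is in the configured list."""
    return local_port in tls_rx_ports or remote_port in tls_rx_ports
```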

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs. If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request. This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages. The first message is a
CPL_TLS_DATA containing the decrypted payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors. The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host. As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket. In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase. Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS). The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k). To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer, without opening the window (or adding
rx_credits) for the overage bytes.
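
The overage accounting described above can be modelled as below. This is a
simplified sketch under stated assumptions, not the code in t4_tls.c: the
class and attribute names are hypothetical, and the model tracks byte counts
only.

```python
class RxWindow:
    def __init__(self, size):
        self.window = size   # bytes the peer may still send
        self.overage = 0     # bytes received beyond the window

    def receive(self, nbytes):
        """Account for a full TLS record of nbytes arriving."""
        if nbytes > self.window:
            # Record exceeds the window: drop the window to 0, track the excess.
            self.overage += nbytes - self.window
            self.window = 0
        else:
            self.window -= nbytes

    def read(self, nbytes):
        """Application reads nbytes; return the credits that open the window."""
        eaten = min(nbytes, self.overage)   # overage bytes earn no credits
        self.overage -= eaten
        credits = nbytes - eaten
        self.window += credits
        return credits
```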

330946:
Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.

- Remove the one use of is_tls_offload() and the function. AIO special
handling only needs to be disabled when a TOE socket is actively doing
TLS offload on transmit. The TOE socket's mode (which affects receive
operation) doesn't matter, so remove the check for the socket's mode and
only check if a TOE socket has TLS transmit keys configured to determine
if an AIO write request should fall back to the normal socket handling
instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c. It is not used in critical paths,
so inlining isn't that important. Change return type to bool while here.

330947:
Fix the check for an empty send socket buffer on a TOE TLS socket.

Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero. This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.
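
A minimal sketch of the corrected check, assuming sb_off counts the bytes at
the head of the socket buffer that have already been sent. The function name
is hypothetical.

```python
def has_unsent_data(sbavail, sb_off):
    """True if the buffer holds bytes beyond the already-sent prefix."""
    return sbavail > sb_off
```

Comparing against zero instead would report unsent data whenever
previously-sent bytes are still sitting in the buffer.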

331649:
Use the offload transmit queue to set flags on TLS connections.

Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.

However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue. This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.

333068:
Use the correct key address when renegotiating the transmit key.

Previously, get_keyid() was returning the address of the receive key
instead of the transmit key when renegotiating the transmit key. This
could either hang the card (if a connection was only offloading TLS TX
and thus had a receive key address of -1) or cause the connection to
fail by overwriting the wrong key (if both RX and TX TLS were
offloaded).

333810:
Be more robust against garbage input on a TOE TLS TX socket.

If a socket is closed or shutdown and a partial record (or what
appears to be a partial record) is waiting in the socket buffer,
discard the partial record and close the connection rather than
waiting forever for the rest of the record.

337722:
Whitespace nit in t4_tom.h

340466:
Move the TLS key map into the adapter softc so non-TOE code can use it.

340468:
Change the quantum for TLS key addresses to 32 bytes.

The addresses passed when reading and writing keys are always shifted
right by 5 as the memory locations are addressed in 32-byte chunks, so
the quantum needs to be 32, not 8.
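
The relationship between the shift and the quantum can be illustrated as
follows. The names are hypothetical; only the shift of 5 and the 32-byte unit
come from the commit message.

```python
KEY_ADDR_SHIFT = 5                   # addresses are expressed in 32-byte units
KEY_QUANTUM = 1 << KEY_ADDR_SHIFT    # 32

def key_addr_to_units(byte_addr):
    """Convert a byte address to the 32-byte units used when programming keys."""
    if byte_addr % KEY_QUANTUM != 0:
        # An 8-byte quantum could hand out addresses like 40 that the shift
        # would silently truncate.
        raise ValueError("key address must be 32-byte aligned")
    return byte_addr >> KEY_ADDR_SHIFT
```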

340469:
Remove bogus roundup2() of the key programming work request header.

The key context is always placed immediately after the work request
header. The total work request length has to be rounded up by 16
however.
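
A rough illustration of the length calculation described above. roundup2()
mirrors the usual power-of-two rounding macro; key_wr_len() and the lengths
used with it are hypothetical.

```python
def roundup2(x, y):
    """Round x up to a multiple of y, where y is a power of two."""
    return (x + (y - 1)) & ~(y - 1)

def key_wr_len(hdr_len, key_ctx_len):
    # The key context follows the header directly, with no padding between
    # them. Only the total length is rounded up to 16 bytes.
    return roundup2(hdr_len + key_ctx_len, 16)
```

For a made-up 40-byte header and 50-byte key context this gives 96 bytes,
whereas rounding the header first would have given 112.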

340473:
Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.

vmem's are not just used for TLS memory in TOM and the #include actually
predates the TLS code so should not have been removed when the TLS vmem
moved in r340466.

Sponsored by: Chelsio Communications


# 344856 06-Mar-2019 jhb

MFC 330882: Simplify error handling in t4_tom.ko module loading.

- Change t4_ddp_mod_load() to return void instead of always returning
success. This avoids having to pretend to have proper support for
unloading when only part of t4_tom_mod_load() has run.
- If t4_register_uld() fails, don't invoke t4_tom_mod_unload() directly.
The module handling code in the kernel invokes MOD_UNLOAD on a module
whose MOD_LOAD fails with an error already.


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 because that commit has since been
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 331645 27-Mar-2018 jhb

MFC 329785: Move DDP PCB state into a helper structure.

This consolidates all of the DDP state in one place. Also, the code has
now been fixed to ensure that DDP state is only accessed for DDP
connections. This should not be a functional change but makes it cleaner
and easier to add state for other TOE socket modes in the future.

Sponsored by: Chelsio Communications


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 330307 03-Mar-2018 np

MFC r319506, r319872, r321063, r321103, r321179, r321390, r321435,
r321582, r321671, r322014, r322034, r322055, r322123, r322167, r322425,
r322549, r322914, r322960, r322962, r322964, r322985, r322990, r323006,
r323026, r323041, r323069, r323078, r323343, r323514, r323520, r324296,
r324379, r324386, r324443, r324945, r325596, r325680, r325880,
r325883-r325884, r325961, r326026, r326042, r327062, r327093, r327332,
r327528, r328420, and r328423.

r319506:
cxgbe(4): Update the statistics for compound tx work requests once per
work request, not once per frame.

r319872:
cxgbe(4): Do not request an FEC setting that the port does not support.

r321063:
cxgbe(4): Various link/media related improvements.

- Deal with changes to port_type, and not just port_mod when a
transceiver is changed. This fixes hot swapping of transceivers of
different types (QSFP+ or QSA or QSFP28 in a QSFP28 port, SFP+ or
SFP28 in a SFP28 port, etc.).

- Always refresh media information for ifconfig if the port is down.
The firmware does not generate transceiver-change interrupts unless at
least one VI is enabled on the physical port. Before this change
ifconfig displayed potentially stale information for ports that were
administratively down.

- Always recalculate and reapply L1 config on a transceiver change.

- Display PAUSE settings in ifconfig. The driver sysctls for this
continue to work as well.

r321103:
cxgbe(4): New ioctls to flash bootrom and boot config to the card.

r321179:
cxgbe/t4_tom: Log more details about the newly ESTABLISHED tid to the
trace buffer.

r321390:
cxgbe(4): Install the firmware bundled with the driver to the card if it
doesn't seem to have one. This lets the driver recover automatically
from incomplete firmware upgrades (panic, reboot, power loss, etc. in
the middle of an upgrade).

r321435:
cxgbe(4): Display some more TOE parameters related to retransmission
and keepalive in the sysctl MIB. Provide tunables to change some of
these parameters. These are supposed to be setup by the firmware so
these tunables are for experimentation only.

r321582:
cxgbe(4): Some updates to the common code.

- Updated register ranges.
- Helper routines for access to TP registers.
- Updated routine to read flash parameters.

r321671:
cxgbe/iw_cxgbe: Log the end point's history and flags to the trace
buffer just before it's freed.

r322014:
cxgbe(4): Initial import of the "collect" component of Chelsio unified
debug (cudbg) code, hooked up to the main driver via an ioctl.

The ioctl can be used to collect the chip's internal state in a
compressed dump file. These dumps can be decoded with the "view"
component of cudbg.

r322034:
cxgbe(4): Always use the first and not the last virtual interface
associated with a port in begin_synchronized_op.

r322055:
cxgbe(4): Allow the TOE timer tunables to be set with microsecond
precision. These timers are already displayed in microseconds in the
sysctl MIB. Add variables to track these tunables while here.

r322123:
cxgbe(4): Avoid a NULL dereference that would occur during module unload
if there were problems earlier during attach.

r322167:
cxgbe(4): Add the T6 and T5 Unified Wire configuration files to the
kernel, just like for T4, when the driver is compiled into the kernel.

r322425:
cxgbe(4): Save the last reported link parameters and compare them with
the current state to determine whether to generate a link-state change
notification. This fixes a bug introduced in r321063 that caused the
driver to sometimes skip these notifications.

r322549:
cxgbe/t4_tom: Use correct name for the ISS-valid bit in options2.

r322914:
cxgbe(4): Dump the mailbox contents in the same format as CH_DUMP_MBOX.

r322960:
cxgbe(4): Verify that the driver accesses the firmware mailbox in a
thread-safe manner.

r322962:
cxgbe(4): Remove write only variable from t4_port_init.

r322964:
cxgbe(4): vi_mac_funcs should include the base Ethernet function. It is
already used in the driver as if it does.

r322985:
cxgbe(4): Maintain one ifmedia per physical port instead of one per
Virtual Interface (VI). All autonomous VIs that share a port share the
same media.

r322990:
cxgbe(4): Do not access the mailbox without appropriate locks while
creating hardware VIs.

This fixes a bad race on systems with hw.cxgbe.num_vis > 1.

r323006:
cxgbe(4): Update T6/T5/T4 firmwares to 1.16.59.0.

r323026:
cxgbe(4): Zero out the memory allocated for the debug dump.
cudbg_collect seems to expect it this way.

r323041:
cxgbe(4): Add two new debug flags -- one to allow manual firmware
install after full initialization, and another to disable the TCB
cache (T6+). The latter works as a tunable only.

Note that debug_flags are for debugging only and should not be set
normally.

r323069:
cxgbe/t4_tom: Add a knob to select the congestion control algorithm
used by the TOE hardware for fully offloaded connections. The knob
affects new connections only.

r323078:
cxgbe/t4_tom: There may not be a tid to update if the connection isn't
established.

r323343:
cxgbe(4): Fix a couple of problems in the sge_wrq data path.

- start_wrq_wr must not drain the wr_list if there are incomplete_wrs
pending. This can happen when a t4_wrq_tx runs between two
start_wrq_wr.

- commit_wrq_wr must examine the cookie's pidx and ndesc with the
queue's lock held. Otherwise there is a bad race when incomplete WRs
are being completed and commit_wrq_wr for the WR that is ahead in the
queue updates the next incomplete WR's cookie's pidx/ndesc but the
commit_wrq_wr for the second one is using stale values that it read
without the lock.

r323514:
cxgbetool(8): mode must be specified when creating the dump file.

r323520:
cxgbe(4): Ignore capabilities that depend on TOE when the firmware
reports TOE is not available.

r324296:
cxgbe(4): Provide knobs to set the holdoff parameters of TOE rx queues
separately from NIC rx queues instead of using the same parameters for
both types of queues.

r324379:
cxgbetool(8): Do not create a large file devoid of useful content when
the dumpstate ioctl fails. Make the file world-readable while here.

r324386:
cxgbe(4): Update T6, T5, and T4 firmwares to 1.16.63.0.

r324443:
cxgbetool(8): Do not close uninitialized fd on malloc failure.

r324945:
cxgbe(4): Read the MPS buffer group map from the firmware as it could be
different from hardware defaults. The congestion channel map, which is
still fixed, needs to be tracked separately now. Change the congestion
setting for TOE rx queues to match the drivers on other OSes while here.

r325596:
cxgbe(4): Do not request settings not supported by the port.

r325680:
cxgbe(4): Exclude mdi from the check against port capabilities.

r325880:
cxgbe(4): Combine all _10g and _1g tunables and drop the suffix from
their names. The finer-grained knobs weren't practically useful.

r325883:
cxgbe(4): Sanitize t4_num_vis during MOD_LOAD like all other t4_*
tunables. Add num_vis to the intrs_and_queues structure as it affects
the number of interrupts requested and queues created. In future
cfg_itype_and_nqueues might lower it incrementally instead of going
straight to 1 when enough interrupts aren't available.

r325884:
cxgbe(4): Remove rsrv_noflowq from intrs_and_queues structure as it does
not influence or get affected by the number of interrupts or queues.

r325961:
cxgbe(4): Add core Vdd to the sysctl MIB.

r326026:
cxgbe(4): Add a custom board to the device id list.

r326042:
cxgbe(4): Fix unsafe mailbox access in cudbg.

r327062:
cxgbe(4): Read the MFG diags version from the VPD and make it available
in the sysctl MIB.

r327093:
cxgbe(4): Do not forward interrupts to queues with freelists. This
leaves the firmware event queue (fwq) as the only queue that can take
interrupts for others.

This simplifies cfg_itype_and_nqueues and queue allocation in the driver
at the cost of a little (never?) used configuration. It also allows
service_iq to be split into two specialized variants in the future.

r327332:
cxgbe(4): Reduce duplication by consolidating minor variations of the
same code into a single routine.

r327528:
cxgbe(4): Add a knob to enable/disable PCIe relaxed ordering. Disable it by
default when running on Intel CPUs.

r328420:
cxgbe(4): Do not display harmless warning in non-debug builds.

r328423:
cxgbe(4): Accept old names of a couple of tunables.

Sponsored by: Chelsio Communications


# 330303 03-Mar-2018 jhb

MFC 328608: Export tcp_always_keepalive for use by the Chelsio TOM module.

This used to work by accident with ld.bfd even though always_keepalive
was marked as static. LLD honors static more correctly, so export this
variable properly (including moving it into the tcp_* namespace).

Relative to HEAD the MFC includes two additional changes:
- The t3_tom module used for cxgb(4) is also patched.
- A strong reference from the new name (tcp_always_keepalive) to the old
name (always_keepalive) has been added to preserve the KBI for existing
modules.

Suggested by: kib (strong reference)
Sponsored by: Chelsio Communications


# 318803 24-May-2017 np

MFC r313346:

cxgbe/t4_tom: Fix CLIP entry refcounting on the passive side. Every
IPv6 connection being handled by the TOE should have a reference on its
CLIP entry.

Sponsored by: Chelsio Communications


# 313179 04-Feb-2017 jhb

MFC 312904: Don't drop a reference to the TOE PCB in undo_offload_socket().

undo_offload_socket() is only called by t4_connect() during a connection
setup failure, but t4_connect() still owns the TOE PCB and frees it
after undo_offload_socket() returns. Releasing a reference in
undo_offload_socket() resulted in a double-free which panicked when
t4_connect() performed the second free. The reference release was
added to undo_offload_socket() incorrectly in r299210.

Sponsored by: Chelsio Communications


# 313178 03-Feb-2017 jhb

MFC 312906:
Unregister CPL handlers for TOE-related messages when unloading TOM.

Sponsored by: Chelsio Communications


# 312116 14-Jan-2017 np

MFC r311569, r311657, and r311949.

r311569:
Fix comment in t4_tom. No functional change.

r311657:
cxgbe/t4_tom: Fix tid accounting. An offloaded IPv6 connection uses 2
tids, not 1, in the hardware.

r311949:
cxgbe/tom: Add VIMAGE support to the TOE driver.

Active Open:
- Save the socket's vnet at the time of the active open (t4_connect) and
switch to it when processing the reply (do_act_open_rpl or
do_act_establish).

Passive Open:
- Save the listening socket's vnet in the driver's listen_ctx and switch
to it when processing incoming SYNs for the socket.
- Reject SYNs that arrive on an ifnet that's not in the same vnet as the
listening socket.

CLIP (Compressed Local IPv6) table:
- Add only those IPv6 addresses to the CLIP that are in a vnet
associated with one of the card's ifnets.

Misc:
- Set vnet from the toepcb when processing TCP state transitions.
- The kernel sets the vnet when calling the driver's output routine
so t4_push_frames runs in proper vnet context already. One exception
is when incoming credits trigger tx within the driver's ithread. Set
the vnet explicitly in do_fw4_ack for that case.

Sponsored by: Chelsio Communications


# 309560 05-Dec-2016 jhb

MFC 305695,305696,305699,305702,305703,305713,305715,305827,305852,305906,
305908,306062,306063,306137,306138,306206,306216,306273,306295,306301,
306465,309302:
Add support for adapters using the Terminator T6 ASIC.

305695:
cxgbe(4): Set up fl_starve_threshold2 accurately for T6.

305696:
cxgbe(4): Use correct macro for header length with T6 ASICs. This
affects the transmit of the VF driver only.

305699:
cxgbe(4): Update the pad_boundary calculation for T6, which has a
different range of boundaries.

305702:
cxgbe(4): Use smaller min/max bursts for fl descriptors with a T6.

305703:
cxgbe(4): Deal with the slightly different SGE_STAT_CFG in T6.

305713:
cxgbe(4): Add support for additional port types and link speeds.

305715:
cxgbe(4): Catch up with the rename of tlscaps -> cryptocaps. TLS is one
of the capabilities of the crypto engine in T6.

305827:
cxgbe(4): Use the interface's viid to calculate the PF/VF/VFValid fields
to use in tx work requests.

305852:
cxgbe(4): Attach to cards with the Terminator 6 ASIC. T6 cards will
come up as 't6nex' nexus devices with 'cc' ports hanging off them.

The T6 firmware and configuration files will be added as soon as they
are released. For now the driver will try to work with whatever
firmware and configuration is on the card's flash.

305906:
cxgbe/t4_tom: The SMAC entry for a VI is at a different location in the T6.

305908:
cxgbe/t4_tom: Update the active/passive open code to support T6. Data
path works as-is.

306062:
cxgbe(4): Show wcwr_stats for T6 cards.

306063:
cxgbe(4): Setup congestion response for T6 rx queues.

306137:
cxgbetool: Add T6 support to the SGE context decoder.

306138:
Fix typo.

306206:
cxgbe(4): Catch up with the different layout of WHOAMI in T6.

Note that the code moved below t4_prep_adapter() as part of this change
because now it needs a working chip_id().

306216:
cxgbe(4): Fix the output of the "tids" sysctl on T6.

306273:
cxgbe(4): Fix netmap with T6, which doesn't encapsulate SGE_EGR_UPDATE
message inside a FW_MSG. The base NIC already deals with updates in
either form.

306295:
cxgbe(4): Support SIOCGIFXMEDIA so that ifconfig displays correct media
for 25Gbps and 100Gbps ports. This should have been part of r305713,
which is when the driver first started reporting extended media types.

306301:
cxgbe(4): Use the port's top speed to figure out whether it is "high
speed" or not (for the purpose of calculating the number of queues etc.)
This does the right thing for 25Gbps and 100Gbps ports.

306465:
cxgbe(4): Claim the T6 -DBG card.

309302:
cxgbe(4): Include firmware for T6 cards in the driver. Update all
firmwares to 1.16.12.0.

Sponsored by: Chelsio Communications


# 309555 05-Dec-2016 jhb

MFC 303688,303750,305166,305167: Centralize and rework page pod handling.

303688:
cxgbe/t4_tom: Read the chip's DDP page sizes and save them in a
per-adapter data structure. This replaces a global array with hardcoded
page sizes.

303750:
cxgbe/t4_tom: The page pod arena allocates from pod address space and
not index space. The minimum valid allocation out of this arena is the
size of a single page pod.

305166:
cxgbe/t4_tom: Add general purpose routines to deal with page pod regions
and allocations within them. Switch to these routines to manage the TOE
DDP region.

305167:
cxgbe/t4_tom: Two new routines to allocate and write page pods for a
buffer in the kernel's address space.

Sponsored by: Chelsio Communications


# 306661 03-Oct-2016 jhb

MFC 303405: Add support for zero-copy aio_write() on TOE sockets.

AIO write requests for a TOE socket on a Chelsio T4+ adapter can now
DMA directly from the user-supplied buffer. This is implemented by
wiring the pages backing the user-supplied buffer and queueing special
mbufs backed by raw VM pages to the socket buffer. The TOE code
recognizes these special mbufs and builds a sglist from the VM page
array associated with the mbuf when queueing a work request to the TOE.

Because these mbufs do not have an associated virtual address, m_data
is not valid. Thus, the AIO handler does not invoke sosend() directly
for these mbufs but instead inlines portions of sosend_generic() and
tcp_usr_send().

An aiotx_buffer structure is used to describe the user buffer (e.g.
it holds the array of VM pages and a reference to the AIO job). The
special mbufs reference this structure via m_ext. Note that a single
job might be split across multiple mbufs (e.g. if it is larger than
the socket buffer size). The 'ext_arg2' member of each mbuf gives an
offset relative to the backing aiotx_buffer. The AIO job associated
with an aiotx_buffer structure is completed when the last reference to
the structure is released.
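
The reference counting described above can be sketched in plain C. This is a minimal userland illustration, not the driver's code: the struct and function names (aiotx_buf_sketch, mbuf_sketch, aiotx_split, mbuf_release) are hypothetical, and the "offset" field only stands in for the role the commit gives to 'ext_arg2'.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure describing one user buffer. */
struct aiotx_buf_sketch {
	int refcount;	/* one reference per mbuf still pointing here */
	bool job_done;	/* set when the last reference goes away */
};

/* Hypothetical stand-in for a special mbuf backed by raw pages. */
struct mbuf_sketch {
	struct aiotx_buf_sketch *buf;
	size_t offset;	/* plays the role of ext_arg2 in the commit text */
	size_t len;
};

/*
 * Split a job of 'total' bytes into pieces of at most 'chunk' bytes
 * (chunk must be non-zero). Every piece takes a reference on the buffer.
 * Returns the number of pieces written to 'out'.
 */
static size_t
aiotx_split(struct aiotx_buf_sketch *b, size_t total, size_t chunk,
    struct mbuf_sketch *out, size_t max_out)
{
	size_t n = 0;

	for (size_t off = 0; off < total && n < max_out; off += chunk, n++) {
		out[n].buf = b;
		out[n].offset = off;
		out[n].len = (total - off < chunk) ? total - off : chunk;
		b->refcount++;
	}
	return (n);
}

/* Drop one reference; the job completes when the count reaches zero. */
static void
mbuf_release(struct mbuf_sketch *m)
{
	if (--m->buf->refcount == 0)
		m->buf->job_done = true;
}
```

With a 10000-byte job and a 4096-byte chunk size this yields three pieces, and job_done becomes true only after all three have been released.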

Zero-copy aio_write()'s for connections associated with a given
adapter can be enabled/disabled at runtime via the
'dev.t[45]nex.N.toe.tx_zcopy' sysctl.

Sponsored by: Chelsio Communications


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


# 302339 04-Jul-2016 np

cxgbe(4): Changes to the CPL-handler registration mechanism and code
related to "shared" CPLs.

a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function. Allow callers to direct the response to any iq. Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.

b) Remove all CPL handler tables from struct adapter. This reduces its
size by around 2KB. All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation. The registration
functions do not need an adapter parameter any more.

c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode. There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL. The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply. Update t4_write_l2e and callers to support any
wrq/iq combination.
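
The split between a global handler table and per-iq handlers in (b) and (c) might look roughly like the following. This is a simplified sketch under stated assumptions: the opcode values, the handler signature, and every name here (iq_sketch, register_cpl_handler, dispatch_cpl, and the three sample handlers) are hypothetical and do not reflect the driver's real interfaces.

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration; not the real CPL opcode values. */
enum { OP_SHARED_RPL = 0, OP_PLAIN = 1, NUM_OPS = 2 };

typedef int (*cpl_handler_t)(int tid);

/* One global table, filled in once instead of once per adapter. */
static cpl_handler_t cpl_handlers[NUM_OPS];

/* Each ingress queue carries its own handler for the shared reply. */
struct iq_sketch {
	cpl_handler_t shared_rpl_handler;
};

static void
register_cpl_handler(int opcode, cpl_handler_t h)
{
	if (opcode >= 0 && opcode < NUM_OPS)
		cpl_handlers[opcode] = h;
}

/*
 * A shared opcode is routed by the queue it arrived on; everything else
 * is routed by opcode alone. Returns -1 if nothing handles the message.
 */
static int
dispatch_cpl(const struct iq_sketch *iq, int opcode, int tid)
{
	if (opcode == OP_SHARED_RPL && iq->shared_rpl_handler != NULL)
		return (iq->shared_rpl_handler(tid));
	if (opcode >= 0 && opcode < NUM_OPS && cpl_handlers[opcode] != NULL)
		return (cpl_handlers[opcode](tid));
	return (-1);
}

/* Sample handlers; the added constant only shows which one ran. */
static int filter_reply(int tid) { return (100 + tid); }
static int toe_reply(int tid) { return (200 + tid); }
static int plain_cpl(int tid) { return (300 + tid); }
```

Two queues that receive the same shared opcode can then interpret the tid differently, which is the point of keeping that handler on the queue rather than in the global table.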

Approved by: re@ (kib@)
Sponsored by: Chelsio Communications


# 299210 06-May-2016 jhb

Use DDP to implement zerocopy TCP receive with aio_read().

Chelsio's TCP offload engine supports direct DMA of received TCP payload
into wired user buffers. This feature is known as Direct-Data Placement.
However, to scale well the adapter needs to prepare buffers for DDP
before data arrives. aio_read() is more amenable to this requirement than
read() as applications often call read() only after data is available in
the socket buffer.

When DDP is enabled, TOE sockets use the recently added pru_aio_queue
protocol hook to claim aio_read(2) requests instead of letting them use
the default AIO socket logic. The DDP feature supports scheduling DMA
to two buffers at a time so that the second buffer is ready for use
after the first buffer is filled. The aio/DDP code optimizes the case
of an application ping-ponging between two buffers (similar to the
zero-copy bpf(4) code) by keeping the two most recently used AIO buffers
wired. If a buffer is reused, the aio/DDP code is able to reuse the
vm_page_t array as well as page pod mappings (a kind of MMU mapping the
Chelsio NIC uses to describe user buffers). The generation of the
vmspace of the calling process is used in conjunction with the user
buffer's address and length to determine if a user buffer matches a
previously used buffer. If an application queues a buffer for AIO that
does not match a previously used buffer then the least recently used
buffer is unwired before the new buffer is wired. This ensures that no
more than two user buffers per socket are ever wired.
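
The buffer-matching and eviction policy described above can be illustrated with a small userland sketch. It assumes a two-slot cache keyed on (vmspace generation, address, length) with a logical clock for recency; all names (ddp_buf, ddp_cache, ddp_cache_lookup) are hypothetical, and the wiring and unwiring of pages is only indicated in comments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one wired user buffer. */
struct ddp_buf {
	bool valid;
	unsigned int vm_gen;	/* generation of the owning vmspace */
	uintptr_t addr;		/* user address of the buffer */
	size_t len;
	unsigned long last_use;	/* logical clock value at last use */
};

/* Two slots, matching the "two buffers per socket" limit. */
struct ddp_cache {
	struct ddp_buf slot[2];
	unsigned long clock;
};

/*
 * Return the slot to use for a buffer. On a match the slot is reused and
 * *reused is set to true. On a miss an empty slot is preferred, otherwise
 * the least recently used slot is replaced and *reused is set to false.
 */
static int
ddp_cache_lookup(struct ddp_cache *c, unsigned int vm_gen, uintptr_t addr,
    size_t len, bool *reused)
{
	int victim = 0;

	c->clock++;
	for (int i = 0; i < 2; i++) {
		struct ddp_buf *b = &c->slot[i];

		if (b->valid && b->vm_gen == vm_gen && b->addr == addr &&
		    b->len == len) {
			b->last_use = c->clock;
			*reused = true;
			return (i);
		}
	}
	for (int i = 0; i < 2; i++) {
		if (!c->slot[i].valid) {
			victim = i;
			break;
		}
		if (c->slot[i].last_use < c->slot[victim].last_use)
			victim = i;
	}
	/* A real driver would unwire the old pages and wire the new ones here. */
	c->slot[victim] = (struct ddp_buf){ true, vm_gen, addr, len, c->clock };
	*reused = false;
	return (victim);
}
```

An application that alternates between two buffers always hits the cache after the first two requests, while a third distinct buffer displaces whichever of the two was used least recently.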

Note that this feature is best suited to applications sending a steady
stream of data vs short bursts of traffic.

Discussed with: np
Relnotes: yes
Sponsored by: Chelsio Communications


# 298482 22-Apr-2016 pfg

Clean up redundant parentheses from existing howmany()/roundup() macro uses.


# 294889 27-Jan-2016 glebius

More fixes to the build.


# 292736 25-Dec-2015 np

cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI
offload driver. These changes come from projects/cxl_iscsi.


# 291665 02-Dec-2015 jhb

Add support for configuring additional virtual interfaces (VIs) on a port.

Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.

Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).

T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).

One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are now accounted to the netmap interface.

The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.

The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.

Reviewed by: np
MFC after: 1 month
Sponsored by: Chelsio


# 286001 29-Jul-2015 ae

Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.

Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149


# 278374 08-Feb-2015 np

cxgbe(4): tidy up some of the interaction between the Upper Layer
Drivers (ULDs) and the base if_cxgbe driver.

Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.

iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD). The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
or iSCSI. Any such attempt will fail with EBUSY.

MFC after: 2 weeks


# 272719 07-Oct-2014 np

cxgbe/tom: don't leak resources tied to an active open request that
cannot be sent to the chip because a prerequisite L2 resolution
failed.

Submitted by: Hariprasad at chelsio dot com (original version)
MFC after: 2 weeks.


# 257241 28-Oct-2013 glebius

Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 255411 09-Sep-2013 np

Rework the tx credit mechanism between the cxgbe/tom driver
and the card. This helps smooth out some burstiness in the
exchange.

Approved by: re (glebius)


# 255006 28-Aug-2013 np

Change t4_list_lock and t4_uld_list_lock from mutexes to sx'es.

- tom_uninit had to be reworked not to hold the adapter lock (a mutex)
around t4_deactivate_uld, which acquires the uld_list_lock.
- the ifc_match for the interface cloner that creates the tracer ifnet
had to be reworked as the kernel calls ifc_match with the global
if_cloners_mtx held.


# 252716 04-Jul-2013 np

Pay attention to TCP_NODELAY when it's set/unset after the connection
is established.

MFC after: 1 day


# 252705 04-Jul-2013 np

- Read all TP parameters in one place.
- Read the filter mode, calculate various shifts, and use them
properly during active open (in select_ntuple).

MFC after: 1 day


# 249627 18-Apr-2013 np

cxgbe/tom: Update the CLIP table on the chip when there are changes
to the list of IPv6 addresses on the system. The table is used for
TOE+IPv6 only.


# 248925 30-Mar-2013 np

cxgbe(4): Add support for Chelsio's Terminator 5 (aka T5) ASIC. This
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.

The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver. T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.

Sponsored by: Chelsio


# 245935 26-Jan-2013 np

Add a couple of missing error codes. Treat CPL_ERR_KEEPALV_NEG_ADVICE as
negative advice and not a fatal error.

MFC after: 3 days


# 245448 15-Jan-2013 np

cxgbe/tom: Basic CLIP table management.

This is the Compressed Local IPv6 table on the chip. To save space, the
chip uses an index into this table instead of a full IPv6 address in
some of its hardware data structures.

For now the driver fills this table with all the local IPv6 addresses
that it sees at the time the table is initialized. I'll improve this
later so that the table is updated whenever new IPv6 addresses are
configured or existing ones deleted.
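
The idea of handing out a small index in place of a full IPv6 address can be sketched as follows. This is an illustration only: the table size, the reference counting scheme, and the names (clip_entry, clip_table, clip_get, clip_put) are assumptions made for the example and are not taken from the driver.

```c
#include <string.h>

#define CLIP_SIZE 8	/* hypothetical table size, for illustration only */

struct clip_entry {
	unsigned char addr[16];	/* full IPv6 address */
	int refcount;		/* 0 means the slot is free */
};

struct clip_table {
	struct clip_entry e[CLIP_SIZE];
};

/*
 * Return the index for 'addr', adding it to the table if needed, and take
 * a reference on the entry. Returns -1 if the table is full.
 */
static int
clip_get(struct clip_table *t, const unsigned char addr[16])
{
	int free_idx = -1;

	for (int i = 0; i < CLIP_SIZE; i++) {
		if (t->e[i].refcount > 0 &&
		    memcmp(t->e[i].addr, addr, 16) == 0) {
			t->e[i].refcount++;
			return (i);
		}
		if (t->e[i].refcount == 0 && free_idx < 0)
			free_idx = i;
	}
	if (free_idx >= 0) {
		memcpy(t->e[free_idx].addr, addr, 16);
		t->e[free_idx].refcount = 1;
	}
	return (free_idx);
}

/* Drop a reference; the slot becomes reusable when the count hits zero. */
static void
clip_put(struct clip_table *t, int idx)
{
	if (idx >= 0 && idx < CLIP_SIZE && t->e[idx].refcount > 0)
		t->e[idx].refcount--;
}
```

Other data structures can then store the small integer returned by clip_get() rather than the 16-byte address.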

MFC after: 1 week


# 245441 14-Jan-2013 np

cxgbe/tom: Miscellaneous updates for TOE+IPv6 support (more to follow).

- Teach find_best_mtu_idx() to deal with IPv6 endpoints.

- Install correct protosw in offloaded TCP/IPv6 sockets when DDP is
enabled.

- Move set_tcp_ddp_ulp_mode to t4_tom.c so that t4_tom.h can be included
without having to drag in t4_msg.h too. This was bothering the iWARP
driver for some reason.

MFC after: 1 week


# 245276 10-Jan-2013 np

Overhaul the stid allocator so that it can be used for IPv6 servers
too. The entry for an IPv6 server in the TCAM takes up the equivalent
of two ordinary stids and must be properly aligned too.
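
A minimal sketch of such an allocator, assuming that "properly aligned" means the starting index is a multiple of the number of slots requested (the commit message does not spell out the exact rule). The table size and all names (stid_table, stid_alloc, stid_free) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NSTIDS 16	/* hypothetical table size, for illustration only */

struct stid_table {
	bool used[NSTIDS];
};

/*
 * Allocate n contiguous slots starting at an index that is a multiple of
 * n. In this sketch n is 1 for an IPv4 server and 2 for an IPv6 server.
 * Returns the first index, or -1 if no suitable run is free.
 */
static int
stid_alloc(struct stid_table *t, int n)
{
	for (int base = 0; base + n <= NSTIDS; base += n) {
		bool free_run = true;

		for (int i = 0; i < n; i++) {
			if (t->used[base + i]) {
				free_run = false;
				break;
			}
		}
		if (free_run) {
			for (int i = 0; i < n; i++)
				t->used[base + i] = true;
			return (base);
		}
	}
	return (-1);
}

/* Release n slots previously returned by stid_alloc(). */
static void
stid_free(struct stid_table *t, int base, int n)
{
	for (int i = 0; i < n; i++)
		t->used[base + i] = false;
}
```

Note how an IPv6 request skips a lone free slot at an odd index: with slot 0 taken, the first two-slot allocation lands at index 2, and slot 1 remains available for a later IPv4 server.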

MFC after: 1 week


# 245274 10-Jan-2013 np

cxgbe(4): Add functions to help synchronize "slow" operations (those not
on the fast data path) and use them instead of frobbing the adapter lock
and busy flag directly.

Other changes made while reworking all slow operations:
- Wait for the reply to a filter request (add/delete). This guarantees
that the operation is complete by the time the ioctl returns.
- Tidy up the tid_info structure.
- Do not allow the tx queue size to be set to something that's not a
power of 2.
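
For reference, a power-of-2 test of the kind the last item implies is commonly written with a single bit trick. The helper name below is hypothetical and the driver's actual check is not shown in this log.

```c
#include <stdbool.h>

/*
 * A power of 2 has exactly one bit set, so clearing the lowest set bit
 * with x & (x - 1) leaves zero. Zero itself is excluded explicitly.
 */
static bool
is_power_of_2(unsigned int x)
{
	return (x != 0 && (x & (x - 1)) == 0);
}
```

A requested queue size of 1024 would pass this test, while 1000 would be rejected.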

MFC after: 1 week


# 241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


# 241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


# 239514 21-Aug-2012 np

Minor cleanup: use bitwise ops instead of pointless wrappers around
setbit/clrbit.


# 239344 16-Aug-2012 np

Support for TCP DDP (Direct Data Placement) in the T4 TOE module.

Basically, this is automatic rx zero copy when feasible. TCP payload is
DMA'd directly into the userspace buffer described by the uio submitted
in soreceive by an application.

- Works with sockets that are being handled by the TCP offload engine
of a T4 chip (you need t4_tom.ko module loaded after cxgbe, and an
"ifconfig +toe" on the cxgbe interface).
- Does not require any modification to the application.
- Not enabled by default. Use hw.t4nex.<X>.toe.ddp="1" to enable it.


# 237263 19-Jun-2012 np

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP is in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)