History log of /freebsd-11-stable/sys/dev/cxgbe/adapter.h
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 346967 30-Apr-2019 np

MFC r345334:

cxgbe(4): Treat the viid as an opaque identifier.

Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.

Sponsored by: Chelsio Communications


# 346963 30-Apr-2019 np

MFC r343666, r343861-r343862, r343923, r343968, r345660, r345810

r343666:
cxgbe(4): Improved error reporting and diagnostics.

"slow" interrupt handler:
- Expand the list of INT_CAUSE registers known to the driver.
- Add decode information for many more bits but decouple it from the
rest of intr_info so that it is entirely optional.
- Call t4_fatal_err exactly once, and from the top level PL intr handler.

t4_fatal_err:
- Use t4_shutdown_adapter from the common code to stop the adapter.
- Stop servicing slow interrupts after the first fatal one.

Driver/firmware interaction:
- CH_DUMP_MBOX: note whether the mailbox being dumped is a command or a
reply or something else.
- Log the raw value of pcie_fw for some errors.
- Use correct log levels (debug vs. error).

Sponsored by: Chelsio Communications

r343861:
cxgbe(4): Auto-dump the device log on a mailbox timeout or when the
firmware reports an error in pcie_fw.

Sponsored by: Chelsio Communications

r343862:
cxgbe(4): Auto-dump the CIM block's logic analyzer on a TIMER0 interrupt.

Sponsored by: Chelsio Communications

r343923:
cxgbe(4): Delay the panic due to a fatal error by 30s.

This lets information logged by the interrupt handler reach the system
log before the system goes down.

r343968:
cxgbe(4): Ignore unused interrupts.

Sponsored by: Chelsio Communications

r345660:
cxgbe(4): Count and clear interrupts generated at the software's request.

An interrupt can be requested by setting the F_SWINT bit in PL_PF_CTL.

Sponsored by: Chelsio Communications

r345810:
cxgbe(4): Add a flag to indicate that bits in interrupt cause but not in
interrupt enable are not fatal.

The firmware sets up all the interrupt enables based on run time
configuration, which means the information in the enables is more
accurate than what's compiled into the driver. This change also allows
the fatal bits to be updated without any changes in the driver in some
cases.

Sponsored by: Chelsio Communications


# 346942 30-Apr-2019 np

MFC r341620:

cxgbe(4): Fall back to a basic configuration in case of any error during
card initialization. This is an expanded version of r333682.

Break up prep_firmware into simpler routines while here. Load the
firmware/config KLD only if needed.

Sponsored by: Chelsio Communications


# 346934 29-Apr-2019 np

MFC r341172, r341270.
t4_clip.c had to be manually adjusted because Concurrency Kit is not
available in stable/11.

r341172:
Move CLIP table handling out of TOM and into the base driver.

- Store the clip table in 'struct adapter' instead of in the TOM softc.
- Init the clip table during attach and teardown during detach.
- While here, add a dev.<nexus>.<unit>.misc.clip sysctl to dump the
CLIP table.

This does mean that we update the clip table even if TOE is not enabled,
but non-TOE things need the CLIP table anyway.

Reviewed by: np, Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18010

r341270:
Make most of the CLIP code conditional on #ifdef INET6.

This fixes builds of kernels without INET6 such as LINT-NOINET6.

Reported by: arybchik
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D18384


# 346928 29-Apr-2019 np

MFC r339628, r339965

r339628:
cxgbe(4): improve the accuracy of various TSO limits reported to the kernel.

Sponsored by: Chelsio Communications

r339965:
cxgbe(4): Report a reasonable non-zero if_hw_tsomaxsegsize to the
kernel.

This reverts an accidental change that snuck in with r339628.

Sponsored by: Chelsio Communications


# 346883 29-Apr-2019 np

MFC r338218:

cxgbev(4): Updates to the VF driver to cope with recent ifmedia and
ctrlq changes in the base driver.

Sponsored by: Chelsio Communications


# 346878 29-Apr-2019 np

MFC r337873:

cxgbe(4): Use VLAN_TRUNKDEV instead of private cookie to figure out the
parent of a VLAN ifnet.

Sponsored by: Chelsio Communications


# 346877 29-Apr-2019 np

MFC r337830:

cxgbe(4): Use two hashes instead of a table to keep track of
hashfilters. Two because the driver needs to look up a hashfilter by
its 4-tuple or tid.

A couple of fixes while here:
- Reject attempts to add duplicate hashfilters.
- Do not assume that any part of the 4-tuple that isn't specified is 0.
This makes it consistent with all other mandatory parameters that
already require explicit user input.


# 346876 29-Apr-2019 np

MFC r337659:

cxgbe(4): Move all control queues to the adapter.

There used to be one control queue per adapter (the mgmtq) that was
initialized during adapter init and one per port that was initialized
later during port init. This change moves all the control queues (one
per port/channel) to the adapter so that they are initialized during
adapter init and are available before any port is up. This allows the
driver to issue ctrlq work requests over any channel without having to
bring up any port.


# 346875 29-Apr-2019 np

MFC r337609:

cxgbe(4): Create two variants of service_iq, one for queues with
freelists and one for those without.

MFH: 3 weeks
Sponsored by: Chelsio Communications


# 346871 29-Apr-2019 np

MFC r336718, r336720, r336734-r336735, r337398, r337439, and r337540.
These are all related to tx rate limiting in cxgbe.

r336718:
cxgbe(4): Validate only those parameters that are relevant to the
type of rate limiter being programmed. Skip the ones that are not
applicable.

Sponsored by: Chelsio Communications

r336720:
cxgbe(4): Remove useless code that crept in with r336718.

X-MFC With: 336718

r336734:
cxgbe(4): Better defaults for all cl-rl rate limiters.

Start in "class" instead of "flow" mode. This eliminates the need to
specify an MTU, which is not available that early anyway. It also
allows the user to manually configure ch-rl rate limiting after attach.
This used to fail because ch-rl isn't supported if cl-rl "flow" mode is
configured.

Set all traffic classes to 1Gbps during initialization. The goal is to
start off with _any_ valid configuration and 1Gbps works even for
gigabit cards.

Sponsored by: Chelsio Communications

r336735:
cxgbe(4): Consider rateunit before ratemode when displaying information
about a traffic class. This matches the order in which the firmware
evaluates unit and mode internally.

Sponsored by: Chelsio Communications

r337398:
cxgbe(4): Allow user-configured and driver-configured traffic classes to
be used simultaneously. Move sysctl_tc and sysctl_tc_params to
t4_sched.c while here.

Sponsored by: Chelsio Communications

r337439:
cxgbe(4): Allow the driver to specify a burst size when configuring a
traffic class for rate limiting.

Add experimental knobs that allow the user to specify a default pktsize
and burstsize for traffic classes associated with a port:

dev.<ifname>.<instance>.tc.pktsize
dev.<ifname>.<instance>.tc.burstsize

Sponsored by: Chelsio Communications

r337540:
cxgbe(4): Display pkt-size and burst-size in traffic class parameters.


# 346866 29-Apr-2019 np

MFC r334467:

cxgbe(4): Retire an old check.


# 346863 29-Apr-2019 np

MFC r334138:

r334138: cxgbe(4): Make FW4_ACK a shared CPL.


# 346855 28-Apr-2019 np

MFC r333153, r333394, r333442, r333472, r333620, r334058, r334447,
r334452, and r335684. These revisions added hashfilters, NAT offload,
and SMAC/DMAC swapping filters to cxgbe.

r333153:
cxgbe(4): Move all TCAM filter code into a separate file.

Sponsored by: Chelsio Communications

r333394:
cxgbe(4): Add support for hash filters.

These filters reside in the card's memory instead of its TCAM and can be
configured via a new "hashfilter" subcommand in cxgbetool. Hash and
normal TCAM filters can be used together. The hardware does an
exact-match of packet fields for hash filters, unlike the masked match
performed for TCAM filters. Any T5/T6 card with memory can support at
least half a million hash filters. The sample config file with the
driver configures 512K of these, it is possible to double this to 1
million+ in some cases.

The chip does an exact-match of fields of incoming datagrams with hash
filters and performs the action configured for the filter if it matches.
The fields to match are specified in a "filter mask" in the firmware
config file. The filter mask always includes the 5-tuple (sip, dip,
sport, dport, ipproto). It can, optionally, also include any subset of
the filter mode (see filterMode and filterMask in the firmware config
file).

For example:
filterMode = fragmentation, mpshittype, protocol, vlan, port, fcoe
filterMask = protocol, port, vlan

Exact values of the 5-tuple, the physical port, and VLAN tag would have
to be provided while setting up a hash filter with the chip
configuration above.

Hash filters support all actions supported by TCAM filters. A packet
that hits a hash filter can be dropped, let through (with optional
steering to a specific queue or RSS region), switched out of another
port (with optional L2 rewrite of DMAC, SMAC, VLAN tag), or get NAT'ed.
(Support for some of these will show up in the driver in a follow-up
commit very shortly).

Sponsored by: Chelsio Communications

r333442:
cxgbe(4): Determine whether the firmware supports the FILTER2 work
request, which can be used to configure hardware NAT and swapmac.

All firmwares released after Jan 2017 support this work request.

Sponsored by: Chelsio Communications

r333472:
cxgbe(4): Add fields to support configuration of hardware NAT and
swapmac (SMAC/DMAC switcheroo) from userspace.

Sponsored by: Chelsio Communications

r333620:
cxgbe(4): Filtering related features and fixes.

- Driver support for hardware NAT.
- Driver support for swapmac action.
- Validate a request to create a hashfilter against the filter mask.
- Add a hashfilter config file for T5.

Sponsored by: Chelsio Communications

r334058:
cxgbe(4): Only valid filters are expected to have a valid tid.

r334447:
cxgbe(4): Add code to deal with the chip's source MAC table (aka SMT).

Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications

r334452:
cxgbe(4): Add support for SMAC-rewriting filters.

Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications

r335684:
cxgbe(4): Do not leak the filters in the hashfilter table on module
unload.

Sponsored by: Chelsio Communications

Relnotes: Yes


# 346852 28-Apr-2019 np

MFC r333114:

cxgbe(4): Use opaque cookies or tid range-checks to determine the
intended recipient of a CPL when it can't be determined solely from the
opcode. Retire the per-queue handlers for such CPLs in favor of the new
scheme.

Sponsored by: Chelsio Communications


# 346850 28-Apr-2019 np

MFC r333043:

cxgbe(4): Move release_tid to the base NIC driver for future consumers.

Sponsored by: Chelsio Communications.


# 346849 28-Apr-2019 np

MFC r333030:

cxgbe(4): Break up alloc_tid_tabs and move the atid routines to the base
NIC driver. The atid services will be used by new features (hashfilters
and inline TLS) that do not involve TOE.

Sponsored by: Chelsio Communications


# 346848 28-Apr-2019 np

MFC r331902:

r331902: cxgbe: Implement tcp_info handler for connections handled by t4_tom.


# 346805 28-Apr-2019 np

MFC r317849 (partial), r332506, and r332787.

r317849 (partial, required by r332506):
cxgbe/t4_tom: Per-connection rate limiting for TCP sockets handled by
the TOE.

Sponsored by: Chelsio Communications

r332506:
cxgbe(4): Add support for Connection Offload Policy (aka COP).

COP allows fine-grained control on whether to offload a TCP connection
using t4_tom, and what settings to apply to a connection selected for
offload. t4_tom must still be loaded and IFCAP_TOE must still be
enabled for full TCP offload to take place on an interface. The
difference is that IFCAP_TOE used to be the only knob and would enable
TOE for all new connections on the inteface, but now the driver will
also consult the COP, if any, before offloading to the hardware TOE.

A policy is a plain text file with any number of rules, one per line.
Each rule has a "match" part consisting of a socket-type (L = listen,
A = active open, P = passive open, D = don't care) and a pcap-filter(7)
expression, and a "settings" part that specifies whether to offload the
connection or not and the parameters to use if so. The general format
of a rule is: [socket-type] expr => settings

Example. See cxgbetool(8) for more information.
[L] ip && port http => offload
[L] port 443 => !offload
[L] port ssh => offload
[P] src net 192.168/16 && dst port ssh => offload !nagle !timestamp cong newreno
[P] dst port ssh => offload !nagle ecn cong tahoe
[P] dst port http => offload
[A] dst port 443 => offload tls
[A] dst net 192.168/16 => offload !timestamp cong highspeed

The driver processes the rules for each new listen, active open, or
passive open and stops at the first match. There is an implicit rule at
the end of every policy that prohibits offload when no rule in the
policy matches:
[D] all => !offload

This is a reworked and expanded version of a patch submitted by
Krishnamraju Eraparaju @ Chelsio.

Sponsored by: Chelsio Communications

r332787:
cxgbe(4): Fix bugs in the handling of COP rules that match on VLAN tag.

Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.

Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications


# 345664 28-Mar-2019 jhb

MFC 330040,330041,330079,330884,330946,330947,331649,333068,333810,337722,
340466,340468,340469,340473: Add TOE-based TLS offload.

Note that this requires a modified OpenSSL library.

330040:
Fetch TLS key parameters from the firmware.

The parameters describe how much of the adapter's memory is reserved for
storing TLS keys. The 'meminfo' sysctl now lists this region of adapter
memory as 'TLS keys' if present.

330041:
Move ccr_aes_getdeckey() from ccr(4) to the cxgbe(4) driver.

This routine will also be used by the TOE module to manage TLS keys.

330079:
Move #include for rijndael.h out of x86-specific region.

The #include was added inside of the conditional by accident and the lack
of it broke non-x86 builds.

330884:
Support for TLS offload of TOE connections on T6 adapters.

The TOE engine in Chelsio T6 adapters supports offloading of TLS
encryption and TCP segmentation for offloaded connections. Sockets
using TLS are required to use a set of custom socket options to upload
RX and TX keys to the NIC and to enable RX processing. Currently
these socket options are implemented as TCP options in the vendor
specific range. A patched OpenSSL library will be made available in a
port / package for use with the TLS TOE support.

TOE sockets can either offload both transmit and reception of TLS
records or just transmit. TLS offload (both RX and TX) is enabled by
setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be
enabled on the relevant interface. Transmit offload can be used on
any "normal" or TLS TOE socket by using the custom socket option to
program a transmit key. This permits most TOE sockets to
transparently offload TLS when applications use a patched SSL library
(e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL
library). Receive offload can only be used with TOE sockets using the
TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a
list of TCP port numbers. Any connection with either a local or
remote port number in that list will be created as a TLS socket rather
than a plain TOE socket. Note that although this sysctl accepts an
arbitrary list of port numbers, the sysctl(8) tool is only able to set
sysctl nodes to a single value. A TLS socket will hang without
receiving data if used by an application that is not using a patched
SSL library. Thus, the tls_rx_ports node should be used with care.
For a server mostly concerned with offloading TLS transmit, this node
is not needed as plain TOE sockets will fall back to software crypto
when using an unpatched SSL library.

New per-interface statistics nodes are added giving counts of TLS
packets and payload bytes (payload bytes do not include TLS headers or
authentication tags/MACs) offloaded via the TOE engine, e.g.:

dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620

TLS transmit work requests are constructed by a new variant of
t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.

TLS transmit work requests require a buffer containing IVs. If the
IVs are too large to fit into the work request, a separate buffer is
allocated when constructing a work request. This buffer is associated
with the transmit descriptor and freed when the descriptor is ACKed by
the adapter.

Received TLS frames use two new CPL messages. The first message is a
CPL_TLS_DATA containing the decryped payload of a single TLS record.
The handler places the mbuf containing the received payload on an
mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message
which includes a copy of the TLS header and indicates if there were
any errors. The handler for this message places the TLS header into
the socket buffer followed by the saved mbuf with the payload data.
Both of these handlers are contained in tom/t4_tls.c.

A few routines were exposed from t4_cpl_io.c for use by t4_tls.c
including send_rx_credits(), a new send_rx_modulate(), and
t4_close_conn().

TLS keys for both transmit and receive are stored in onboard memory
in the NIC in the "TLS keys" memory region.

In some cases a TLS socket can hang with pending data available in the
NIC that is not delivered to the host. As a workaround, TLS sockets
are more aggressive about sending CPL_RX_DATA_ACK messages anytime that
any data is read from a TLS socket. In addition, a fallback timer will
periodically send CPL_RX_DATA_ACK messages to the NIC for connections
that are still in the handshake phase. Once the connection has
finished the handshake and programmed RX keys via the socket option,
the timer is stopped.

A new function select_ulp_mode() is used to determine what sub-mode a
given TOE socket should use (plain TOE, DDP, or TLS). The existing
set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and
handles initialization of TLS-specific state when necessary in
addition to DDP-specific state.

Since TLS sockets do not receive individual TCP segments but always
receive full TLS records, they can receive more data than is available
in the current window (e.g. if a 16k TLS record is received but the
socket buffer is itself 16k). To cope with this, just drop the window
to 0 when this happens, but track the overage and "eat" the overage as
it is read from the socket buffer not opening the window (or adding
rx_credits) for the overage bytes.

330946:
Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.

- Remove the one use of is_tls_offload() and the function. AIO special
handling only needs to be disabled when a TOE socket is actively doing
TLS offload on transmit. The TOE socket's mode (which affects receive
operation) doesn't matter, so remove the check for the socket's mode and
only check if a TOE socket has TLS transmit keys configured to determine
if an AIO write request should fall back to the normal socket handling
instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c. It is not used in critical paths,
so inlining isn't that important. Change return type to bool while here.

330947:
Fix the check for an empty send socket buffer on a TOE TLS socket.

Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero. This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.

331649:
Use the offload transmit queue to set flags on TLS connections.

Requests to modify the state of TLS connections need to be sent on the
same queue as TLS record transmit requests to ensure ordering.

However, in order to use the offload transmit queue in t4_set_tcb_field(),
the function needs to be updated to do proper flow control / credit
management when queueing a request to an offload queue. This required
passing a pointer to the toepcb itself to this function, so while here
remove the 'tid' and 'iqid' parameters and obtain those values from the
toepcb in t4_set_tcb_field() itself.

333068:
Use the correct key address when renegotiating the transmit key.

Previously, get_keyid() was returning the address of the receive key
instead of the transmit key when renegotiating the transmit key. This
could either hang the card (if a connection was only offloading TLS TX
and thus had a receive key address of -1) or cause the connection to
fail by overwriting the wrong key (if both RX and TX TLS were
offloaded).

333810:
Be more robust against garbage input on a TOE TLS TX socket.

If a socket is closed or shutdown and a partial record (or what
appears to be a partial record) is waiting in the socket buffer,
discard the partial record and close the connection rather than
waiting forever for the rest of the record.

337722:
Whitespace nit in t4_tom.h

340466:
Move the TLS key map into the adapter softc so non-TOE code can use it.

340468:
Change the quantum for TLS key addresses to 32 bytes.

The addresses passed when reading and writing keys are always shifted
right by 5 as the memory locations are addressed in 32-byte chunks, so
the quantum needs to be 32, not 8.

340469:
Remove bogus roundup2() of the key programming work request header.

The key context is always placed immediately after the work request
header. The total work request length has to be rounded up by 16
however.

340473:
Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.

vmem's are not just used for TLS memory in TOM and the #include actually
predates the TLS code so should not have been removed when the TLS vmem
moved in r340466.

Sponsored by: Chelsio Communications


# 345040 11-Mar-2019 jhb

MFC 318429,318967,319721,319723,323600,323724,328353-328361,330042,343056:
Add a driver for the Chelsio T6 crypto accelerator engine.

Note that with the set of commits in this batch, no additional tunables
are needed to use the driver once it is loaded.

318429:
Add a driver for the Chelsio T6 crypto accelerator engine.

The ccr(4) driver supports use of the crypto accelerator engine on
Chelsio T6 NICs in "lookaside" mode via the opencrypto framework.

Currently, the driver supports AES-CBC, AES-CTR, AES-GCM, and AES-XTS
cipher algorithms as well as the SHA1-HMAC, SHA2-256-HMAC, SHA2-384-HMAC,
and SHA2-512-HMAC authentication algorithms. The driver also supports
chaining one of AES-CBC, AES-CTR, or AES-XTS with an authentication
algorithm for encrypt-then-authenticate operations.

Note that this driver is still under active development and testing and
may not yet be ready for production use. It does pass the tests in
tests/sys/opencrypto with the exception that the AES-GCM implementation
in the driver does not yet support requests with a zero byte payload.

To use this driver currently, the "uwire" configuration must be used
along with explicitly enabling support for lookaside crypto capabilities
in the cxgbe(4) driver. These can be done by setting the following
tunables before loading the cxgbe(4) driver:

hw.cxgbe.config_file=uwire
hw.cxgbe.cryptocaps_allowed=-1

318967:
Fail large requests with EFBIG.

The adapter firmware in general does not accept PDUs larger than 64k - 1
bytes in size. Sending crypto requests larger than this size result in
hangs or incorrect output, so reject them with EFBIG. For requests
chaining an AES cipher with an HMAC, the firmware appears to require
slightly smaller requests (around 512 bytes).

319721:
Add explicit handling for requests with an empty payload.

- For HMAC requests, construct a special input buffer to request an empty
hash result.
- For plain cipher requests and requests that chain an AES cipher with an
HMAC, fail with EINVAL if there is no cipher payload. If needed in
the future, chained requests that only contain AAD could be serviced as
HMAC-only requests.
- For GCM requests, the hardware does not support generating the tag for
an AAD-only request. Instead, complete these requests synchronously
in software on the assumption that such requests are rare.

319723:
Fix the software fallback for GCM to validate the existing tag for decrypts.

323600:
Fix some incorrect sysctl pointers for some error stats.

The bad_session, sglist_error, and process_error sysctl nodes were
returning the value of the pad_error node instead of the appropriate
error counters.

323724:
Enable support for lookaside crypto operations by default.

This permits ccr(4) to be used with the default firmware configuration
file.

328353:
Always store the IV in the immediate portion of a work request.

Combined authentication-encryption and GCM requests already stored the
IV in the immediate explicitly. This extends this behavior to block
cipher requests to work around a firmware bug. While here, simplify
the AEAD and GCM handlers to not include always-true conditions.

328354:
Always set the IV location to IV_NOP.

The firmware ignores this field in the FW_CRYPTO_LOOKASIDE_WR work
request.

328355:
Reject requests with AAD and IV larger than 511 bytes.

The T6 crypto engine's control messages only support a total AAD
length (including the prefixed IV) of 511 bytes. Reject requests with
large AAD rather than returning incorrect results.

328356:
Don't discard AAD and IV output data for AEAD requests.

The T6 can hang when processing certain AEAD requests if the request
sets a flag asking the crypto engine to discard the input IV and AAD
rather than copying them into the output buffer. The existing driver
always discards the IV and AAD as we do not need it. As a workaround,
allocate a single "dummy" buffer when the ccr driver attaches and
change all AEAD requests to write the IV and AAD to this scratch
buffer. The contents of the scratch buffer are never used (similar to
"bogus_page"), and it is ok for multiple in-flight requests to share
this dummy buffer.

328357:
Fail crypto requests when the resulting work request is too large.

Most crypto requests will not trigger this condition, but a request
with a highly-fragmented data buffer (and a resulting "large" S/G
list) could trigger it.

328358:
Clamp DSGL entries to a length of 2KB.

This works around an issue in the T6 that can result in DMA engine
stalls if an error occurs while processing a DSGL entry with a length
larger than 2KB.

328359:
Expand the software fallback for GCM to cover more cases.

- Extend ccr_gcm_soft() to handle requests with a non-empty payload.
While here, switch to allocating the GMAC context instead of placing
it on the stack since it is over 1KB in size.
- Allow ccr_gcm() to return a special error value (EMSGSIZE) which
triggers a fallback to ccr_gcm_soft(). Move the existing empty
payload check into ccr_gcm() and change a few other cases
(e.g. large AAD) to fallback to software via EMSGSIZE as well.
- Add a new 'sw_fallback' stat to count the number of requests
processed via the software fallback.

328360:
Don't read or generate an IV until all error checking is complete.

In particular, this avoids edge cases where a generated IV might be
written into the output buffer even though the request is failed with
an error.

328361:
Store IV in output buffer in GCM software fallback when requested.

Properly honor the lack of the CRD_F_IV_PRESENT flag in the GCM
software fallback case for encryption requests.

330042:
Don't overflow the ipad[] array when clearing the remainder.

After the auth key is copied into the ipad[] array, any remaining bytes
are cleared to zero (in case the key is shorter than one block size).
The full block size was used as the length of the zero rather than the
size of the remaining ipad[]. In practice this overflow was harmless as
it could only clear bytes in the following opad[] array which is
initialized with a copy of ipad[] in the next statement.

343056:
Reject new sessions if the necessary queues aren't initialized.

ccr reuses the control queue and first rx queue from the first port on
each adapter. The driver cannot send requests until those queues are
initialized. Refuse to create sessions for now if the queues aren't
ready. This is a workaround until cxgbe allocates one or more
dedicated queues for ccr.

Relnotes: yes
Sponsored by: Chelsio Communications


# 344858 06-Mar-2019 jhb

MFC 341098: Add read-only sysctls for all tunables in the cxgbe(4) driver.


# 339399 17-Oct-2018 np

MFC r338924:

cxgbe(4): Link related changes.

- Switch to using 32b port/link capabilities in the driver. The 32b
format is used internally by firmwares > 1.16.45.0 and the driver will
now interact with the firmware in its native format, whether it's 16b
or 32b. Note that the 16b format doesn't have room for 50G, 200G, or
400G speeds.

- Add a bit in the pause_settings knobs to allow negotiated PAUSE
settings to override manual settings.

- Ensure that manual link settings persist across an administrative
down/up as well as transceiver unplug/replug.

- Remove unused is_*G_port() functions.

Sponsored by: Chelsio Communications


# 339396 17-Oct-2018 np

MFC r325840, r327811, and r329701.

r325840:
CXGBE: fix big-endian behaviour

The setbit/clearbit pair casts the bitfield pointer
to uint8_t* which effectively treats its contents as
little-endian variable. The ffs() function accepts int as
the parameter, which is big-endian. Use uint8_t here to
avoid mismatch, as we have only 4 doorbells.

Submitted by: Wojciech Macek <wma@freebsd.org>
Reviewed by: np
Obtained from: Semihalf
Sponsored by: QCM Technologies
Differential revision: https://reviews.freebsd.org/D13084

r327811:
CXGBE: fix get_filt to be endianness-aware

Unconditional 32-bit shift is not endianness-safe.
Modify the logic to work both on LE and BE.

Submitted by: Wojciech Macek <wma@freebsd.org>
Reviewed by: np
Obtained from: Semihalf
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13102

r329701:
CXGBE: implement prefetch on non-Intel architectures

Submitted by: Michal Stanek <mst@semihalf.com>
Obtained from: Semihalf
Reviewed by: np, pdk@semihalf.com
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14452


# 334562 03-Jun-2018 np

MFC r333650, r333652, r333682, r334406, r334409-r334410, and r334489.

r333650:
cxgbe(4): Claim some more T5 and T6 boards.

r333652:
cxgbe(4): Add support for two more flash parts.

r333682:
cxgbe(4): Fall back to a failsafe configuration built into the firmware
if an error is reported while pre-processing the configuration file that
the driver attempted to use.

Also, allow the user to explicitly use the built-in configuration with
hw.cxgbe.config_file="built-in"

r334406:
cxgbe(4): Consider all supported speeds when building the ifmedia list
for a port. Fix other related issues while here:
- Require port lock for access to link_config.
- Allow 100Mbps operation by tracking the speed in Mbps. Yes, really.
- New port flag to indicate that the media list is immutable. It will
be used in future refinements.

This also fixes a bug where the driver reports incorrect media with
recent firmwares.

r334409:
cxgbe(4): Implement ifm_change callback.

r334410:
cxgbe(4): Use ifm for ifmedia just like the rest of the kernel.

No functional change.

r334489:
cxgbe(4): Include full duplex mediaopt in media that can be reported as
active. Always report full duplex in active media.

Sponsored by: Chelsio Communications


# 331769 30-Mar-2018 hselasky

MFC r303505, r303506, r303512, r303513, r303646, r320418, r323082,
r326169, r326563, r326649, r326716, r326764, r326765 and r329222:

RoCE/infiniband/iWarp upgrade to Linux 4.9 for kernel and userspace.
This commit merges projects/bsd_rdma_4_9 to 11-stable.

Compatibility wrappers have been made for existing 11-stable ibcore
APIs, including ib_reg_phys_mr().
Refer to "sys/ofed/include/rdma/ib_verbs_compat.h" for more information.

The iw_cxgb driver has not been updated and has been disconnected from
the build.

Sponsored by: Mellanox Technologies

MFC r326169 and r326563:
RoCE/infiniband upgrade to Linux v4.9 for kernel and userspace.

List of kernel sources used:
============================

1) kernel sources were cloned from git://github.com/torvalds/linux.git
Top commit 69973b830859bc6529a7a0468ba0d80ee5117826 - tag: v4.9, linux-4.9

2) krping was cloned from https://github.com/larrystevenwise/krping
Top commit 292a2f1abf0348285e678a82264740d52e4dcfe4

List of userspace sources used:
===============================

1) rdma-core was cloned from https://github.com/linux-rdma/rdma-core.git
Top commit d65138ef93af30b3ea249f3a84aa6a24ba7f8a75

2) OpenSM was cloned from git://git.openfabrics.org/~halr/opensm.git
Top commit 85f841cf209f791c89a075048a907020e924528d

3) libibmad was cloned from git://git.openfabrics.org/~iraweiny/libibmad.git
Tag 1.3.13 with some additional patches from Mellanox.

4) infiniband-diags was cloned from git://git.openfabrics.org/~iraweiny/infiniband-diags.git
Tag 1.6.7 with some additional patches from Mellanox.

NOTES:
======

1) The mthca driver has been removed from userspace.
2) All GPLv2 only sources have been removed and where applicable
rewritten from scratch under a BSD license.
3) List of fully supported drivers in userspace and kernel:
a) iw_cxgbe (Chelsio)
b) mlx4ib (Mellanox)
c) mlx5ib (Mellanox)
4) WITH_OFED=YES is still required by make in order to build
OFED userspace and kernel code.
5) Full support has been added for routable RoCE, RoCE v2.

MFC r326649:
Disconnect OFED after r326169 broke all DIRDEPS support for it.

MFC r326716:
Correctly define the unordered_map namespace in ofed/libibnetdisc .

This should fix ofed/libibnetdisc compilation with C-compilers
different from clang and GCC v4.2.1.

Submitted by: kib
Sponsored by: Mellanox Technologies

MFC r326764:
ofed: Remove duplicated symbols from the version file.

ld.bfd accepts multiple listing of the same symbol in the version script.
lld is stricter and errors out. Since arm64 and sometimes amd64 use lld,
we should correct this cosmetic issue.

Sponsored by: Mellanox Technologies
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D13329

MFC r326765:
ofed: Define barriers for mips and arm.

I used the strongest barriers available on the architectures, so if
the future analysis show that it is excessive, the barriers could be
relaxed. Still, it is unlikely that it is meaningful to run IB on 32bit
ARM or current MIPS machines, so the change is to make WITH_OFED to pass
tinderbox.

Sponsored by: Mellanox Technologies
Reviewed by: hselasky
Differential revision: https://reviews.freebsd.org/D13329

MFC r303505:
sdp: Use an mbufq for received control packets.

This is simpler than the hand-rolled queue, and fixes a use-after-free.

Sponsored by: EMC / Isilon Storage Division

MFC r303506:
sdp: Destroy the PCB lock before freeing to the zone.

Sponsored by: EMC / Isilon Storage Division

MFC r303512:
sdp: Use malloc(9) instead of the Linux compat layer.

SDP transmit and receive rings are always created in a sleepable context,
so we can use M_WAITOK and remove error checks.

Sponsored by: EMC / Isilon Storage Division

MFC r303513:
sdp: Destroy the RDMA ID after destroying the connection's queue pair.

This is the ordering documented by rdma_destroy_qp(). Also add a useful
KASSERT to sdp_pcbfree().

Sponsored by: EMC / Isilon Storage Division

MFC r303646:
ipoib: Bound the number of egress mbufs buffered during pathrec lookups.

In pathological situations where the master subnet manager becomes
unresponsive for an extended period, we may otherwise end up queuing all
of the system's mbufs while waiting for a response to a path record lookup.

This addresses the same issue as commit 1e85b806f9 in Linux.

Reviewed by: cem, ngie
Sponsored by: EMC / Isilon Storage Division

MFC r329222:
Import the mthca kernel side infiniband driver from Linux 4.9 and fix
compilation under FreeBSD. The mthca driver was temporarily removed as
part of the Linux 4.9 RoCE/infinband upgrade.

Top commit in Linux source tree:
69973b830859bc6529a7a0468ba0d80ee5117826

Sponsored by: Mellanox Technologies

MFC r320418. Note that the socket lock _is_ the same as so_rcv's lock
in 11 and this is a no-op in this branch.

Sponsored by: Chelsio Communications

MFC r323082:
cxgbe/iw_cxgbe: Set TCP_NODELAY before initiating connection so that
t4_tom picks it up right away. This is less work than waiting for
the connection to be established before applying the setting.

Sponsored by: Chelsio Communications


# 331722 29-Mar-2018 eadler

Revert r330897:

This was intended to be a non-functional change. It wasn't. The commit
message was thus wrong. In addition it broke arm, and merged crypto
related code.

Revert with prejudice.

This revert skips files touched in r316370 since that commit was since
MFCed. This revert also skips files that require $FreeBSD$ property
changes.

Thank you to those who helped me get out of this mess including but not
limited to gonzo, kevans, rgrimes.

Requested by: gjb (re)


# 330897 14-Mar-2018 eadler

Partial merge of the SPDX changes

These changes are incomplete but are making it difficult
to determine what other changes can/should be merged.

No objections from: pfg


# 330307 03-Mar-2018 np

MFC r319506, r319872, r321063, r321103, r321179, r321390, r321435,
r321582, r321671, r322014, r322034, r322055, r322123, r322167, r322425,
r322549, r322914, r322960, r322962, r322964, r322985, r322990, r323006,
r323026, r323041, r323069, r323078, r323343, r323514, r323520, r324296,
r324379, r324386, r324443, r324945, r325596, r325680, r325880,
r325883-r325884, r325961, r326026, r326042, r327062, r327093, r327332,
r327528, r328420, and r328423.

r319506:
cxgbe(4): Update the statistics for compound tx work requests once per
work request, not once per frame.

r319872:
cxgbe(4): Do not request an FEC setting that the port does not support.

r321063:
cxgbe(4): Various link/media related improvements.

- Deal with changes to port_type, and not just port_mod when a
transceiver is changed. This fixes hot swapping of transceivers of
different types (QSFP+ or QSA or QSFP28 in a QSFP28 port, SFP+ or
SFP28 in a SFP28 port, etc.).

- Always refresh media information for ifconfig if the port is down.
The firmware does not generate tranceiver-change interrupts unless at
least one VI is enabled on the physical port. Before this change
ifconfig diplayed potentially stale information for ports that were
administratively down.

- Always recalculate and reapply L1 config on a transceiver change.

- Display PAUSE settings in ifconfig. The driver sysctls for this
continue to work as well.

r321103:
cxgbe(4): New ioctls to flash bootrom and boot config to the card.

r321179:
cxgbe/t4_tom: Log more details about the newly ESTABLISHED tid to the
trace buffer.

r321390:
cxgbe(4): Install the firmware bundled with the driver to the card if it
doesn't seem to have one. This lets the driver recover automatically
from incomplete firmware upgrades (panic, reboot, power loss, etc. in
the middle of an upgrade).

r321435:
cxgbe(4): Display some more TOE parameters related to retransmission
and keepalive in the sysctl MIB. Provide tunables to change some of
these parameters. These are supposed to be setup by the firmware so
these tunables are for experimentation only.

r321582:
cxgbe(4): Some updates to the common code.

- Updated register ranges.
- Helper routines for access to TP registers.
- Updated routine to read flash parameters.

r321671:
cxgbe/iw_cxgbe: Log the end point's history and flags to the trace
buffer just before it's freed.

r322014:
cxgbe(4): Initial import of the "collect" component of Chelsio unified
debug (cudbg) code, hooked up to the main driver via an ioctl.

The ioctl can be used to collect the chip's internal state in a
compressed dump file. These dumps can be decoded with the "view"
component of cudbg.

r322034:
cxgbe(4): Always use the first and not the last virtual interface
associated with a port in begin_synchronized_op.

r322055:
cxgbe(4): Allow the TOE timer tunables to be set with microsecond
precision. These timers are already displayed in microseconds in the
sysctl MIB. Add variables to track these tunables while here.

r322123:
cxgbe(4): Avoid a NULL dereference that would occur during module unload
if there were problems earlier during attach.

r322167:
cxgbe(4): Add the T6 and T5 Unified Wire configuration files to the
kernel, just like for T4, when the driver is compiled into the kernel.

r322425:
cxgbe(4): Save the last reported link parameters and compare them with
the current state to determine whether to generate a link-state change
notification. This fixes a bug introduced in r321063 that caused the
driver to sometimes skip these notifications.

r322549:
cxgbe/t4_tom: Use correct name for the ISS-valid bit in options2.

r322914:
cxgbe(4): Dump the mailbox contents in the same format as CH_DUMP_MBOX.

r322960:
cxgbe(4): Verify that the driver accesses the firmware mailbox in a
thread-safe manner.

r322962:
cxgbe(4): Remove write only variable from t4_port_init.

r322964:
cxgbe(4): vi_mac_funcs should include the base Ethernet function. It is
already used in the driver as if it does.

r322985:
cxgbe(4): Maintain one ifmedia per physical port instead of one per
Virtual Interface (VI). All autonomous VIs that share a port share the
same media.

r322990:
cxgbe(4): Do not access the mailbox without appropriate locks while
creating hardware VIs.

This fixes a bad race on systems with hw.cxgbe.num_vis > 1.

r323006:
cxgbe(4): Update T6/T5/T4 firmwares to 1.16.59.0.

r323026:
cxgbe(4): Zero out the memory allocated for the debug dump.
cudbg_collect seems to expect it this way.

r323041:
cxgbe(4): Add two new debug flags -- one to allow manual firmware
install after full initialization, and another to disable the TCB
cache (T6+). The latter works as a tunable only.

Note that debug_flags are for debugging only and should not be set
normally.

r323069:
cxgbe/t4_tom: Add a knob to select the congestion control algorigthm
used by the TOE hardware for fully offloaded connections. The knob
affects new connections only.

r323078:
cxgbe/t4_tom: There may not be a tid to update if the connection isn't
established.

r323343:
cxgbe(4): Fix a couple of problems in the sge_wrq data path.

- start_wrq_wr must not drain the wr_list if there are incomplete_wrs
pending. This can happen when a t4_wrq_tx runs between two
start_wrq_wr.

- commit_wrq_wr must examine the cookie's pidx and ndesc with the
queue's lock held. Otherwise there is a bad race when incomplete WRs
are being completed and commit_wrq_wr for the WR that is ahead in the
queue updates the next incomplete WR's cookie's pidx/ndesc but the
commit_wrq_wr for the second one is using stale values that it read
without the lock.

r323514:
cxgbetool(8): mode must be specified when creating the dump file.

r323520:
cxgbe(4): Ignore capabilities that depend on TOE when the firmware
reports TOE is not available.

r324296:
cxgbe(4): Provide knobs to set the holdoff parameters of TOE rx queues
separately from NIC rx queues instead of using the same parameters for
both types of queues.

r324379:
cxgbetool(8): Do not create a large file devoid of useful content when
the dumpstate ioctl fails. Make the file world-readable while here.

r324386:
cxgbe(4): Update T6, T5, and T4 firmwares to 1.16.63.0.

r324443:
cxgbetool(8): Do not close uninitialized fd on malloc failure.

r324945:
cxgbe(4): Read the MPS buffer group map from the firmware as it could be
different from hardware defaults. The congestion channel map, which is
still fixed, needs to be tracked separately now. Change the congestion
setting for TOE rx queues to match the drivers on other OSes while here.

r325596:
cxgbe(4): Do not request settings not supported by the port.

r325680:
cxgbe(4): Excluce mdi from the check against port capabilities.

r325880:
cxgbe(4): Combine all _10g and _1g tunables and drop the suffix from
their names. The finer-grained knobs weren't practically useful.

r325883:
cxgbe(4): Sanitize t4_num_vis during MOD_LOAD like all other t4_*
tunables. Add num_vis to the intrs_and_queues structure as it affects
the number of interrupts requested and queues created. In future
cfg_itype_and_nqueues might lower it incrementally instead of going
straight to 1 when enough interrupts aren't available.

r325884:
cxgbe(4): Remove rsrv_noflowq from intrs_and_queues structure as it does
not influence or get affected by the number of interrupts or queues.

r325961:
cxgbe(4): Add core Vdd to the sysctl MIB.

r326026:
cxgbe(4): Add a custom board to the device id list.

r326042:
cxgbe(4): Fix unsafe mailbox access in cudbg.

r327062:
cxgbe(4): Read the MFG diags version from the VPD and make it available
in the sysctl MIB.

r327093:
cxgbe(4): Do not forward interrupts to queues with freelists. This
leaves the firmware event queue (fwq) as the only queue that can take
interrupts for others.

This simplifies cfg_itype_and_nqueues and queue allocation in the driver
at the cost of a little (never?) used configuration. It also allows
service_iq to be split into two specialized variants in the future.

r327332:
cxgbe(4): Reduce duplication by consolidating minor variations of the
same code into a single routine.

r327528:
cxgbe(4): Add a knob to enable/disable PCIe relaxed ordering. Disable it by
default when running on Intel CPUs.

r328420:
cxgbe(4): Do not display harmless warning in non-debug builds.

r328423:
cxgbe(4): Accept old names of a couple of tunables.

Sponsored by: Chelsio Communications


# 318854 25-May-2017 np

MFC r318014, r318091, r318125, and r318263.

r318014:
cxgbe(4): Fixes related to the knob that controls link autonegotiation.

- Do not leak the adapter lock in sysctl_autoneg.
- Accept only 0 or 1 as valid settings for autonegotiation.
- A fixed speed must be requested by the driver when autonegotiation is
disabled otherwise the firmware will reject the l1cfg command. Use
the top speed supported by the port for now.

r318091:
cxgbe(4): Do not assume that if_qflush is always followed by inteface-down.

r318125:
Adjust whitespace and fix a comment. No functional change.

r318263:
cxgbe(4): netmap-only interrupts for a VI do not have an associated rxq
or ofld_rxq and should be ignored by vi_intr_iq.

Sponsored by: Chelsio Communications


# 318850 25-May-2017 np

MFC r317702, r317847, r318307

r317702:
cxgbe(4): Support routines for Tx traffic scheduling.

- Create a new file, t4_sched.c, and move all of the code related to
traffic management from t4_main.c and t4_sge.c to this file.
- Track both Channel Rate Limiter (ch_rl) and Class Rate Limiter (cl_rl)
parameters in the PF driver.
- Initialize all the cl_rl limiters with somewhat arbitrary default
rates and provide routines to update them on the fly.
- Provide routines to reserve and release traffic classes.

r317847:
cxgbe(4): The Tx scheduler initialization either works or doesn't. It
doesn't need a refresh in either case.

r318307:
cxgbe(4): Avoid an out of bounds access when an attempt to unbind a tx
queue from a traffic class fails.

Sponsored by: Chelsio Communications


# 318842 25-May-2017 np

MFC r317041:

cxgbe: Add tunables to control the number of LRO entries and the number
of rx mbufs that should be presorted before LRO. There is no change in
default behavior.

Sponsored by: Chelsio Communications


# 311260 04-Jan-2017 np

MFC r309666, r310033, r310049, r310100, r310152, and r310807.

r309666:
cxgbe(4): unsigned short isn't large enough to store link speed (which
is in Mbps) for 100Gbps links.

r310033:
cxgbe(4): Retire t4_bus_space_read_8 and t4_bus_space_write_8.

r310049:
cxgbe(4): Fix the tid range shown for T6 cards in misc.tids.

r310100:
cxgbe(4): Deal with compressed error vectors.

r310152:
cxgbe(4): Fix typo in an unused macro.

r310807:
cxgbe(4): Updates to link configuration.

- Update struct link_settings and associated shared code.

- Add tunables to control FEC and autonegotiation. All ports inherit
these values as their initial settings.
hw.cxgbe.fec
hw.cxgbe.autoneg

- Add per-port sysctls to control FEC and autonegotiation. These can be
modified at any time.
dev.<port>.<n>.fec
dev.<port>.<n>.autoneg


# 309560 05-Dec-2016 jhb

MFC 305695,305696,305699,305702,305703,305713,305715,305827,305852,305906,
305908,306062,306063,306137,306138,306206,306216,306273,306295,306301,
306465,309302:
Add support for adapters using the Terminator T6 ASIC.

305695:
cxgbe(4): Set up fl_starve_threshold2 accurately for T6.

305696:
cxgbe(4): Use correct macro for header length with T6 ASICs. This
affects the transmit of the VF driver only.

305699:
cxgbe(4): Update the pad_boundary calculation for T6, which has a
different range of boundaries.

305702:
cxgbe(4): Use smaller min/max bursts for fl descriptors with a T6.

305703:
cxgbe(4): Deal with the slightly different SGE_STAT_CFG in T6.

305713:
cxgbe(4): Add support for additional port types and link speeds.

305715:
cxgbe(4): Catch up with the rename of tlscaps -> cryptocaps. TLS is one
of the capabilities of the crypto engine in T6.

305827:
cxgbe(4): Use the interface's viid to calculate the PF/VF/VFValid fields
to use in tx work requests.

305852:
cxgbe(4): Attach to cards with the Terminator 6 ASIC. T6 cards will
come up as 't6nex' nexus devices with 'cc' ports hanging off them.

The T6 firmware and configuration files will be added as soon as they
are released. For now the driver will try to work with whatever
firmware and configuration is on the card's flash.

305906:
cxgbe/t4_tom: The SMAC entry for a VI is at a different location in the T6.

305908:
cxgbe/t4_tom: Update the active/passive open code to support T6. Data
path works as-is.

306062:
cxgbe(4): Show wcwr_stats for T6 cards.

306063:
cxgbe(4): Setup congestion response for T6 rx queues.

306137:
cxgbetool: Add T6 support to the SGE context decoder.

306138:
Fix typo.

306206:
cxgbe(4): Catch up with the different layout of WHOAMI in T6.

Note that the code moved below t4_prep_adapter() as part of this change
because now it needs a working chip_id().

306216:
cxgbe(4): Fix the output of the "tids" sysctl on T6.

306273:
cxgbe(4): Fix netmap with T6, which doesn't encapsulate SGE_EGR_UPDATE
message inside a FW_MSG. The base NIC already deals with updates in
either form.

306295:
cxgbe(4): Support SIOGIFXMEDIA so that ifconfig displays correct media
for 25Gbps and 100Gbps ports. This should have been part of r305713,
which is when the driver first started reporting extended media types.

306301:
cxgbe(4): Use the port's top speed to figure out whether it is "high
speed" or not (for the purpose of calculating the number of queues etc.)
This does the right thing for 25Gbps and 100Gbps ports.

306465:
cxgbe(4): Claim the T6 -DBG card.

309302:
cxgbe(4): Include firmware for T6 cards in the driver. Update all
firmwares to 1.16.12.0.

Sponsored by: Chelsio Communications


# 309458 02-Dec-2016 jhb

MFC 302440,304873,305704,305985,306787,307531: Fixes for sysctls.

302440:
cxgbe(4): Add sysctl to display the RSS indirection table size for an
interface.

dev.cxl.<n>.rss_size
dev.vcxl.<n>.rss_size

304873:
cxgbe(4): Provide more details about the card in the sysctl MIB.

dev.t5nex.0.%desc: Chelsio T580-CR
dev.t5nex.0.hw_revision: 1
dev.t5nex.0.sn: PT13140042
dev.t5nex.0.pn: 110117150A0
dev.t5nex.0.ec: 0000000000000000
dev.t5nex.0.na: 0007432AF490
dev.t5nex.0.vpd_version: 3
dev.t5nex.0.scfg_version: 53255
dev.t5nex.0.bs_version: 1.1.0.0
dev.t5nex.0.er_version: 1.0.0.68
dev.t5nex.0.tp_version: 0.1.4.9
dev.t5nex.0.firmware_version: 1.16.2.0

305704:
cxgbe(4): Rename the debug_flags driver tunable/sysctl to dflags.
Tunables that end with _flags are special.

305985:
cxgbe(4): Fixes to wrq stats.

- Increment tx_wrs_copied in the correct place.
- Add tx_wrs_sspace to the sysctl MIB.

306787:
cxgbe(4): Fix whitespace in the pm_stats display.

307531:
cxgbe(4): Adjust whitespace to line up the column titles in cim_qcfg
with the values displayed.

Sponsored by: Chelsio Communications


# 306664 03-Oct-2016 jhb

MFC 303522,303647,303860,303880,304168,304169,304170,304479,304485,305549:
Chelsio T4/T5 VF driver.

303522:
Various fixes to the t4/5nex character device.

- Remove null open/close methods.
- Don't set d_flags to 0 explicitly.
- Remove t5_cdevsw as the .d_name member isn't really used and doesn't
warrant a separate cdevsw just for the name.
- Use ENOTTY as the error value for an unknown ioctl request.
- Use make_dev_s() to close race with setting si_drv1.

303647:
Store the offset of the KDOORBELL and GTS registers in the softc.

VF devices use a different register layout than PF devices. Storing
the offset in a value in the softc allows code to be shared between the
PF and VF drivers.

303860:
Reserve an adapter flag IS_VF to mark VF devices vs PF devices.

303880:
Track the base absolute ID of ingress and egress queues.

Use this to map an absolute queue ID to a logical queue ID in interrupt
handlers. For the regular cxgbe/cxl drivers this should be a no-op as
the base absolute ID should be zero. VF devices have a non-zero base
absolute ID and require this change. While here, export the absolute ID
of egress queues via a sysctl.

304168:
Make SGE parameter handling more VF-friendly.

Add fields to hold the SGE control register and free list buffer sizes to
the sge_params structure. Populate these new fields in
t4_init_sge_params() for PF devices and change t4_read_chip_settings() to
pull these values out of the params structure instead of reading
registers directly. This will permit t4_read_chip_settings() to be reused
for VF devices which cannot read SGE registers directly.

While here, move the call to t4_init_sge_params() to
get_params__post_init(). The VF driver will populate the SGE parameters
structure via a different method before calling t4_read_chip_settings().

304169:
Update mailbox writes to work with VF devices.

- Use alternate register locations for the data and control registers for
VFs.
- Do a dummy read to force the writes to the mailbox data registers to
post before the write to the control register on VFs.
- Do not check the PCI-e firmware register for errors on VFs.

304170:
Add support for register dumps on VF devices.

- Add handling of VF register sets to t4_get_regs_len() and t4_get_regs().
- While here, use t4_get_regs_len() in the ioctl handler for regdump
instead of inlining it.

304479:
Add structures for VF-specific adapter parameters.

While here, mark which parameters are PF-specific and which are
VF-specific.

304485:
Reorder sysctls so that nodes shared with the VF driver are added first.

This permits a single early return for VF devices in the routines that
add sysctl nodes.

305549:
Chelsio T4/T5 VF driver.

The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.

Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.

t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.

t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.

VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.

Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.

Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.

Sponsored by: Chelsio Communications


# 302408 07-Jul-2016 gjb

Copy head@r302406 to stable/11 as part of the 11.0-RELEASE cycle.
Prune svn:mergeinfo from the new branch, as nothing has been merged
here.

Additional commits post-branch will follow.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


/freebsd-11-stable/MAINTAINERS
/freebsd-11-stable/cddl
/freebsd-11-stable/cddl/contrib/opensolaris
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/dtrace/test/tst/common/print
/freebsd-11-stable/cddl/contrib/opensolaris/cmd/zfs
/freebsd-11-stable/cddl/contrib/opensolaris/lib/libzfs
/freebsd-11-stable/contrib/amd
/freebsd-11-stable/contrib/apr
/freebsd-11-stable/contrib/apr-util
/freebsd-11-stable/contrib/atf
/freebsd-11-stable/contrib/binutils
/freebsd-11-stable/contrib/bmake
/freebsd-11-stable/contrib/byacc
/freebsd-11-stable/contrib/bzip2
/freebsd-11-stable/contrib/com_err
/freebsd-11-stable/contrib/compiler-rt
/freebsd-11-stable/contrib/dialog
/freebsd-11-stable/contrib/dma
/freebsd-11-stable/contrib/dtc
/freebsd-11-stable/contrib/ee
/freebsd-11-stable/contrib/elftoolchain
/freebsd-11-stable/contrib/elftoolchain/ar
/freebsd-11-stable/contrib/elftoolchain/brandelf
/freebsd-11-stable/contrib/elftoolchain/elfdump
/freebsd-11-stable/contrib/expat
/freebsd-11-stable/contrib/file
/freebsd-11-stable/contrib/gcc
/freebsd-11-stable/contrib/gcclibs/libgomp
/freebsd-11-stable/contrib/gdb
/freebsd-11-stable/contrib/gdtoa
/freebsd-11-stable/contrib/groff
/freebsd-11-stable/contrib/ipfilter
/freebsd-11-stable/contrib/ldns
/freebsd-11-stable/contrib/ldns-host
/freebsd-11-stable/contrib/less
/freebsd-11-stable/contrib/libarchive
/freebsd-11-stable/contrib/libarchive/cpio
/freebsd-11-stable/contrib/libarchive/libarchive
/freebsd-11-stable/contrib/libarchive/libarchive_fe
/freebsd-11-stable/contrib/libarchive/tar
/freebsd-11-stable/contrib/libc++
/freebsd-11-stable/contrib/libc-vis
/freebsd-11-stable/contrib/libcxxrt
/freebsd-11-stable/contrib/libexecinfo
/freebsd-11-stable/contrib/libpcap
/freebsd-11-stable/contrib/libstdc++
/freebsd-11-stable/contrib/libucl
/freebsd-11-stable/contrib/libxo
/freebsd-11-stable/contrib/llvm
/freebsd-11-stable/contrib/llvm/projects/libunwind
/freebsd-11-stable/contrib/llvm/tools/clang
/freebsd-11-stable/contrib/llvm/tools/lldb
/freebsd-11-stable/contrib/llvm/tools/llvm-dwarfdump
/freebsd-11-stable/contrib/llvm/tools/llvm-lto
/freebsd-11-stable/contrib/mdocml
/freebsd-11-stable/contrib/mtree
/freebsd-11-stable/contrib/ncurses
/freebsd-11-stable/contrib/netcat
/freebsd-11-stable/contrib/ntp
/freebsd-11-stable/contrib/nvi
/freebsd-11-stable/contrib/one-true-awk
/freebsd-11-stable/contrib/openbsm
/freebsd-11-stable/contrib/openpam
/freebsd-11-stable/contrib/openresolv
/freebsd-11-stable/contrib/pf
/freebsd-11-stable/contrib/sendmail
/freebsd-11-stable/contrib/serf
/freebsd-11-stable/contrib/sqlite3
/freebsd-11-stable/contrib/subversion
/freebsd-11-stable/contrib/tcpdump
/freebsd-11-stable/contrib/tcsh
/freebsd-11-stable/contrib/tnftp
/freebsd-11-stable/contrib/top
/freebsd-11-stable/contrib/top/install-sh
/freebsd-11-stable/contrib/tzcode/stdtime
/freebsd-11-stable/contrib/tzcode/zic
/freebsd-11-stable/contrib/tzdata
/freebsd-11-stable/contrib/unbound
/freebsd-11-stable/contrib/vis
/freebsd-11-stable/contrib/wpa
/freebsd-11-stable/contrib/xz
/freebsd-11-stable/crypto/heimdal
/freebsd-11-stable/crypto/openssh
/freebsd-11-stable/crypto/openssl
/freebsd-11-stable/gnu/lib
/freebsd-11-stable/gnu/usr.bin/binutils
/freebsd-11-stable/gnu/usr.bin/cc/cc_tools
/freebsd-11-stable/gnu/usr.bin/gdb
/freebsd-11-stable/lib/libc/locale/ascii.c
/freebsd-11-stable/sys/cddl/contrib/opensolaris
/freebsd-11-stable/sys/contrib/dev/acpica
/freebsd-11-stable/sys/contrib/ipfilter
/freebsd-11-stable/sys/contrib/libfdt
/freebsd-11-stable/sys/contrib/octeon-sdk
/freebsd-11-stable/sys/contrib/x86emu
/freebsd-11-stable/sys/contrib/xz-embedded
/freebsd-11-stable/usr.sbin/bhyve/atkbdc.h
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.c
/freebsd-11-stable/usr.sbin/bhyve/bhyvegc.h
/freebsd-11-stable/usr.sbin/bhyve/console.c
/freebsd-11-stable/usr.sbin/bhyve/console.h
/freebsd-11-stable/usr.sbin/bhyve/pci_fbuf.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.c
/freebsd-11-stable/usr.sbin/bhyve/pci_xhci.h
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.c
/freebsd-11-stable/usr.sbin/bhyve/ps2kbd.h
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.c
/freebsd-11-stable/usr.sbin/bhyve/ps2mouse.h
/freebsd-11-stable/usr.sbin/bhyve/rfb.c
/freebsd-11-stable/usr.sbin/bhyve/rfb.h
/freebsd-11-stable/usr.sbin/bhyve/sockstream.c
/freebsd-11-stable/usr.sbin/bhyve/sockstream.h
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.c
/freebsd-11-stable/usr.sbin/bhyve/usb_emul.h
/freebsd-11-stable/usr.sbin/bhyve/usb_mouse.c
/freebsd-11-stable/usr.sbin/bhyve/vga.c
/freebsd-11-stable/usr.sbin/bhyve/vga.h
# 302339 04-Jul-2016 np

cxgbe(4): Changes to the CPL-handler registration mechanism and code
related to "shared" CPLs.

a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function. Allow callers to direct the response to any iq. Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.

b) Remove all CPL handler tables from struct adapter. This reduces its
size by around 2KB. All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation. The registration
functions do not need an adapter parameter any more.

c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode. There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL. The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply. Update t4_write_l2e and callers to to support any
wrq/iq combination.

Approved by: re@ (kib@)
Sponsored by: Chelsio Communications


# 302110 23-Jun-2016 np

cxgbe(4): Merge netmap support from the ncxgbe/ncxl interfaces to the
vcxgbe/vcxl interfaces and retire the 'n' interfaces. The main
cxgbe/cxl interfaces and tunables related to them are not affected by
any of this and will continue to operate as usual.

The driver used to create an additional 'n' interface for every
cxgbe/cxl interface if "device netmap" was in the kernel. The 'n'
interface shared the wire with the main interface but was otherwise
autonomous (with its own MAC address, etc.). It did not have normal
tx/rx but had a specialized netmap-only data path. r291665 added
another set of virtual interfaces (the 'v' interfaces) to the driver.
These had normal tx/rx but no netmap support.

This revision consolidates the features of both the interfaces into the
'v' interface which now has a normal data path, TOE support, and native
netmap support. The 'v' interfaces need to be created explicitly with
the hw.cxgbe.num_vis tunable. This means "device netmap" will not
result in the automatic creation of any virtual interfaces.

The following tunables can be used to override the default number of
queues allocated for each 'v' interface. nofld* = 0 will disable TOE on
the virtual interface and nnm* = 0 to will disable native netmap
support.

# number of normal NIC queues
hw.cxgbe.ntxq_vi
hw.cxgbe.nrxq_vi

# number of TOE queues
hw.cxgbe.nofldtxq_vi
hw.cxgbe.nofldrxq_vi

# number of netmap queues
hw.cxgbe.nnmtxq_vi
hw.cxgbe.nnmrxq_vi

hw.cxgbe.nnm{t,r}xq{10,1}g tunables have been removed.

--- tl;dr version ---
The workflow for netmap on cxgbe starting with FreeBSD 11 is:
1) "device netmap" in the kernel config.
2) "hw.cxgbe.num_vis=2" in loader.conf. num_vis > 2 is ok too, you'll
end up with multiple autonomous netmap-capable interfaces for every
port.
3) "dmesg | grep vcxl | grep netmap" to verify that the interface has
netmap queues.
4) Use any of the 'v' interfaces for netmap. pkt-gen -i vcxl<n>... .
One major improvement is that the netmap interface has a normal data
path as expected.
5) Just ignore the cxl interfaces if you want to use netmap only. No
need to bring them up. The vcxl interfaces are completely independent
and everything should just work.
---------------------

Approved by: re@ (gjb@)
Relnotes: Yes
Sponsored by: Chelsio Communications


# 301628 08-Jun-2016 np

cxgbe(4): Add a sysctl to manage the binding of a txq to a traffic class.

Sponsored by: Chelsio Communications


# 301535 06-Jun-2016 np

cxgbe(4): Track the state of the hardware traffic schedulers in the
driver. This works as long as everyone uses set_sched_class_params
to program them.

Sponsored by: Chelsio Communications


# 298955 03-May-2016 pfg

sys/dev: minor spelling fixes.

Most affect comments, very few have user-visible effects.


# 297467 31-Mar-2016 jhb

Remove #ifdef's from various structures used in the cxgbe/cxl driver.

This provides a constant ABI and layout for these structures (especially
struct adapter) avoiding some foot shooting.

Discussed with: np
Sponsored by: Chelsio Communications


# 296710 12-Mar-2016 np

cxgbe(4): Catch up with the latest list of card capabilities as reported
by the firmware.


# 296641 11-Mar-2016 np

cxgbe(4): Add sysctls to display the TP microcode version and the
expansion rom version (if there's one).

trantor:~# sysctl dev.t4nex dev.t5nex | grep _version
dev.t4nex.0.firmware_version: 1.15.28.0
dev.t4nex.0.tp_version: 0.1.9.4
dev.t5nex.0.firmware_version: 1.15.28.0
dev.t5nex.0.exprom_version: 1.0.0.68
dev.t5nex.0.tp_version: 0.1.4.9


# 296603 10-Mar-2016 np

cxgbe(4): Add general purpose routines that offer safe access to the
chip's memory windows. Convert existing users of these windows to the
new routines.


# 296552 08-Mar-2016 np

cxgbe(4): Rename regwin_lock to reg_lock. It is used to protect access
to indirect registers only.


# 296489 08-Mar-2016 np

cxgbe(4): Updates to the shared routines that deal with the serial EEPROM,
flash, and VPD.

Obtained from: Chelsio Communications


# 296481 08-Mar-2016 np

cxgbe(4): Overhaul the shared code that deals with the chip's TP block,
which is responsible for filtering and RSS.

Add the ability to use filters that match on PF/VF (aka "VNIC id") while
here. This is mutually exclusive with filtering on outer VLAN tag with
Q-in-Q.

Sponsored by: Chelsio Communications


# 296478 07-Mar-2016 np

cxgbe(4): Add a struct sge_params to store per-adapter SGE parameters.
Move the code that reads all the parameters to t4_init_sge_params in the
shared code. Use these per-adapter values instead of globals.

Sponsored by: Chelsio Communications


# 296383 04-Mar-2016 np

cxgbe(4): Very basic T6 awareness. This is part of ongoing work to
update to the latest internal shared code.

- Add a chip_params structure to keep track of hardware constants for
all generations of Terminators handled by cxgbe.
- Update t4_hw_pci_read_cfg4 to work with T6.
- Update the hardware debug sysctls (hidden within dev.<tNnex>.<n>.misc.*) to
work with T6. Most of the changes are in the decoders for the CIM
logic analyzer and the MPS TCAM.
- Acquire the regwin lock around indirect register accesses.

Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications


# 296018 24-Feb-2016 np

cxgbe(4): Add a sysctl to retrieve the maximum speed/bandwidth supported by a
port.

dev.cxgbe.<n>.max_speed
dev.cxl.<n>.max_speed

Sponsored by: Chelsio Communications


# 295778 18-Feb-2016 np

cxgbe: catch up with the latest hardware-related definitions.

Obtained from: Chelsio Communications
Sponsored by: Chelsio Communications


# 292736 25-Dec-2015 np

cxgbe(4): Updates to the base NIC driver and t4_tom to support the iSCSI
offload driver. These changes come from projects/cxl_iscsi.


# 291665 02-Dec-2015 jhb

Add support for configuring additional virtual interfaces (VIs) on a port.

Each virtual interface has its own MAC address, queues, and statistics.
The dedicated netmap interfaces (ncxgbeX / ncxlX) were already implemented
as additional VIs on each port. This change allows additional non-netmap
interfaces to be configured on each port. Additional virtual interfaces
use the naming scheme vcxgbeX or vcxlX.

Additional VIs are enabled by setting the hw.cxgbe.num_vis tunable to a
value greater than 1 before loading the cxgbe(4) or cxl(4) driver.
NB: The first VI on each port is the "main" interface (cxgbeX or cxlX).

T4/T5 NICs provide a limited number of MAC addresses for each physical port.
As a result, a maximum of six VIs can be configured on each port (including
the "main" interface and the netmap interface when netmap is enabled).

One user-visible result is that when netmap is enabled, packets received
or transmitted via the netmap interface are no longer counted in the stats
for the "main" interface, but are not accounted to the netmap interface.

The netmap interfaces now also have a new-bus device and export various
information sysctl nodes via dev.n(cxgbe|cxl).X.

The cxgbetool 'clearstats' command clears the stats for all VIs on the
specified port along with the port's stats. There is currently no way to
clear the stats of an individual VI.

Reviewed by: np
MFC after: 1 month
Sponsored by: Chelsio


# 286926 19-Aug-2015 np

cxgbe(4): Save the flags for the last adapter-wide synchronized
operation that was initiated successfully. (The caller and thread are
already recorded).

MFC after: 1 week


# 285648 17-Jul-2015 np

cxgbe(4): Ask the firmware for the start of the RSS slice for a port and
save it for later. This enables direct manipulation of the indirection
tables (although the stock driver doesn't do that right now).

MFC after: 1 month


# 285221 06-Jul-2015 np

cxgbe(4): Add a new knob that controls the congestion response of netmap
rx queues. The default is to drop rather than backpressure.

This decouples the congestion settings of NIC and netmap rx queues.

MFC after: 3 days


# 284445 16-Jun-2015 np

cxgbe(4): Add the ability to dump mailbox commands and replies. It is
enabled/disabled via bit 0 of adapter->debug_flags (which is available
at dev.t5nex.<n>.debug_flags).

MFC after: 1 week


# 279246 24-Feb-2015 np

cxgbe(4): set up congestion management for netmap rx queues.

The hw.cxgbe.cong_drop knob controls the response of the chip when
netmap queues are congested.


# 278374 08-Feb-2015 np

cxgbe(4): tidy up some of the interaction between the Upper Layer
Drivers (ULDs) and the base if_cxgbe driver.

Track the per-adapter activation of ULDs in a new "active_ulds" field.
This was done pretty arbitrarily before this change -- via TOM_INIT_DONE
in adapter->flags for TOM, and the (1 << MAX_NPORTS) bit in
adapter->offload_map for iWARP.

iWARP and hw-accelerated iSCSI rely on the TOE (supported by the TOM
ULD). The rules are:
a) If the iWARP and/or iSCSI ULDs are available when TOE is enabled then
iWARP and/or iSCSI are enabled too.
b) When the iWARP and iSCSI modules are loaded they go looking for
adapters with TOE enabled and enable themselves on that adapter.
c) You cannot deactivate or unload the TOM module from underneath iWARP
or iSCSI. Any such attempt will fail with EBUSY.

MFC after: 2 weeks


# 278371 08-Feb-2015 np

cxgbe(4): a change to the synchronization rules within the the driver.
This is purely cosmetic because the new rules are already followed.

MFC after: 1 week


# 276485 31-Dec-2014 np

cxgbe(4): major tx rework.

a) Front load as much work as possible in if_transmit, before any driver
lock or software queue has to get involved.

b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This
is specifically for the tx multiqueue model where one of the if_transmit
producer threads becomes the consumer and other producers carry on as
usual. mp_ring is implemented as standalone code and it should be
possible to use it in any driver with tx multiqueue. It also has:
- the ability to enqueue/dequeue multiple items. This might become
significant if packet batching is ever implemented.
- an abdication mechanism to allow a thread to give up writing tx
descriptors and have another if_transmit thread take over. A thread
that's writing tx descriptors can end up doing so for an unbounded
time period if a) there are other if_transmit threads continuously
feeding the sofware queue, and b) the chip keeps up with whatever the
thread is throwing at it.
- accurate statistics about interesting events even when the stats come
at the expense of additional branches/conditional code.

The NIC txq lock is uncontested on the fast path at this point. I've
left it there for synchronization with the control events (interface
up/down, modload/unload).

c) Add support for "type 1" coalescing work request in the normal NIC tx
path. This work request is optimized for frames with a single item in
the DMA gather list. These are very common when forwarding packets.
Note that netmap tx in cxgbe already uses these "type 1" work requests.

d) Do not request automatic cidx updates every 32 descriptors. Instead,
request updates via bits in individual work requests (still every 32
descriptors approximately). Also, request an automatic final update
when the queue idles after activity. This means NIC tx reclaim is still
performed lazily but it will catch up quickly as soon as the queue
idles. This seems to be the best middle ground and I'll probably do
something similar for netmap tx as well.

e) Implement a faster tx path for WRQs (used by TOE tx and control
queues, _not_ by the normal NIC tx). Allow work requests to be written
directly to the hardware descriptor ring if room is available. I will
convert t4_tom and iw_cxgbe modules to this faster style gradually.

MFC after: 2 months


# 275733 12-Dec-2014 np

Move KTR_CXGBE from t4_tom.h to adapter.h so that the base if_cxgbe
code can use it too.

MFC after: 1 week


# 275554 05-Dec-2014 np

cxgbe(4): allow the driver to use rx buffers that do not end on a pack
boundary.

MFC after: 2 weeks


# 275539 05-Dec-2014 np

cxgbe(4): Allow for different pad and pack boundaries for different
adapters. Set the pack boundary for T5 cards to be the same as the
PCIe max payload size. The chip likes it this way.

In this revision the driver allocate rx buffers that align on both
boundaries. This is not a strict requirement and a followup commit
will switch the driver to a more relaxed allocation strategy.

MFC after: 2 weeks


# 272200 27-Sep-2014 np

cxgbe(4): implement if_get_counter.


# 269428 02-Aug-2014 np

cxgbe(4): some optimizations in freelist handling.

MFC after: 2 weeks.


# 269411 01-Aug-2014 np

cxgbe(4): minor optimizations in ingress queue processing.

Reorganize struct sge_iq. Make the iq entry size a compile time
constant. While here, eliminate RX_FL_ESIZE and use EQ_ESIZE directly.

MFC after: 2 weeks


# 269032 23-Jul-2014 np

cxgbe(4): Keep track of the clusters that have to be freed by the
custom free routine (rxb_free) in the driver. Fail MOD_UNLOAD with
EBUSY if any such cluster has been handed up to the kernel but hasn't
been freed yet. This prevents a panic later when the cluster finally
needs to be freed but rxb_free is gone from the kernel.

MFC after: 1 week


# 268971 22-Jul-2014 np

Simplify r267600, there's no need to distinguish between allocated and
inlined mbufs.

MFC after: 1 week


# 268536 11-Jul-2014 np

cxgbe(4): Add an iSCSI softc to the adapter structure.


# 267600 17-Jun-2014 np

cxgbe(4): Fix bug in the fast rx buffer recycle path. In some cases rx
buffers were getting recycled when they should have been left alone.

MFC after: 3 days


# 266757 27-May-2014 np

cxgbe(4): netmap support for Terminator 5 (T5) based 10G/40G cards.
Netmap gets its own hardware-assisted virtual interface and won't take
over or disrupt the "normal" interface in any way. You can use both
simultaneously.

For kernels with DEV_NETMAP, cxgbe(4) carves out an ncxl<N> interface
(note the 'n' prefix) in the hardware to accompany each cxl<N>
interface. These two ifnet's per port share the same wire but really
are separate interfaces in the hardware and software. Each gets its own
L2 MAC addresses (unicast and multicast), MTU, checksum caps, etc. You
should run netmap on the 'n' interfaces only, that's what they are for.

With this, pkt-gen is able to transmit > 45Mpps out of a single 40G port
of a T580 card. 2 port tx is at ~56Mpps total (28M + 28M) as of now.
Single port receive is at 33Mpps but this is very much a work in
progress. I expect it to be closer to 40Mpps once done. In any case
the current effort can already saturate multiple 10G ports of a T5 card
at the smallest legal packet size. T4 gear is totally untested.

trantor:~# ./pkt-gen -i ncxl0 -f tx -D 00:07:43:ab:cd:ef
881.952141 main [1621] interface is ncxl0
881.952250 extract_ip_range [275] range is 10.0.0.1:0 to 10.0.0.1:0
881.952253 extract_ip_range [275] range is 10.1.0.1:0 to 10.1.0.1:0
881.962540 main [1804] mapped 334980KB at 0x801dff000
Sending on netmap:ncxl0: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> 00:07:43:ab:cd:ef)
881.962562 main [1882] Sending 512 packets every 0.000000000 s
881.962563 main [1884] Wait 2 secs for phy reset
884.088516 main [1886] Ready...
884.088535 nm_open [457] overriding ifname ncxl0 ringid 0x0 flags 0x1
884.088607 sender_body [996] start
884.093246 sender_body [1064] drop copy
885.090435 main_thread [1418] 45206353 pps (45289533 pkts in 1001840 usec)
886.091600 main_thread [1418] 45322792 pps (45375593 pkts in 1001165 usec)
887.092435 main_thread [1418] 45313992 pps (45351784 pkts in 1000834 usec)
888.094434 main_thread [1418] 45315765 pps (45406397 pkts in 1002000 usec)
889.095434 main_thread [1418] 45333218 pps (45378551 pkts in 1001000 usec)
890.097434 main_thread [1418] 45315247 pps (45405877 pkts in 1002000 usec)
891.099434 main_thread [1418] 45326515 pps (45417168 pkts in 1002000 usec)
892.101434 main_thread [1418] 45333039 pps (45423705 pkts in 1002000 usec)
893.103434 main_thread [1418] 45324105 pps (45414708 pkts in 1001999 usec)
894.105434 main_thread [1418] 45318042 pps (45408723 pkts in 1002001 usec)
895.106434 main_thread [1418] 45332430 pps (45377762 pkts in 1001000 usec)
896.107434 main_thread [1418] 45338072 pps (45383410 pkts in 1001000 usec)
...

Relnotes: Yes
Sponsored by: Chelsio Communications.


# 263317 18-Mar-2014 np

cxgbe(4): significant rx rework.

- More flexible cluster size selection, including the ability to fall
back to a safe cluster size (PAGE_SIZE from zone_jumbop by default) in
case an allocation of a larger size fails.
- A single get_fl_payload() function that assembles the payload into an
mbuf chain for any kind of freelist. This replaces two variants: one
for freelists with buffer packing enabled and another for those without.
- Buffer packing with any sized cluster. It was limited to 4K clusters
only before this change.
- Enable buffer packing for TOE rx queues as well.
- Statistics and tunables to go with all these changes. The driver's
man page will be updated separately.

MFC after: 5 weeks


# 261558 06-Feb-2014 scottl

Add a new sysctl, dev.cxgbe.N.rsrv_noflow, and a companion tunable,
hw.cxgbe.rsrv_noflow. When set, queue 0 of the port is reserved for
TX packets without a flowid. The hash value of packets with a flowid
is bumped up by 1. The intent is to provide a private queue for
link-level packets like LACP that is unlikely to overflow or suffer
deep queue latency.

Reviewed by: np
Obtained from: Netflix
MFC after: 3 days


# 261537 06-Feb-2014 np

cxgbe(4): Use the rx channel map (instead of the tx channel map) as the
congestion channel map.

MFC after: 1 week


# 261536 06-Feb-2014 np

cxgbe(4): The T5 allows for a different freelist starvation threshold
for queues with buffer packing. Use the correct value to calculate a
freelist's low water mark.

MFC after: 1 week


# 260210 02-Jan-2014 adrian

Add an option to enable or disable the small RX packet copying that
is done to improve performance of small frames.

When doing RX packing, the RX copying isn't necessarily required.

Reviewed by: np


# 259103 08-Dec-2013 np

cxgbe(4): save a copy of the RSS map for each port for the driver's use.


# 257176 26-Oct-2013 glebius

The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 256477 14-Oct-2013 np

cxgbe(4): Store the log2 of the # of doorbells per BAR2 page for both
ingress and egress queues, and for both T4 and T5. These values are
used by the T4/T5 iWARP driver.


# 255050 29-Aug-2013 np

Implement support for rx buffer packing. Enable it by default for T5
cards.

This is a T4 and T5 chip feature which lets the chip deliver multiple
Ethernet frames in a single buffer. This is more efficient within the
chip, in the driver, and reduces wastage of space in rx buffers.

- Always allocate rx buffers from the jumbop zone, no matter what the
MTU is. Do not use the normal cluster refcounting mechanism.
- Reserve space for an mbuf and a refcount in the cluster itself and let
the chip DMA multiple frames in the rest.
- Use the embedded mbuf for the first frame and allocate mbufs on the
fly for any additional frames delivered in the cluster. Each of these
mbufs has a reference on the underlying cluster.


# 255015 29-Aug-2013 np

Merge r254386 from user/np/cxl_tuning. Add an INET|INET6 check missing
in said revision.

r254386:
Flush inactive LRO entries periodically.


# 255005 28-Aug-2013 np

Add hooks in base cxgbe(4) for the iWARP upper-layer driver. Update a
couple of assertions in the TOE driver as well.


# 253829 31-Jul-2013 np

Display SGE tunables in the sysctl tree.

dev.t5nex.0.fl_pktshift: payload DMA offset in rx buffer (bytes)
dev.t5nex.0.fl_pad: payload pad boundary (bytes)
dev.t5nex.0.spg_len: status page size (bytes)
dev.t5nex.0.cong_drop: congestion drop setting

Discussed with: scottl


# 253691 26-Jul-2013 np

Add support for packet-sniffing tracers to cxgbe(4). This works with
all T4 and T5 based cards and is useful for analyzing TSO, LRO, TOE, and
for general purpose monitoring without tapping any cxgbe or cxl ifnet
directly.

Tracers on the T4/T5 chips provide access to Ethernet frames exactly as
they were received from or transmitted on the wire. On transmit, a
tracer will capture a frame after TSO segmentation, hw VLAN tag
insertion, hw L3 & L4 checksum insertion, etc. It will also capture
frames generated by the TCP offload engine (TOE traffic is normally
invisible to the kernel). On receive, a tracer will capture a frame
before hw VLAN extraction, runt filtering, other badness filtering,
before the steering/drop/L2-rewrite filters or the TOE have had a go at
it, and of course before sw LRO in the driver.

There are 4 tracers on a chip. A tracer can trace only in one direction
(tx or rx). For now cxgbetool will set up tracers to capture the first
128B of every transmitted or received frame on a given port. This is a
small subset of what the hardware can do. A pseudo ifnet with the same
name as the nexus driver (t4nex0 or t5nex0) will be created for tracing.
The data delivered to this ifnet is an additional copy made inside the
chip. Normal delivery to cxgbe<n> or cxl<n> will be made as usual.

/* watch cxl0, which is the first port hanging off t5nex0. */
# cxgbetool t5nex0 tracer 0 tx0 (watch what cxl0 is transmitting)
# cxgbetool t5nex0 tracer 1 rx0 (watch what cxl0 is receiving)
# cxgbetool t5nex0 tracer list
# tcpdump -i t5nex0 <== all that cxl0 sees and puts on the wire

If you were doing TSO, a tcpdump on cxl0 may have shown you ~64K
"frames" with no L3/L4 checksum but this will show you the frames that
were actually transmitted.

/* all done */
# cxgbetool t5nex0 tracer 0 disable
# cxgbetool t5nex0 tracer 1 disable
# cxgbetool t5nex0 tracer list
# ifconfig t5nex0 destroy


# 252747 05-Jul-2013 np

- Show the reason why link is down if this information is available.
- Display the temperature and PHY firmware version of the BT PHY.

MFC after: 1 day


# 252728 04-Jul-2013 np

- Make note of interface MTU change if the rx queues exist, and not just
when the interface is up.
- Add a tunable to control the TOE's rx coalesce feature (enabled by
default as it always has been). Consider the interface MTU or the
coalesce size when deciding which cluster zone to use to fill the
offload rx queue's free list. The tunable is:
dev.{t4nex,t5nex}.<N>.toe.rx_coalesce

MFC after: 1 day


# 252705 04-Jul-2013 np

- Read all TP parameters in one place.
- Read the filter mode, calculate various shifts, and use them
properly during active open (in select_ntuple).

MFC after: 1 day


# 250092 30-Apr-2013 np

- Provide accurate ifmedia information so that 40G ports/transceivers are
displayed properly in ifconfig, etc.

- Use the same number of tx and rx queues for a 40G port as for a 10G port.

MFC after: 1 week


# 249392 11-Apr-2013 np

Cosmetic change (s/wrwc/wcwr/;s/WRWC/WCWR/).

MFC after: 3 days.


# 248925 30-Mar-2013 np

cxgbe(4): Add support for Chelsio's Terminator 5 (aka T5) ASIC. This
includes support for the NIC and TOE features of the 40G, 10G, and
1G/100M cards based on the T5.

The ASIC is mostly backward compatible with the Terminator 4 so cxgbe(4)
has been updated instead of writing a brand new driver. T5 cards will
show up as cxl (short for cxlgb) ports attached to the t5nex bus driver.

Sponsored by: Chelsio


# 247291 25-Feb-2013 np

cxgbe(4): Ask the card's firmware to pad up tiny CPLs by encapsulating
them in a firmware message if it is able to do so. This works out
better for one of the FIFOs in the chip.

MFC after: 5 days


# 245936 26-Jan-2013 np

Force the 404-BT card (4 x 1G) to use the "uwire" configuration file.

MFC after: 3 days


# 245567 17-Jan-2013 np

cxgbe: Make the for_each macros safer to use by turning them
into a single statement each.

Submitted by: Christoph Mallon <christoph dot mallon at gmx dot de>
MFC after: 1 week


# 245517 16-Jan-2013 np

cxgbe: Fix the for_each_foo macros -- the last argument should not share
its name with any member of struct sge.

MFC after: 3 days


# 245274 10-Jan-2013 np

cxgbe(4): Add functions to help synchronize "slow" operations (those not
on the fast data path) and use them instead of frobbing the adapter lock
and busy flag directly.

Other changes made while reworking all slow operations:
- Wait for the reply to a filter request (add/delete). This guarantees
that the operation is complete by the time the ioctl returns.
- Tidy up the tid_info structure.
- Do not allow the tx queue size to be set to something that's not a
power of 2.

MFC after: 1 week


# 241733 19-Oct-2012 ed

Prefer __containerof() over __member2struct().

The former works better with qualifiers, but also properly type checks
the input pointer.


# 241397 10-Oct-2012 np

Remove unused item. cxgbe's rx queue's lock was removed a long time ago.

MFC after: 3 days


# 239341 16-Aug-2012 np

Initialize various DDP parameters in the main cxgbe(4) driver:

- Setup multiple DDP page sizes. When the driver attempts DDP it will
try to combine physically contiguous pages into regions of these sizes.

- Set the indicate size such that the payload carried in the indicate can
be copied in the header mbuf (and the 16K rx buffer can be recycled).

- Set DDP threshold to the max payload that the chip will coalesce and
deliver to the driver (this is ~16K by default, which is also why the
offload rx queue is backed by 16K buffers). If the chip is able to
coalesce up to the max it's allowed to, it's a good sign that the peer
is transmitting in bulk without any TCP PSH.

MFC after: 2 weeks


# 239338 16-Aug-2012 np

Add a routine (t4_set_tcb_field) to update arbitrary parts of a hardware
TCB. Filters are programmed by modifying the TCB too (via a different
routine) and the reply to any TCB update is delivered via a
CPL_SET_TCB_RPL. Figure out whether the reply is for a filter-write or
something else and route it appropriately.

MFC after: 2 weeks


# 239336 16-Aug-2012 np

Allow for a different handler for each type of firmware message.

MFC after: 2 weeks


# 237819 29-Jun-2012 np

cxgbe(4): support for IPv6 TSO and LRO.

Submitted by: bz (this is a modified version of that patch)


# 237263 19-Jun-2012 np

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


# 235944 24-May-2012 bz

MFp4 bz_ipv6_fast:

Significantly update tcp_lro for mostly two things:
1) introduce basic support for IPv6 without extension headers.
2) try hard to also get the incremental checksum updates right,
especially also in the IPv4 case for the IP and TCP header.

Move variables around for better locality, factor things out into
functions, allow checksum updates to be compiled out, ...

Leave a few comments on further things to look at in the future,
though that is not the full list.

Update drivers with appropriate #includes as needed for IPv6 data
type in LRO.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


# 231115 07-Feb-2012 np

cxgbe: reduce diffs with other branches.
Will help future MFCs from HEAD.

MFC after: 3 days


# 228561 16-Dec-2011 np

Many updates to cxgbe(4)

- Device configuration via plain text config file. Also able to operate
when not attached to the chip as the master driver.

- Generic "work request" queue that serves as the base for both ctrl and
ofld tx queues.

- Generic interrupt handler routine that can process any event on any
kind of ingress queue (via a dispatch table).

- A couple of new driver ioctls. cxgbetool can now install a firmware
to the card ("loadfw" command) and can read the card's memory
("memdump" and "tcb" commands).

- Lots of assorted information within dev.t4nex.X.misc.* This is
primarily for debugging and won't show up in sysctl -a.

- Code to manage the L2 tables on the chip.

- Updates to cxgbe(4) man page to go with the tunables that have changed.

- Updates to the shared code in common/

- Updates to the driver-firmware interface (now at fw 1.4.16.0)

MFC after: 1 month


# 222973 11-Jun-2011 np

- driver ioctl to get SGE context for any given queue.
- sysctls to display the context id, cidx, and pidx of all kinds of queues.

MFC after: 3 days


# 222701 04-Jun-2011 np

Allow lazy fill up of freelists.

MFC after: 3 days


# 222510 30-May-2011 np

- Specialized ingress queues that take interrupts for other ingress
queues. Try to have a set of these per port when possible, fall back
to sharing a common pool between all ports otherwise.

- One control queue per port (used to be one per hardware channel).

- t4_eth_rx now handles Ethernet rx only.

- sysctls to display pidx/cidx for some queues.

MFC after: 1 week


# 222509 30-May-2011 np

L2 table code. This is enough to get the T4's switch + L2 rewrite
filters working. (All other filters - switch without L2 info rewrite,
steer, and drop - were already fully-functional).

Some contrived examples of "switch" filters with L2 rewriting:

# cxgbetool t4nex0 iport 0 dport 80 action switch vlan +9 eport 3
Intercept all packets received on physical port 0 with TCP port 80 as
destination, insert a vlan tag with VID 9, and send them out of port 3.

# cxgbetool t4nex0 sip 192.168.1.1/32 ivlan 5 action switch \
vlan =9 smac aa:bb:cc:dd:ee:ff eport 0
Intercept all packets (received on any port) with source IP address
192.168.1.1 and VLAN id 5, rewrite the VLAN id to 9, rewrite source mac
to aa:bb:cc:dd:ee:ff, and send it out of port 0.

MFC after: 1 week


# 220873 19-Apr-2011 np

- Move all Ethernet specific items from sge_eq to sge_txq. sge_eq is
now a suitable base for all kinds of egress queues.

- Add control queues (sge_ctrlq) and allocate one of these per hardware
channel. They can be used to program filters and steer traffic (and
more).

MFC after: 1 week


# 220649 15-Apr-2011 np

Fix a couple of bad races that can occur when a cxgbe interface is taken
down. The ingress queue lock was unused and has been removed as part of
these changes.

- An in-flight egress update from the SGE must be handled before the
queue that requested it is destroyed. Wait for the update to arrive.

- Interrupt handlers must stop processing rx events for a queue before
the queue is destroyed. Events that have not yet been processed
should be ignored once the queue disappears.

MFC after: 1 week


# 220643 14-Apr-2011 np

There is no need to request a tx credit flush if such a request is already
pending.

MFC after: 3 days


# 219944 23-Mar-2011 np

Do not over-allocate MSI interrupts for the case where each ingress
queue has its own interrupt. If the exact number that we need is not a
power of 2 and we're using MSI, then switch to interrupt multiplexing.

While here, replace the magic numbers with something more readable.

MFC after: 3 days


# 219392 08-Mar-2011 np

cxgbe shouldn't directly know of the UMA zones where network buffers
come from.

MFC after: 1 week


# 219290 05-Mar-2011 np

Tweaks for rx:

- everything related to LRO should be in #ifdef INET blocks
- reorder sge_iq's fields so that the most frequently used are all together
- pull all rx code into t4_intr_data directly
- let go of the ingress queue lock when passing up data
- refill the freelist only if it is short of at least 32 buffers


# 219289 05-Mar-2011 np

Store the ifnet rather than the port_info in each txq and rxq struct.

MFC after: 1 week


# 219288 05-Mar-2011 np

A txpkts work request should have a valid FID.

MFC after: 1 week


# 219286 05-Mar-2011 np

Resume tx immediately in response to an SGE egress update from the hardware.

MFC after: 1 week


# 219285 05-Mar-2011 np

Fix incorrect assertion.

MFC after: 3 days


# 218792 18-Feb-2011 np

cxgbe(4) - NIC driver for Chelsio T4 (Terminator 4) based 10Gb/1Gb adapters.

MFC after: 3 weeks