#
345664 |
|
28-Mar-2019 |
jhb |
MFC 330040,330041,330079,330884,330946,330947,331649,333068,333810,337722, 340466,340468,340469,340473: Add TOE-based TLS offload.
Note that this requires a modified OpenSSL library.
330040: Fetch TLS key parameters from the firmware.
The parameters describe how much of the adapter's memory is reserved for storing TLS keys. The 'meminfo' sysctl now lists this region of adapter memory as 'TLS keys' if present.
330041: Move ccr_aes_getdeckey() from ccr(4) to the cxgbe(4) driver.
This routine will also be used by the TOE module to manage TLS keys.
330079: Move #include for rijndael.h out of x86-specific region.
The #include was added inside of the conditional by accident and the lack of it broke non-x86 builds.
330884: Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS encryption and TCP segmentation for offloaded connections. Sockets using TLS are required to use a set of custom socket options to upload RX and TX keys to the NIC and to enable RX processing. Currently these socket options are implemented as TCP options in the vendor specific range. A patched OpenSSL library will be made available in a port / package for use with the TLS TOE support.
TOE sockets can either offload both transmit and reception of TLS records or just transmit. TLS offload (both RX and TX) is enabled by setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be enabled on the relevant interface. Transmit offload can be used on any "normal" or TLS TOE socket by using the custom socket option to program a transmit key. This permits most TOE sockets to transparently offload TLS when applications use a patched SSL library (e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL library). Receive offload can only be used with TOE sockets using the TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a list of TCP port numbers. Any connection with either a local or remote port number in that list will be created as a TLS socket rather than a plain TOE socket. Note that although this sysctl accepts an arbitrary list of port numbers, the sysctl(8) tool is only able to set sysctl nodes to a single value. A TLS socket will hang without receiving data if used by an application that is not using a patched SSL library. Thus, the tls_rx_ports node should be used with care. For a server mostly concerned with offloading TLS transmit, this node is not needed as plain TOE sockets will fall back to software crypto when using an unpatched SSL library.
New per-interface statistics nodes are added giving counts of TLS packets and payload bytes (payload bytes do not include TLS headers or authentication tags/MACs) offloaded via the TOE engine, e.g.:
dev.cc.0.stats.rx_tls_octets: 149 dev.cc.0.stats.rx_tls_records: 13 dev.cc.0.stats.tx_tls_octets: 26501823 dev.cc.0.stats.tx_tls_records: 1620
TLS transmit work requests are constructed by a new variant of t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.
TLS transmit work requests require a buffer containing IVs. If the IVs are too large to fit into the work request, a separate buffer is allocated when constructing a work request. This buffer is associated with the transmit descriptor and freed when the descriptor is ACKed by the adapter.
Received TLS frames use two new CPL messages. The first message is a CPL_TLS_DATA containing the decryped payload of a single TLS record. The handler places the mbuf containing the received payload on an mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message which includes a copy of the TLS header and indicates if there were any errors. The handler for this message places the TLS header into the socket buffer followed by the saved mbuf with the payload data. Both of these handlers are contained in tom/t4_tls.c.
A few routines were exposed from t4_cpl_io.c for use by t4_tls.c including send_rx_credits(), a new send_rx_modulate(), and t4_close_conn().
TLS keys for both transmit and receive are stored in onboard memory in the NIC in the "TLS keys" memory region.
In some cases a TLS socket can hang with pending data available in the NIC that is not delivered to the host. As a workaround, TLS sockets are more aggressive about sending CPL_RX_DATA_ACK messages anytime that any data is read from a TLS socket. In addition, a fallback timer will periodically send CPL_RX_DATA_ACK messages to the NIC for connections that are still in the handshake phase. Once the connection has finished the handshake and programmed RX keys via the socket option, the timer is stopped.
A new function select_ulp_mode() is used to determine what sub-mode a given TOE socket should use (plain TOE, DDP, or TLS). The existing set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and handles initialization of TLS-specific state when necessary in addition to DDP-specific state.
Since TLS sockets do not receive individual TCP segments but always receive full TLS records, they can receive more data than is available in the current window (e.g. if a 16k TLS record is received but the socket buffer is itself 16k). To cope with this, just drop the window to 0 when this happens, but track the overage and "eat" the overage as it is read from the socket buffer not opening the window (or adding rx_credits) for the overage bytes.
330946: Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.
- Remove the one use of is_tls_offload() and the function. AIO special handling only needs to be disabled when a TOE socket is actively doing TLS offload on transmit. The TOE socket's mode (which affects receive operation) doesn't matter, so remove the check for the socket's mode and only check if a TOE socket has TLS transmit keys configured to determine if an AIO write request should fall back to the normal socket handling instead of the TOE fast path. - Move can_tls_offload() into t4_tls.c. It is not used in critical paths, so inlining isn't that important. Change return type to bool while here.
330947: Fix the check for an empty send socket buffer on a TOE TLS socket.
Compare sbavail() with the cached sb_off of already-sent data instead of always comparing with zero. This will correctly close the connection and send the FIN if the socket buffer contains some previously-sent data but no unsent data.
331649: Use the offload transmit queue to set flags on TLS connections.
Requests to modify the state of TLS connections need to be sent on the same queue as TLS record transmit requests to ensure ordering.
However, in order to use the offload transmit queue in t4_set_tcb_field(), the function needs to be updated to do proper flow control / credit management when queueing a request to an offload queue. This required passing a pointer to the toepcb itself to this function, so while here remove the 'tid' and 'iqid' parameters and obtain those values from the toepcb in t4_set_tcb_field() itself.
333068: Use the correct key address when renegotiating the transmit key.
Previously, get_keyid() was returning the address of the receive key instead of the transmit key when renegotiating the transmit key. This could either hang the card (if a connection was only offloading TLS TX and thus had a receive key address of -1) or cause the connection to fail by overwriting the wrong key (if both RX and TX TLS were offloaded).
333810: Be more robust against garbage input on a TOE TLS TX socket.
If a socket is closed or shutdown and a partial record (or what appears to be a partial record) is waiting in the socket buffer, discard the partial record and close the connection rather than waiting forever for the rest of the record.
337722: Whitespace nit in t4_tom.h
340466: Move the TLS key map into the adapter softc so non-TOE code can use it.
340468: Change the quantum for TLS key addresses to 32 bytes.
The addresses passed when reading and writing keys are always shifted right by 5 as the memory locations are addressed in 32-byte chunks, so the quantum needs to be 32, not 8.
340469: Remove bogus roundup2() of the key programming work request header.
The key context is always placed immediately after the work request header. The total work request length has to be rounded up by 16 however.
340473: Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.
vmem's are not just used for TLS memory in TOM and the #include actually predates the TLS code so should not have been removed when the TLS vmem moved in r340466.
Sponsored by: Chelsio Communications
|
#
345040 |
|
11-Mar-2019 |
jhb |
MFC 318429,318967,319721,319723,323600,323724,328353-328361,330042,343056: Add a driver for the Chelsio T6 crypto accelerator engine.
Note that with the set of commits in this batch, no additional tunables are needed to use the driver once it is loaded.
318429: Add a driver for the Chelsio T6 crypto accelerator engine.
The ccr(4) driver supports use of the crypto accelerator engine on Chelsio T6 NICs in "lookaside" mode via the opencrypto framework.
Currently, the driver supports AES-CBC, AES-CTR, AES-GCM, and AES-XTS cipher algorithms as well as the SHA1-HMAC, SHA2-256-HMAC, SHA2-384-HMAC, and SHA2-512-HMAC authentication algorithms. The driver also supports chaining one of AES-CBC, AES-CTR, or AES-XTS with an authentication algorithm for encrypt-then-authenticate operations.
Note that this driver is still under active development and testing and may not yet be ready for production use. It does pass the tests in tests/sys/opencrypto with the exception that the AES-GCM implementation in the driver does not yet support requests with a zero byte payload.
To use this driver currently, the "uwire" configuration must be used along with explicitly enabling support for lookaside crypto capabilities in the cxgbe(4) driver. These can be done by setting the following tunables before loading the cxgbe(4) driver:
hw.cxgbe.config_file=uwire hw.cxgbe.cryptocaps_allowed=-1
318967: Fail large requests with EFBIG.
The adapter firmware in general does not accept PDUs larger than 64k - 1 bytes in size. Sending crypto requests larger than this size result in hangs or incorrect output, so reject them with EFBIG. For requests chaining an AES cipher with an HMAC, the firmware appears to require slightly smaller requests (around 512 bytes).
319721: Add explicit handling for requests with an empty payload.
- For HMAC requests, construct a special input buffer to request an empty hash result. - For plain cipher requests and requests that chain an AES cipher with an HMAC, fail with EINVAL if there is no cipher payload. If needed in the future, chained requests that only contain AAD could be serviced as HMAC-only requests. - For GCM requests, the hardware does not support generating the tag for an AAD-only request. Instead, complete these requests synchronously in software on the assumption that such requests are rare.
319723: Fix the software fallback for GCM to validate the existing tag for decrypts.
323600: Fix some incorrect sysctl pointers for some error stats.
The bad_session, sglist_error, and process_error sysctl nodes were returning the value of the pad_error node instead of the appropriate error counters.
323724: Enable support for lookaside crypto operations by default.
This permits ccr(4) to be used with the default firmware configuration file.
328353: Always store the IV in the immediate portion of a work request.
Combined authentication-encryption and GCM requests already stored the IV in the immediate explicitly. This extends this behavior to block cipher requests to work around a firmware bug. While here, simplify the AEAD and GCM handlers to not include always-true conditions.
328354: Always set the IV location to IV_NOP.
The firmware ignores this field in the FW_CRYPTO_LOOKASIDE_WR work request.
328355: Reject requests with AAD and IV larger than 511 bytes.
The T6 crypto engine's control messages only support a total AAD length (including the prefixed IV) of 511 bytes. Reject requests with large AAD rather than returning incorrect results.
328356: Don't discard AAD and IV output data for AEAD requests.
The T6 can hang when processing certain AEAD requests if the request sets a flag asking the crypto engine to discard the input IV and AAD rather than copying them into the output buffer. The existing driver always discards the IV and AAD as we do not need it. As a workaround, allocate a single "dummy" buffer when the ccr driver attaches and change all AEAD requests to write the IV and AAD to this scratch buffer. The contents of the scratch buffer are never used (similar to "bogus_page"), and it is ok for multiple in-flight requests to share this dummy buffer.
328357: Fail crypto requests when the resulting work request is too large.
Most crypto requests will not trigger this condition, but a request with a highly-fragmented data buffer (and a resulting "large" S/G list) could trigger it.
328358: Clamp DSGL entries to a length of 2KB.
This works around an issue in the T6 that can result in DMA engine stalls if an error occurs while processing a DSGL entry with a length larger than 2KB.
328359: Expand the software fallback for GCM to cover more cases.
- Extend ccr_gcm_soft() to handle requests with a non-empty payload. While here, switch to allocating the GMAC context instead of placing it on the stack since it is over 1KB in size. - Allow ccr_gcm() to return a special error value (EMSGSIZE) which triggers a fallback to ccr_gcm_soft(). Move the existing empty payload check into ccr_gcm() and change a few other cases (e.g. large AAD) to fallback to software via EMSGSIZE as well. - Add a new 'sw_fallback' stat to count the number of requests processed via the software fallback.
328360: Don't read or generate an IV until all error checking is complete.
In particular, this avoids edge cases where a generated IV might be written into the output buffer even though the request is failed with an error.
328361: Store IV in output buffer in GCM software fallback when requested.
Properly honor the lack of the CRD_F_IV_PRESENT flag in the GCM software fallback case for encryption requests.
330042: Don't overflow the ipad[] array when clearing the remainder.
After the auth key is copied into the ipad[] array, any remaining bytes are cleared to zero (in case the key is shorter than one block size). The full block size was used as the length of the zero rather than the size of the remaining ipad[]. In practice this overflow was harmless as it could only clear bytes in the following opad[] array which is initialized with a copy of ipad[] in the next statement.
343056: Reject new sessions if the necessary queues aren't initialized.
ccr reuses the control queue and first rx queue from the first port on each adapter. The driver cannot send requests until those queues are initialized. Refuse to create sessions for now if the queues aren't ready. This is a workaround until cxgbe allocates one or more dedicated queues for ccr.
Relnotes: yes Sponsored by: Chelsio Communications
|