#
345664 |
|
28-Mar-2019 |
jhb |
MFC 330040,330041,330079,330884,330946,330947,331649,333068,333810,337722, 340466,340468,340469,340473: Add TOE-based TLS offload.
Note that this requires a modified OpenSSL library.
330040: Fetch TLS key parameters from the firmware.
The parameters describe how much of the adapter's memory is reserved for storing TLS keys. The 'meminfo' sysctl now lists this region of adapter memory as 'TLS keys' if present.
330041: Move ccr_aes_getdeckey() from ccr(4) to the cxgbe(4) driver.
This routine will also be used by the TOE module to manage TLS keys.
330079: Move #include for rijndael.h out of x86-specific region.
The #include was added inside of the conditional by accident and the lack of it broke non-x86 builds.
330884: Support for TLS offload of TOE connections on T6 adapters.
The TOE engine in Chelsio T6 adapters supports offloading of TLS encryption and TCP segmentation for offloaded connections. Sockets using TLS are required to use a set of custom socket options to upload RX and TX keys to the NIC and to enable RX processing. Currently these socket options are implemented as TCP options in the vendor specific range. A patched OpenSSL library will be made available in a port / package for use with the TLS TOE support.
TOE sockets can either offload both transmit and reception of TLS records or just transmit. TLS offload (both RX and TX) is enabled by setting the dev.t6nex.<x>.tls sysctl to 1 and requires TOE to be enabled on the relevant interface. Transmit offload can be used on any "normal" or TLS TOE socket by using the custom socket option to program a transmit key. This permits most TOE sockets to transparently offload TLS when applications use a patched SSL library (e.g. using LD_LIBRARY_PATH to request use of a patched OpenSSL library). Receive offload can only be used with TOE sockets using the TLS mode. The dev.t6nex.0.toe.tls_rx_ports sysctl can be set to a list of TCP port numbers. Any connection with either a local or remote port number in that list will be created as a TLS socket rather than a plain TOE socket. Note that although this sysctl accepts an arbitrary list of port numbers, the sysctl(8) tool is only able to set sysctl nodes to a single value. A TLS socket will hang without receiving data if used by an application that is not using a patched SSL library. Thus, the tls_rx_ports node should be used with care. For a server mostly concerned with offloading TLS transmit, this node is not needed as plain TOE sockets will fall back to software crypto when using an unpatched SSL library.
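The port-matching rule for tls_rx_ports can be sketched as below. This is a minimal sketch: the helper names, the enum, and the idea that the list is held as a plain uint16_t array are assumptions for illustration, not the driver's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical socket modes; the driver's real sub-modes may differ. */
enum sock_mode { MODE_PLAIN_TOE, MODE_TLS };

/* Hypothetical helper: is either endpoint port in the tls_rx_ports list? */
static bool
port_in_tls_rx_list(uint16_t lport, uint16_t rport,
    const uint16_t *ports, size_t nports)
{
	for (size_t i = 0; i < nports; i++) {
		if (ports[i] == lport || ports[i] == rport)
			return (true);
	}
	return (false);
}

/* A match on either the local or the remote port selects TLS mode. */
static enum sock_mode
mode_for_ports(uint16_t lport, uint16_t rport,
    const uint16_t *ports, size_t nports)
{
	return (port_in_tls_rx_list(lport, rport, ports, nports) ?
	    MODE_TLS : MODE_PLAIN_TOE);
}
```

An empty list leaves every connection as a plain TOE socket, which still allows transmit-only offload through the socket option.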
New per-interface statistics nodes are added giving counts of TLS packets and payload bytes (payload bytes do not include TLS headers or authentication tags/MACs) offloaded via the TOE engine, e.g.:
dev.cc.0.stats.rx_tls_octets: 149
dev.cc.0.stats.rx_tls_records: 13
dev.cc.0.stats.tx_tls_octets: 26501823
dev.cc.0.stats.tx_tls_records: 1620
TLS transmit work requests are constructed by a new variant of t4_push_frames() called t4_push_tls_records() in tom/t4_tls.c.
TLS transmit work requests require a buffer containing IVs. If the IVs are too large to fit into the work request, a separate buffer is allocated when constructing a work request. This buffer is associated with the transmit descriptor and freed when the descriptor is ACKed by the adapter.
Received TLS frames use two new CPL messages. The first message is a CPL_TLS_DATA containing the decrypted payload of a single TLS record. The handler places the mbuf containing the received payload on an mbufq in the TOE pcb. The second message is a CPL_RX_TLS_CMP message which includes a copy of the TLS header and indicates if there were any errors. The handler for this message places the TLS header into the socket buffer followed by the saved mbuf with the payload data. Both of these handlers are contained in tom/t4_tls.c.
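The pairing of the two CPL handlers can be modeled as follows. This is a simplified sketch under stated assumptions: fixed-size byte arrays stand in for the mbufq and the socket buffer, and the function names are hypothetical. Only the ordering (payload saved first, then header plus payload appended on completion) follows the description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TLS_HDR_LEN 5
#define MAX_REC 64	/* tiny sizes for illustration only */

/* Hypothetical stand-ins for the TOE pcb's mbufq and the socket buffer. */
struct tls_rx_model {
	unsigned char pending[MAX_REC];	/* payload saved by CPL_TLS_DATA */
	size_t pending_len;
	bool have_pending;
	unsigned char sb[4 * MAX_REC];	/* socket receive buffer */
	size_t sb_len;
};

/* CPL_TLS_DATA: stash the decrypted payload of one record. */
static int
on_tls_data(struct tls_rx_model *m, const void *payload, size_t len)
{
	if (len > sizeof(m->pending))
		return (-1);
	memcpy(m->pending, payload, len);
	m->pending_len = len;
	m->have_pending = true;
	return (0);
}

/* CPL_RX_TLS_CMP: append the TLS header, then the saved payload. */
static int
on_rx_tls_cmp(struct tls_rx_model *m, const unsigned char *hdr, bool error)
{
	if (!m->have_pending)
		return (-1);		/* completion without a saved payload */
	if (error) {
		m->have_pending = false;	/* drop the bad record's payload */
		return (-1);
	}
	if (m->sb_len + TLS_HDR_LEN + m->pending_len > sizeof(m->sb))
		return (-1);		/* no room in the model's socket buffer */
	memcpy(m->sb + m->sb_len, hdr, TLS_HDR_LEN);
	m->sb_len += TLS_HDR_LEN;
	memcpy(m->sb + m->sb_len, m->pending, m->pending_len);
	m->sb_len += m->pending_len;
	m->have_pending = false;
	return (0);
}
```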
A few routines were exposed from t4_cpl_io.c for use by t4_tls.c including send_rx_credits(), a new send_rx_modulate(), and t4_close_conn().
TLS keys for both transmit and receive are stored in onboard memory in the NIC in the "TLS keys" memory region.
In some cases a TLS socket can hang with pending data available in the NIC that is not delivered to the host. As a workaround, TLS sockets are more aggressive about sending CPL_RX_DATA_ACK messages whenever any data is read from a TLS socket. In addition, a fallback timer will periodically send CPL_RX_DATA_ACK messages to the NIC for connections that are still in the handshake phase. Once the connection has finished the handshake and programmed RX keys via the socket option, the timer is stopped.
A new function select_ulp_mode() is used to determine what sub-mode a given TOE socket should use (plain TOE, DDP, or TLS). The existing set_tcpddp_ulp_mode() function has been renamed to set_ulp_mode() and handles initialization of TLS-specific state when necessary in addition to DDP-specific state.
Since TLS sockets do not receive individual TCP segments but always receive full TLS records, they can receive more data than is available in the current window (e.g. if a 16k TLS record is received but the socket buffer is itself 16k). To cope with this, just drop the window to 0 when this happens, but track the overage and "eat" it as data is read from the socket buffer, without opening the window (or adding rx_credits) for the overage bytes.
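The overage bookkeeping can be sketched with two counters. This is a minimal model assuming a single "window" value and a single "overage" value per connection; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-connection receive accounting state. */
struct rx_acct {
	uint32_t window;	/* receive window currently open */
	uint32_t overage;	/* bytes received beyond the window */
};

/* A full TLS record of 'len' bytes has arrived. */
static void
rx_record(struct rx_acct *a, uint32_t len)
{
	if (len > a->window) {
		a->overage += len - a->window;
		a->window = 0;		/* drop the window to 0 */
	} else
		a->window -= len;
}

/*
 * The application read 'n' bytes.  Overage bytes are "eaten" first and
 * reopen nothing; the rest are returned as rx credits.
 */
static uint32_t
rx_read(struct rx_acct *a, uint32_t n)
{
	uint32_t eaten = n < a->overage ? n : a->overage;

	a->overage -= eaten;
	a->window += n - eaten;
	return (n - eaten);
}
```

For example, a 16384-byte record arriving with only 16000 bytes of window leaves an overage of 384; the first 384 bytes read afterwards return no credits.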
330946: Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.
- Remove the one use of is_tls_offload() and the function. AIO special handling only needs to be disabled when a TOE socket is actively doing TLS offload on transmit. The TOE socket's mode (which affects receive operation) doesn't matter, so remove the check for the socket's mode and only check if a TOE socket has TLS transmit keys configured to determine if an AIO write request should fall back to the normal socket handling instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c. It is not used in critical paths, so inlining isn't that important. Change return type to bool while here.
330947: Fix the check for an empty send socket buffer on a TOE TLS socket.
Compare sbavail() with the cached sb_off of already-sent data instead of always comparing with zero. This will correctly close the connection and send the FIN if the socket buffer contains some previously-sent data but no unsent data.
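The difference between the old and new check can be sketched as follows. The sbavail and sb_off values come from the description; the helper names are hypothetical and the FIN decision is simplified to a single boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 'sbavail' is the number of bytes in the send socket buffer and
 * 'sb_off' is how many of them were already sent.  The old check was
 * effectively (sbavail == 0), which stays false while previously-sent
 * bytes are still buffered awaiting ACK.
 */
static bool
no_unsent_data(uint32_t sbavail, uint32_t sb_off)
{
	return (sbavail == sb_off);
}

/* Send the FIN once shutting down and nothing is left to send. */
static bool
should_send_fin(bool shutting_down, uint32_t sbavail, uint32_t sb_off)
{
	return (shutting_down && no_unsent_data(sbavail, sb_off));
}
```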
331649: Use the offload transmit queue to set flags on TLS connections.
Requests to modify the state of TLS connections need to be sent on the same queue as TLS record transmit requests to ensure ordering.
However, in order to use the offload transmit queue in t4_set_tcb_field(), the function needs to be updated to do proper flow control / credit management when queueing a request to an offload queue. This required passing a pointer to the toepcb itself to this function, so while here remove the 'tid' and 'iqid' parameters and obtain those values from the toepcb in t4_set_tcb_field() itself.
333068: Use the correct key address when renegotiating the transmit key.
Previously, get_keyid() was returning the address of the receive key instead of the transmit key when renegotiating the transmit key. This could either hang the card (if a connection was only offloading TLS TX and thus had a receive key address of -1) or cause the connection to fail by overwriting the wrong key (if both RX and TX TLS were offloaded).
333810: Be more robust against garbage input on a TOE TLS TX socket.
If a socket is closed or shutdown and a partial record (or what appears to be a partial record) is waiting in the socket buffer, discard the partial record and close the connection rather than waiting forever for the rest of the record.
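Detecting a trailing partial record only requires walking the 5-byte TLS record headers (content type, two version bytes, and a big-endian 16-bit length). The sketch below assumes the buffered data is available as one contiguous array; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN 5	/* content type (1) + version (2) + length (2) */

/* Bytes at the start of 'buf' that form complete TLS records. */
static size_t
complete_tls_bytes(const uint8_t *buf, size_t avail)
{
	size_t off = 0;

	while (avail - off >= TLS_HDR_LEN) {
		size_t rlen = TLS_HDR_LEN +
		    ((size_t)buf[off + 3] << 8 | buf[off + 4]);
		if (avail - off < rlen)
			break;		/* header promises more than we have */
		off += rlen;
	}
	return (off);
}

/* On close/shutdown: anything left over is discarded as a partial record. */
static bool
has_partial_record(const uint8_t *buf, size_t avail)
{
	return (complete_tls_bytes(buf, avail) != avail);
}
```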
337722: Whitespace nit in t4_tom.h
340466: Move the TLS key map into the adapter softc so non-TOE code can use it.
340468: Change the quantum for TLS key addresses to 32 bytes.
The addresses passed when reading and writing keys are always shifted right by 5 as the memory locations are addressed in 32-byte chunks, so the quantum needs to be 32, not 8.
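Why an 8-byte quantum is wrong can be shown with the shift itself. This sketch assumes byte offsets into the key region are converted to request addresses by a plain right shift of 5; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLS_KEY_QUANTUM 32u	/* key memory addressed in 32-byte units */
#define TLS_KEY_SHIFT 5		/* log2(TLS_KEY_QUANTUM) */

/* Byte offset -> address field used in key read/write requests. */
static uint32_t
key_addr_field(uint32_t byte_addr)
{
	return (byte_addr >> TLS_KEY_SHIFT);
}

/* Only 32-byte aligned offsets survive the shift without losing bits. */
static bool
key_addr_ok(uint32_t byte_addr)
{
	return ((byte_addr & (TLS_KEY_QUANTUM - 1)) == 0);
}
```

With an 8-byte quantum an allocator could hand out offset 40, which maps to the same 32-byte unit as offset 32 and would overlap another key.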
340469: Remove bogus roundup2() of the key programming work request header.
The key context is always placed immediately after the work request header. However, the total work request length still has to be rounded up to a multiple of 16 bytes.
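The layout rule can be sketched as a length calculation. roundup2() is redefined here with the same semantics as the FreeBSD <sys/param.h> macro; key_wr_len() and its parameters are hypothetical.

```c
#include <stddef.h>

/* Same semantics as FreeBSD's roundup2(); 'y' must be a power of 2. */
#ifndef roundup2
#define roundup2(x, y) (((x) + ((y) - 1)) & ~((y) - 1))
#endif

/*
 * The key context starts right after the header (no padding between
 * them); only the total length is rounded up to a multiple of 16.
 */
static size_t
key_wr_len(size_t hdr_len, size_t keyctx_len, size_t *keyctx_off)
{
	*keyctx_off = hdr_len;
	return (roundup2(hdr_len + keyctx_len, (size_t)16));
}
```

Rounding the header alone (for example, a 40-byte header to 48) would leave a gap in front of the key context, which is what the removed roundup2() did.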
340473: Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.
vmem arenas are not just used for TLS memory in TOM, and the #include actually predates the TLS code, so it should not have been removed when the TLS vmem moved in r340466.
Sponsored by: Chelsio Communications
|
#
345040 |
|
11-Mar-2019 |
jhb |
MFC 318429,318967,319721,319723,323600,323724,328353-328361,330042,343056: Add a driver for the Chelsio T6 crypto accelerator engine.
Note that with the set of commits in this batch, no additional tunables are needed to use the driver once it is loaded.
318429: Add a driver for the Chelsio T6 crypto accelerator engine.
The ccr(4) driver supports use of the crypto accelerator engine on Chelsio T6 NICs in "lookaside" mode via the opencrypto framework.
Currently, the driver supports AES-CBC, AES-CTR, AES-GCM, and AES-XTS cipher algorithms as well as the SHA1-HMAC, SHA2-256-HMAC, SHA2-384-HMAC, and SHA2-512-HMAC authentication algorithms. The driver also supports chaining one of AES-CBC, AES-CTR, or AES-XTS with an authentication algorithm for encrypt-then-authenticate operations.
Note that this driver is still under active development and testing and may not yet be ready for production use. It does pass the tests in tests/sys/opencrypto with the exception that the AES-GCM implementation in the driver does not yet support requests with a zero byte payload.
To use this driver currently, the "uwire" configuration must be used along with explicitly enabling support for lookaside crypto capabilities in the cxgbe(4) driver. These can be done by setting the following tunables before loading the cxgbe(4) driver:
hw.cxgbe.config_file=uwire
hw.cxgbe.cryptocaps_allowed=-1
318967: Fail large requests with EFBIG.
The adapter firmware in general does not accept PDUs larger than 64k - 1 bytes in size. Sending crypto requests larger than this size results in hangs or incorrect output, so reject them with EFBIG. For requests chaining an AES cipher with an HMAC, the firmware appears to require slightly smaller requests (around 512 bytes).
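A minimal sketch of the size check, assuming the request length is known up front as a byte count. MAX_REQUEST_SIZE and check_request_len() are hypothetical names, and the tighter limit for chained cipher+HMAC requests is left out because it is only known approximately.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_REQUEST_SIZE 65535	/* firmware PDU limit: 64k - 1 bytes */

/* Reject oversized requests instead of risking a hang or bad output. */
static int
check_request_len(size_t len)
{
	if (len > MAX_REQUEST_SIZE)
		return (EFBIG);
	return (0);
}
```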
319721: Add explicit handling for requests with an empty payload.
- For HMAC requests, construct a special input buffer to request an empty hash result.
- For plain cipher requests and requests that chain an AES cipher with an HMAC, fail with EINVAL if there is no cipher payload. If needed in the future, chained requests that only contain AAD could be serviced as HMAC-only requests.
- For GCM requests, the hardware does not support generating the tag for an AAD-only request. Instead, complete these requests synchronously in software on the assumption that such requests are rare.
319723: Fix the software fallback for GCM to validate the existing tag for decrypts.
323600: Fix some incorrect sysctl pointers for some error stats.
The bad_session, sglist_error, and process_error sysctl nodes were returning the value of the pad_error node instead of the appropriate error counters.
323724: Enable support for lookaside crypto operations by default.
This permits ccr(4) to be used with the default firmware configuration file.
328353: Always store the IV in the immediate portion of a work request.
Combined authentication-encryption and GCM requests already explicitly stored the IV in the immediate portion. This extends this behavior to block cipher requests to work around a firmware bug. While here, simplify the AEAD and GCM handlers to not include always-true conditions.
328354: Always set the IV location to IV_NOP.
The firmware ignores this field in the FW_CRYPTO_LOOKASIDE_WR work request.
328355: Reject requests with AAD and IV larger than 511 bytes.
The T6 crypto engine's control messages only support a total AAD length (including the prefixed IV) of 511 bytes. Reject requests with large AAD rather than returning incorrect results.
328356: Don't discard AAD and IV output data for AEAD requests.
The T6 can hang when processing certain AEAD requests if the request sets a flag asking the crypto engine to discard the input IV and AAD rather than copying them into the output buffer. The existing driver always discards the IV and AAD as we do not need them. As a workaround, allocate a single "dummy" buffer when the ccr driver attaches and change all AEAD requests to write the IV and AAD to this scratch buffer. The contents of the scratch buffer are never used (similar to "bogus_page"), and it is ok for multiple in-flight requests to share this dummy buffer.
328357: Fail crypto requests when the resulting work request is too large.
Most crypto requests will not trigger this condition, but a request with a highly-fragmented data buffer (and a resulting "large" S/G list) could trigger it.
328358: Clamp DSGL entries to a length of 2KB.
This works around an issue in the T6 that can result in DMA engine stalls if an error occurs while processing a DSGL entry with a length larger than 2KB.
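Clamping means a single large segment becomes several DSGL entries, which also grows the work request (relevant to the size check in 328357). A minimal sketch, assuming segments are described only by their lengths; the names are hypothetical.

```c
#include <stddef.h>

#define DSGL_SGE_MAXLEN 2048	/* per-entry clamp from this workaround */

/* Split one segment into entries of at most 2KB; returns entries written. */
static size_t
dsgl_split(size_t len, size_t *out, size_t maxout)
{
	size_t n = 0;

	/* Stops early (truncated result) if 'out' is too small. */
	while (len > 0 && n < maxout) {
		size_t chunk = len < DSGL_SGE_MAXLEN ? len : DSGL_SGE_MAXLEN;
		out[n++] = chunk;
		len -= chunk;
	}
	return (n);
}

/* Total entries needed for a list of segments after clamping. */
static size_t
dsgl_nentries(const size_t *seg_lens, size_t nsegs)
{
	size_t n = 0;

	for (size_t i = 0; i < nsegs; i++)
		n += (seg_lens[i] + DSGL_SGE_MAXLEN - 1) / DSGL_SGE_MAXLEN;
	return (n);
}
```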
328359: Expand the software fallback for GCM to cover more cases.
- Extend ccr_gcm_soft() to handle requests with a non-empty payload. While here, switch to allocating the GMAC context instead of placing it on the stack since it is over 1KB in size.
- Allow ccr_gcm() to return a special error value (EMSGSIZE) which triggers a fallback to ccr_gcm_soft(). Move the existing empty payload check into ccr_gcm() and change a few other cases (e.g. large AAD) to fallback to software via EMSGSIZE as well.
- Add a new 'sw_fallback' stat to count the number of requests processed via the software fallback.
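The EMSGSIZE fallback pattern can be sketched as below. hw_gcm(), soft_gcm(), process_gcm(), and struct gcm_req are hypothetical stand-ins for ccr_gcm() and ccr_gcm_soft(); the two triggers shown (empty payload, AAD plus IV over 511 bytes) come from 319721, 328355, and this change, and no actual crypto is performed.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_AAD_LEN 511		/* AAD + prefixed IV limit (328355) */

struct gcm_req {
	size_t aad_len, iv_len, payload_len;
};

static unsigned long sw_fallback;	/* mirrors the 'sw_fallback' stat */

/* Hardware path: EMSGSIZE means "hand this to software". */
static int
hw_gcm(const struct gcm_req *r)
{
	if (r->payload_len == 0)
		return (EMSGSIZE);	/* AAD-only: no hardware tag */
	if (r->aad_len + r->iv_len > MAX_AAD_LEN)
		return (EMSGSIZE);	/* too much AAD for the engine */
	return (0);			/* would be queued to the engine */
}

static int
soft_gcm(const struct gcm_req *r)
{
	(void)r;
	sw_fallback++;
	return (0);			/* would run GCM in software */
}

static int
process_gcm(const struct gcm_req *r)
{
	int error = hw_gcm(r);

	if (error == EMSGSIZE)
		error = soft_gcm(r);
	return (error);
}
```

Using a distinct errno as the signal keeps the "can't do this in hardware" case separate from genuine failures such as EINVAL or EFBIG.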
328360: Don't read or generate an IV until all error checking is complete.
In particular, this avoids edge cases where a generated IV might be written into the output buffer even though the request is failed with an error.
328361: Store IV in output buffer in GCM software fallback when requested.
Properly honor the lack of the CRD_F_IV_PRESENT flag in the GCM software fallback case for encryption requests.
330042: Don't overflow the ipad[] array when clearing the remainder.
After the auth key is copied into the ipad[] array, any remaining bytes are cleared to zero (in case the key is shorter than one block size). The full block size was used as the length to zero rather than the size of the remainder of ipad[]. In practice this overflow was harmless, as it could only clear bytes in the following opad[] array, which is initialized with a copy of ipad[] in the next statement.
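The corrected clearing step is shown below as part of standard HMAC pad setup. This is a sketch: init_pads() is hypothetical, the key is assumed to be no longer than one block, and the 0x36/0x5c XOR follows RFC 2104 rather than anything stated in this log.

```c
#include <stddef.h>
#include <string.h>

static void
init_pads(unsigned char *ipad, unsigned char *opad, size_t block_size,
    const unsigned char *key, size_t klen)
{
	memcpy(ipad, key, klen);
	/* Zero only the remainder of ipad[], not block_size bytes. */
	memset(ipad + klen, 0, block_size - klen);
	memcpy(opad, ipad, block_size);
	for (size_t i = 0; i < block_size; i++) {
		ipad[i] ^= 0x36;	/* RFC 2104 inner pad */
		opad[i] ^= 0x5c;	/* RFC 2104 outer pad */
	}
}
```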
343056: Reject new sessions if the necessary queues aren't initialized.
ccr reuses the control queue and first rx queue from the first port on each adapter. The driver cannot send requests until those queues are initialized. Refuse to create sessions for now if the queues aren't ready. This is a workaround until cxgbe allocates one or more dedicated queues for ccr.
Relnotes: yes
Sponsored by: Chelsio Communications
|
#
330307 |
|
03-Mar-2018 |
np |
MFC r319506, r319872, r321063, r321103, r321179, r321390, r321435, r321582, r321671, r322014, r322034, r322055, r322123, r322167, r322425, r322549, r322914, r322960, r322962, r322964, r322985, r322990, r323006, r323026, r323041, r323069, r323078, r323343, r323514, r323520, r324296, r324379, r324386, r324443, r324945, r325596, r325680, r325880, r325883-r325884, r325961, r326026, r326042, r327062, r327093, r327332, r327528, r328420, and r328423.
r319506: cxgbe(4): Update the statistics for compound tx work requests once per work request, not once per frame.
r319872: cxgbe(4): Do not request an FEC setting that the port does not support.
r321063: cxgbe(4): Various link/media related improvements.
- Deal with changes to port_type, and not just port_mod when a transceiver is changed. This fixes hot swapping of transceivers of different types (QSFP+ or QSA or QSFP28 in a QSFP28 port, SFP+ or SFP28 in a SFP28 port, etc.).
- Always refresh media information for ifconfig if the port is down. The firmware does not generate transceiver-change interrupts unless at least one VI is enabled on the physical port. Before this change ifconfig displayed potentially stale information for ports that were administratively down.
- Always recalculate and reapply L1 config on a transceiver change.
- Display PAUSE settings in ifconfig. The driver sysctls for this continue to work as well.
r321103: cxgbe(4): New ioctls to flash bootrom and boot config to the card.
r321179: cxgbe/t4_tom: Log more details about the newly ESTABLISHED tid to the trace buffer.
r321390: cxgbe(4): Install the firmware bundled with the driver to the card if it doesn't seem to have one. This lets the driver recover automatically from incomplete firmware upgrades (panic, reboot, power loss, etc. in the middle of an upgrade).
r321435: cxgbe(4): Display some more TOE parameters related to retransmission and keepalive in the sysctl MIB. Provide tunables to change some of these parameters. These are supposed to be set up by the firmware, so these tunables are for experimentation only.
r321582: cxgbe(4): Some updates to the common code.
- Updated register ranges.
- Helper routines for access to TP registers.
- Updated routine to read flash parameters.
r321671: cxgbe/iw_cxgbe: Log the end point's history and flags to the trace buffer just before it's freed.
r322014: cxgbe(4): Initial import of the "collect" component of Chelsio unified debug (cudbg) code, hooked up to the main driver via an ioctl.
The ioctl can be used to collect the chip's internal state in a compressed dump file. These dumps can be decoded with the "view" component of cudbg.
r322034: cxgbe(4): Always use the first and not the last virtual interface associated with a port in begin_synchronized_op.
r322055: cxgbe(4): Allow the TOE timer tunables to be set with microsecond precision. These timers are already displayed in microseconds in the sysctl MIB. Add variables to track these tunables while here.
r322123: cxgbe(4): Avoid a NULL dereference that would occur during module unload if there were problems earlier during attach.
r322167: cxgbe(4): Add the T6 and T5 Unified Wire configuration files to the kernel, just like for T4, when the driver is compiled into the kernel.
r322425: cxgbe(4): Save the last reported link parameters and compare them with the current state to determine whether to generate a link-state change notification. This fixes a bug introduced in r321063 that caused the driver to sometimes skip these notifications.
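The "compare with what was last reported" rule can be sketched as below. The snapshot fields are assumptions chosen for illustration; the driver's saved link parameters may include more or different fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the link parameters last reported. */
struct link_snap {
	bool link_ok;
	uint32_t speed_mbps;
	uint8_t fc;		/* PAUSE settings */
};

/* Notify only if something changed since the last report; then save it. */
static bool
link_changed(struct link_snap *last, const struct link_snap *cur)
{
	bool changed = last->link_ok != cur->link_ok ||
	    last->speed_mbps != cur->speed_mbps || last->fc != cur->fc;

	if (changed)
		*last = *cur;
	return (changed);
}
```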
r322549: cxgbe/t4_tom: Use correct name for the ISS-valid bit in options2.
r322914: cxgbe(4): Dump the mailbox contents in the same format as CH_DUMP_MBOX.
r322960: cxgbe(4): Verify that the driver accesses the firmware mailbox in a thread-safe manner.
r322962: cxgbe(4): Remove write only variable from t4_port_init.
r322964: cxgbe(4): vi_mac_funcs should include the base Ethernet function. It is already used in the driver as if it does.
r322985: cxgbe(4): Maintain one ifmedia per physical port instead of one per Virtual Interface (VI). All autonomous VIs that share a port share the same media.
r322990: cxgbe(4): Do not access the mailbox without appropriate locks while creating hardware VIs.
This fixes a bad race on systems with hw.cxgbe.num_vis > 1.
r323006: cxgbe(4): Update T6/T5/T4 firmwares to 1.16.59.0.
r323026: cxgbe(4): Zero out the memory allocated for the debug dump. cudbg_collect seems to expect it this way.
r323041: cxgbe(4): Add two new debug flags -- one to allow manual firmware install after full initialization, and another to disable the TCB cache (T6+). The latter works as a tunable only.
Note that debug_flags are for debugging only and should not be set normally.
r323069: cxgbe/t4_tom: Add a knob to select the congestion control algorithm used by the TOE hardware for fully offloaded connections. The knob affects new connections only.
r323078: cxgbe/t4_tom: There may not be a tid to update if the connection isn't established.
r323343: cxgbe(4): Fix a couple of problems in the sge_wrq data path.
- start_wrq_wr must not drain the wr_list if there are incomplete_wrs pending. This can happen when a t4_wrq_tx runs between two start_wrq_wr.
- commit_wrq_wr must examine the cookie's pidx and ndesc with the queue's lock held. Otherwise there is a bad race when incomplete WRs are being completed and commit_wrq_wr for the WR that is ahead in the queue updates the next incomplete WR's cookie's pidx/ndesc but the commit_wrq_wr for the second one is using stale values that it read without the lock.
r323514: cxgbetool(8): mode must be specified when creating the dump file.
r323520: cxgbe(4): Ignore capabilities that depend on TOE when the firmware reports TOE is not available.
r324296: cxgbe(4): Provide knobs to set the holdoff parameters of TOE rx queues separately from NIC rx queues instead of using the same parameters for both types of queues.
r324379: cxgbetool(8): Do not create a large file devoid of useful content when the dumpstate ioctl fails. Make the file world-readable while here.
r324386: cxgbe(4): Update T6, T5, and T4 firmwares to 1.16.63.0.
r324443: cxgbetool(8): Do not close uninitialized fd on malloc failure.
r324945: cxgbe(4): Read the MPS buffer group map from the firmware as it could be different from hardware defaults. The congestion channel map, which is still fixed, needs to be tracked separately now. Change the congestion setting for TOE rx queues to match the drivers on other OSes while here.
r325596: cxgbe(4): Do not request settings not supported by the port.
r325680: cxgbe(4): Exclude mdi from the check against port capabilities.
r325880: cxgbe(4): Combine all _10g and _1g tunables and drop the suffix from their names. The finer-grained knobs weren't practically useful.
r325883: cxgbe(4): Sanitize t4_num_vis during MOD_LOAD like all other t4_* tunables. Add num_vis to the intrs_and_queues structure as it affects the number of interrupts requested and queues created. In future cfg_itype_and_nqueues might lower it incrementally instead of going straight to 1 when enough interrupts aren't available.
r325884: cxgbe(4): Remove rsrv_noflowq from intrs_and_queues structure as it does not influence or get affected by the number of interrupts or queues.
r325961: cxgbe(4): Add core Vdd to the sysctl MIB.
r326026: cxgbe(4): Add a custom board to the device id list.
r326042: cxgbe(4): Fix unsafe mailbox access in cudbg.
r327062: cxgbe(4): Read the MFG diags version from the VPD and make it available in the sysctl MIB.
r327093: cxgbe(4): Do not forward interrupts to queues with freelists. This leaves the firmware event queue (fwq) as the only queue that can take interrupts for others.
This simplifies cfg_itype_and_nqueues and queue allocation in the driver at the cost of a little (never?) used configuration. It also allows service_iq to be split into two specialized variants in the future.
r327332: cxgbe(4): Reduce duplication by consolidating minor variations of the same code into a single routine.
r327528: cxgbe(4): Add a knob to enable/disable PCIe relaxed ordering. Disable it by default when running on Intel CPUs.
r328420: cxgbe(4): Do not display harmless warning in non-debug builds.
r328423: cxgbe(4): Accept old names of a couple of tunables.
Sponsored by: Chelsio Communications
|
#
309560 |
|
05-Dec-2016 |
jhb |
MFC 305695,305696,305699,305702,305703,305713,305715,305827,305852,305906, 305908,306062,306063,306137,306138,306206,306216,306273,306295,306301, 306465,309302: Add support for adapters using the Terminator T6 ASIC.
305695: cxgbe(4): Set up fl_starve_threshold2 accurately for T6.
305696: cxgbe(4): Use correct macro for header length with T6 ASICs. This affects the transmit of the VF driver only.
305699: cxgbe(4): Update the pad_boundary calculation for T6, which has a different range of boundaries.
305702: cxgbe(4): Use smaller min/max bursts for fl descriptors with a T6.
305703: cxgbe(4): Deal with the slightly different SGE_STAT_CFG in T6.
305713: cxgbe(4): Add support for additional port types and link speeds.
305715: cxgbe(4): Catch up with the rename of tlscaps -> cryptocaps. TLS is one of the capabilities of the crypto engine in T6.
305827: cxgbe(4): Use the interface's viid to calculate the PF/VF/VFValid fields to use in tx work requests.
305852: cxgbe(4): Attach to cards with the Terminator 6 ASIC. T6 cards will come up as 't6nex' nexus devices with 'cc' ports hanging off them.
The T6 firmware and configuration files will be added as soon as they are released. For now the driver will try to work with whatever firmware and configuration is on the card's flash.
305906: cxgbe/t4_tom: The SMAC entry for a VI is at a different location in the T6.
305908: cxgbe/t4_tom: Update the active/passive open code to support T6. Data path works as-is.
306062: cxgbe(4): Show wcwr_stats for T6 cards.
306063: cxgbe(4): Setup congestion response for T6 rx queues.
306137: cxgbetool: Add T6 support to the SGE context decoder.
306138: Fix typo.
306206: cxgbe(4): Catch up with the different layout of WHOAMI in T6.
Note that the code moved below t4_prep_adapter() as part of this change because now it needs a working chip_id().
306216: cxgbe(4): Fix the output of the "tids" sysctl on T6.
306273: cxgbe(4): Fix netmap with T6, which doesn't encapsulate SGE_EGR_UPDATE message inside a FW_MSG. The base NIC already deals with updates in either form.
306295: cxgbe(4): Support SIOCGIFXMEDIA so that ifconfig displays correct media for 25Gbps and 100Gbps ports. This should have been part of r305713, which is when the driver first started reporting extended media types.
306301: cxgbe(4): Use the port's top speed to figure out whether it is "high speed" or not (for the purpose of calculating the number of queues etc.) This does the right thing for 25Gbps and 100Gbps ports.
306465: cxgbe(4): Claim the T6 -DBG card.
309302: cxgbe(4): Include firmware for T6 cards in the driver. Update all firmwares to 1.16.12.0.
Sponsored by: Chelsio Communications
|