#
7aa50380 |
|
21-Feb-2023 |
Rahul Rameshbabu <rrameshbabu@nvidia.com> |
net/mlx5e: Fix SQ wake logic in ptp napi_poll context Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken up. Before this change, the ptp tx sq might never wake up if the ptp tx ts skb fifo was full when mlx5e_poll_tx_cq checked whether the queue should be woken up. Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
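A rough kernel-context sketch of the wake condition described above; the helpers shown (netif_tx_queue_stopped, mlx5e_ptpsq_fifo_has_room, netif_tx_wake_queue) exist in the driver/stack, but this function itself is illustrative rather than the actual patch:

/* Sketch: after draining the port-timestamp CQ, re-evaluate the wake
 * condition, since popping entries from the ptp skb fifo may be exactly
 * what unblocks a stopped ptp tx sq. */
static void mlx5e_ptp_tx_wake_sketch(struct mlx5e_txqsq *sq)
{
        if (netif_tx_queue_stopped(sq->txq) &&
            mlx5e_ptpsq_fifo_has_room(sq) &&
            !test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
                netif_tx_wake_queue(sq->txq);
}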
#
c1783e74 |
|
17-Apr-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: XDP, Add support for multi-buffer XDP redirect-in Handle multi-buffer XDP redirect-in requests coming through mlx5e_xdp_xmit. Extend struct mlx5e_xmit_data_frags with an additional dma_arr field, to point to the fragments' dma mappings, as they cannot be retrieved via the page_pool_get_dma_addr() function. Push a dma_addr xdpi instance for each fragment, and use them in the completion flow to dma_unmap the frags. Finally, remove the restriction in mlx5e_open_xdpsq, and set the flag in xdp_features. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb9b9fdc |
|
17-Apr-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: Introduce extended version for mlx5e_xmit_data Introduce struct mlx5e_xmit_data_frags to be used for non-linear xmit buffers. Let it include a sinfo pointer. Take one bit from the len field to indicate whether the descriptor has fragments and can be cast up to the extended version. Zero-init it to make sure has_frags, and potentially future fields, are zero when not explicitly assigned. Another field will be added in a downstream patch to indicate and point to the dma addresses of the different frags, for redirect-in requests. This simplifies the mlx5e_xmit_xdp_frame/mlx5e_xmit_xdp_frame_mpwqe function params. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
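A sketch of the descriptor layout implied by the two commits above; the field names and widths are assumptions drawn from the commit text, not verified driver definitions:

/* Base xmit descriptor: has_frags borrows one bit from the len field so the
 * descriptor can be cast up to the fragmented variant when the bit is set. */
struct mlx5e_xmit_data {
        dma_addr_t dma_addr;
        void      *data;
        u32        len       : 31;
        u32        has_frags : 1;
};

/* Extended variant for non-linear buffers, zero-initialized by the caller.
 * dma_arr points at the fragments' DMA mappings, which cannot be recovered
 * via page_pool_get_dma_addr() on the redirect-in path. */
struct mlx5e_xmit_data_frags {
        struct mlx5e_xmit_data  xd;
        struct skb_shared_info *sinfo;
        dma_addr_t             *dma_arr;
};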
#
e32654f1 |
|
17-Apr-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: Move struct mlx5e_xmit_data to datapath header Move TX datapath struct from the generic en.h to the datapath txrx.h header, where it belongs. Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c2a1323 |
|
13-Feb-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Defer page release in striding rq for better recycling Currently, for striding RQ, fragmented pages from the page pool can get released in two ways: 1) In the mlx5e driver, when trimming off the unused fragments AND the associated skb fragments have been released. This path allows recycling of pages to the page pool cache (allow_direct == true). 2) On the skb release path (last fragment release), which will always release pages to the page pool ring (allow_direct == false). Whichever path releases the last fragment decides where the page ends up: the cache or the ring. So we obviously want to maximize releases via path 1. This patch does that by deferring the release of page fragments to right before requesting new ones from the page pool. Extra care needs to be taken for the corner cases: * On the first call, make sure that release is not called. The skip_release_bitmap is used for this purpose. * On rq shutdown, make sure that all wqes that were not in the linked list are released. For a single ring, single core, default MTU (1500) TCP stream test, the number of pages allocated from the cache directly (rx_pp_recycle_cached) increases from 31% to 98%:
+--------------------------+---------+---------+
| Page Pool stats (/sec)   | Before  | After   |
+--------------------------+---------+---------+
| rx_pp_alloc_fast         | 2137754 | 2261033 |
| rx_pp_alloc_slow         |      47 |       9 |
| rx_pp_alloc_empty        |      47 |       9 |
| rx_pp_alloc_refill       |   23230 |     819 |
| rx_pp_alloc_waive        |       0 |       0 |
| rx_pp_recycle_cached     |  672182 | 2209015 |
| rx_pp_recycle_cache_full |    1789 |       0 |
| rx_pp_recycle_ring       | 1485848 |   52259 |
| rx_pp_recycle_ring_full  |    3003 |     584 |
+--------------------------+---------+---------+
With this patch, the performance in striding rq for the above test is back to baseline. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
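A heavily hedged sketch of the deferral described above; the helper and field names (mlx5e_get_mpw_info, skip_release_bitmap, mlx5e_page_release_fragmented) follow the commit text, but their exact placement is an assumption:

/* Sketch: before refilling MPWQE slot 'wqe_ix', release the pages it used
 * last time around; by then the skbs built from them have usually been
 * freed, so allow_direct recycling into the page_pool cache succeeds. */
static void mlx5e_defer_release_sketch(struct mlx5e_rq *rq, u16 wqe_ix)
{
        struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, wqe_ix);
        int i;

        /* First use of this slot: nothing to release yet. */
        if (test_and_clear_bit(wqe_ix, rq->mpwqe.skip_release_bitmap))
                return;

        for (i = 0; i < rq->mpwqe.pages_per_wqe; i++)
                mlx5e_page_release_fragmented(rq, &wi->alloc_units.frag_pages[i]);
}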
#
6f574284 |
|
18-Jan-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Enable skb page recycling through the page_pool Start using the page_pool skb recycling api to recycle all pages back to the page pool and stop using atomic page reference counting. The mlx5e driver used to manage in-flight pages using page refcounting: for each fragment there were 2 atomic write operations happening (one for building the skb and one on skb release). The page_pool api introduced a method to track page fragments more optimally: * The page's pp_fragment_count is set to a large bias on page alloc (1 x atomic write operation). * The driver tracks the actual page fragments in a non-atomic variable. * When the skb is recycled, pp_fragment_count is decremented (atomic write operation). * When the page is released in the driver, the unused number of fragments (relative to the bias) is deducted from pp_fragment_count (atomic write operation). * The last page defragmentation will only be an atomic read. So in total there are `number of fragments + 1` atomic write ops, as opposed to the previous `2 * frags` atomic write ops. Pages are wrapped in a mlx5e_frag_page structure which also contains the number of fragments. This makes it easy to count the fragments in the driver. This change brings performance improvements for the case when the old rx page_cache had low recycling rates due to head of queue blocking. For an iperf3 TCP test with a single stream, on a single core (iperf and receive queue running on the same core), the following improvements can be noticed: * Striding rq: - before (net-next baseline): bitrate = 30.1 Gbits/sec - after: bitrate = 31.4 Gbits/sec (diff: 4.14 %) * Legacy rq: - before (net-next baseline): bitrate = 30.2 Gbits/sec - after: bitrate = 33.0 Gbits/sec (diff: 8.48 %) There are 2 temporary performance degradations introduced: 1) TCP streams that had a good recycling rate with the old page_cache have a degradation for both striding and linear rq. This is due to very low page pool cache recycling: the pages are released during skb recycle, which will release pages to the page pool ring for safety. The following patches in this series will tackle this problem by deferring the page release in the driver to increase the chance of having pages recycled to the cache. 2) XDP performance is now lower (4-5 %) due to the higher number of atomic operations used for fragment management. But this opens the door for supporting multiple packets per page in XDP, which will bring a big gain. Otherwise, performance is similar to baseline. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
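The accounting above can be made concrete with a small kernel-context sketch built on the stock page_pool fragment helpers (page_pool_fragment_page / page_pool_defrag_page); the wrapper struct, bias value and function names are illustrative:

#define SKETCH_PAGECNT_BIAS  (PAGE_SIZE / 64)   /* illustrative large bias */

struct mlx5e_frag_page {
        struct page *page;
        u16          frags;  /* fragments handed out, tracked without atomics */
};

static void frag_page_init(struct mlx5e_frag_page *fp, struct page *page)
{
        page_pool_fragment_page(page, SKETCH_PAGECNT_BIAS);  /* 1 atomic write */
        fp->page  = page;
        fp->frags = 0;
}

static void frag_page_release(struct mlx5e_frag_page *fp)
{
        /* Deduct the unused part of the bias in one atomic write; each skb
         * fragment recycled by the stack costs one more, giving frags + 1
         * atomic writes in total instead of 2 * frags. The page returns to
         * the pool once the remaining count drops to zero (handled by the
         * page_pool release path in the real flow). */
        page_pool_defrag_page(fp->page, SKETCH_PAGECNT_BIAS - fp->frags);
}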
#
4a5c5e25 |
|
14-Dec-2022 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Enable dma map and sync from page_pool allocator Remove driver dma mapping and unmapping of pages. Let the page_pool api do it. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
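A minimal sketch of what letting the page_pool handle DMA amounts to at pool-creation time; the flags are the stock page_pool ones, while the pool size, offset and device pointer are illustrative values inside some setup function:

/* With PP_FLAG_DMA_MAP and PP_FLAG_DMA_SYNC_DEV, page_pool_create() maps
 * pages at allocation and syncs them for the device on recycle, so the
 * driver's own dma_map_page()/dma_unmap_page() calls can be dropped. */
struct page_pool_params pp_params = {
        .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
        .order     = 0,
        .pool_size = 1024,              /* illustrative */
        .nid       = NUMA_NO_NODE,
        .dev       = mdev->device,      /* illustrative device pointer */
        .dma_dir   = DMA_FROM_DEVICE,
        .offset    = 0,
        .max_len   = PAGE_SIZE,
};
struct page_pool *pool = page_pool_create(&pp_params);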
#
d39092ca |
|
29-Jan-2023 |
Dragos Tatulea <dtatulea@nvidia.com> |
net/mlx5e: RX, Remove alloc unit layout constraint for striding rq This change removes the usage of the mlx5e_alloc_unit union for striding rq. The change is more straightforward than for legacy rq, as the alloc units union is already in place. This patch only moves things around: instead of an array of unions, make it a union of arrays. This has the effect that each mlx5e_mpw_info will allocate the largest possible size of the array member. For xsk this means that the array of xdp_buff pointers for the wqe will still be contiguous, but there will be some extra unused bytes at the end of the array. A further patch in the series will add the mlx5e_frag_page struct, for which the described size constraint will no longer hold. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
aa98d15e |
|
13-Mar-2023 |
Rahul Rameshbabu <rrameshbabu@nvidia.com> |
net/mlx5e: Utilize the entire fifo The previous check compared against the fifo mask. The mask is the size of the fifo (a power of two) minus one, so a less-than-or-equal comparison should be used to check whether the fifo has room for the SKB. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
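The arithmetic above as a small kernel-context sketch (names are illustrative): with a power-of-two fifo, mask = size - 1, so there is room for one more SKB as long as the number of used entries is less than or equal to the mask.

/* pc/cc are free-running producer/consumer counters;
 * used == mask + 1 means the fifo is completely full. */
static bool skb_fifo_has_room_sketch(u16 pc, u16 cc, u16 mask)
{
        u16 used = pc - cc;

        return used <= mask;    /* "< mask" needlessly wasted one slot */
}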
#
3a50cf1e |
|
02-Feb-2023 |
Vadim Fedorenko <vadfed@meta.com> |
mlx5: fix possible ptp queue fifo use-after-free Fifo indexes were not checked during pop operations, which leads to a potential use-after-free when popping from an empty queue. Such a case was possible during the re-sync action. A WARN_ON_ONCE covers future cases. Out-of-order CQEs were spotted which led to draining of the queue and a use-after-free because of the missing fifo pointer check. A special check and counter are added to avoid the resync operation if the SKB cannot exist in the fifo because of an OOO CQE (skb_id must be between the consumer and producer index). Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port timestamp") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e435941b |
|
02-Feb-2023 |
Vadim Fedorenko <vadfed@meta.com> |
mlx5: fix skb leak while fifo resync and push During the ptp resync operation, SKBs were popped from the fifo but were never freed, neither by napi_consume_skb nor by dev_kfree_skb_any. Add a call to napi_consume_skb to properly free SKBs. Another leak was happening because mlx5e_skb_fifo_has_room() had an error in the check. Comparing free-running counters works well unless C promotes the types to something wider than the counter. In this case the counters are u16, but the result of the subtraction is promoted to int, which causes a wrong (negative) result of the check when the producer has already wrapped around but the consumer hasn't yet. An explicit cast to u16 fixes the issue. Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port timestamp") Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
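A self-contained illustration of the promotion pitfall described above (the values are made up for the example): subtracting two u16 free-running counters promotes both operands to int, so once the producer wraps while the consumer has not, the difference turns negative and the room check misfires; casting the difference back to u16 restores modulo-65536 arithmetic.

/* Example: fifo of size 8 (mask 7) that is completely full. */
u16 cc = 65533;                 /* consumer index, about to wrap           */
u16 pc = 5;                     /* producer index, already wrapped (+8)    */
u16 mask = 7;

int  promoted = pc - cc;                /* -65528: negative                     */
bool buggy    = (pc - cc) <= mask;      /* true  - falsely reports a free slot  */
bool fixed    = (u16)(pc - cc) <= mask; /* false - the fifo is really full      */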
#
b5618a6b |
|
15-Feb-2023 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: Remove unused function mlx5e_sq_xmit_simple The last usage was removed as part of commit 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support"). Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
bc8d405b |
|
19-Jan-2023 |
Toke Høiland-Jørgensen <toke@redhat.com> |
net/mlx5e: Support RX XDP metadata Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe pointer to the mlx5e_skb_from* functions so it can be retrieved from the XDP ctx to do this. Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20230119221536.3349901-17-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
#
022dbea0 |
|
28-Nov-2022 |
Rahul Rameshbabu <rrameshbabu@nvidia.com> |
net/mlx5e: Suppress Send WQEBB room warning for PAGE_SIZE >= 16KB Send WQEBB size is 64 bytes and the max number of WQEBBs for an SQ is 255. For 16KB pages and greater, there is always sufficient space for all WQEBBs of an SQ. Cast mlx5e_get_max_sq_wqebbs(mdev) to u16. This prevents -Wtautological-constant-out-of-range-compare warnings from occurring when PAGE_SIZE >= 16KB. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
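The arithmetic behind the suppressed warning, as a tiny sketch (macro names are illustrative; the sizes come from the commit text): 255 WQEBBs of 64 bytes each occupy at most 16320 bytes, so on pages of 16KB or more the room comparison can never fail, and the compiler flags it as tautological unless the operand is narrowed to u16.

#define SKETCH_WQEBB_SIZE       64      /* bytes per Send WQEBB      */
#define SKETCH_MAX_SQ_WQEBBS    255     /* max WQEBBs of a single SQ */

/* 255 * 64 = 16320 <= 16384: a whole SQ's worth of WQEBBs fits in one
 * 16KB page, which is what made the comparison tautological. */
_Static_assert(SKETCH_MAX_SQ_WQEBBS * SKETCH_WQEBB_SIZE <= 16 * 1024,
               "an SQ's WQEBBs always fit within a 16KB page");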
#
f9c955b4 |
|
03-Nov-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Add missing sanity checks for max TX WQE size The commit cited below started using the firmware capability for the maximum TX WQE size. This commit adds an important check to verify that the driver doesn't attempt to exceed this capability, and also restores another check mistakenly removed in the cited commit (a WQE must not exceed the page size). Fixes: c27bd1718c06 ("net/mlx5e: Read max WQEBBs on the SQ from firmware") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
19b43a43 |
|
26-Oct-2022 |
Aya Levin <ayal@nvidia.com> |
net/mlx5e: Extend SKB room check to include PTP-SQ When tx_port_ts is set, the driver diverts all UDP traffic over the PTP port to a dedicated PTP-SQ. The SKBs are cached until the wire-CQE arrives. When the packet size is greater than the MTU, the firmware might drop it and the packet won't be transmitted to the wire, hence the wire-CQE won't reach the driver. In this case the SKBs accumulate in the SKB fifo. Add a room check that considers the PTP-SQ SKB fifo: when the SKB fifo is full, the driver stops the queue, resulting in a TX timeout. The devlink TX-reporter can recover from it. Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
cf544517 |
|
30-Sep-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: xsk: Use xsk_buff_alloc_batch on striding RQ XSK provides a function to allocate frames in batches for more efficient processing. This commit starts using this function on striding RQ and creates an optimized flow for XSK. A side effect is an opportunity to optimize the regular RX flow by dropping branching for XSK cases. Performance improvement is up to 6.4% in the aligned mode and up to 7.5% in the unaligned mode. Aligned mode, 2048-byte frames: 12.9 Mpps -> 13.8 Mpps Aligned mode, 4096-byte frames: 11.8 Mpps -> 12.5 Mpps Unaligned mode, 2048-byte frames: 11.9 Mpps -> 12.8 Mpps Unaligned mode, 3072-byte frames: 11.4 Mpps -> 12.1 Mpps Unaligned mode, 4096-byte frames: 11.0 Mpps -> 11.2 Mpps CPU: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
4c78782e |
|
27-Sep-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: kTLS, Check ICOSQ WQE size in advance Instead of WARNing at runtime when TLS offload WQEs posted to the ICOSQ are over the hardware limit, check their size before enabling TLS RX offload, and block the offload if the condition fails. It also allows dropping a u16 field from struct mlx5e_icosq. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
21a0502d |
|
27-Sep-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Use the aligned max TX MPWQE size TX MPWQE size is limited to the cacheline-aligned maximum. Use the same value for the stop room and the capability check. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ddc87e7d |
|
28-Jan-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Store DMA address inside struct page Use page_pool_set_dma_addr() to store the DMA address of a page inside struct page, in order to avoid passing struct mlx5e_dma_info to XDP handlers. Previously, struct mlx5e_dma_info was used to pass both the DMA address and the page, and it worked well for the single-fragment case. When XDP multi buffer is in use, and a fragmented xdp_frame has to be transmitted, the driver needs to know the DMA addresses of fragments, however, the array of fragments in struct skb_shared_info doesn't contain them. In order to pass the DMA addresses, the driver puts them into struct page itself, which is accessible from the array of fragments in struct skb_shared_info. The existing XDP handlers are modified to remove the dependency on struct mlx5e_dma_info. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
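A kernel-context sketch of the mechanism described above, using the stock page_pool accessors; the two wrapper functions are illustrative:

/* RX/alloc side: remember the DMA mapping inside struct page itself. */
static void stash_dma_addr(struct page *page, dma_addr_t addr)
{
        page_pool_set_dma_addr(page, addr);
}

/* XDP multi-buffer TX side: no mlx5e_dma_info needed, the fragment's page
 * carries its own DMA address. */
static dma_addr_t frag_dma_addr(const struct skb_shared_info *sinfo, int i)
{
        const skb_frag_t *frag = &sinfo->frags[i];

        return page_pool_get_dma_addr(skb_frag_page(frag)) + skb_frag_off(frag);
}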
#
6b23f6ab |
|
24-Jan-2022 |
Maxim Mikityanskiy <maximmi@nvidia.com> |
net/mlx5e: Move mlx5e_select_queue to en/selq.c This commit moves mlx5e_select_queue and everything related to ndo_select_queue to en/selq.c, putting all the code that works with selq into a separate file. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
76c31e5f |
|
10-May-2021 |
Aya Levin <ayal@nvidia.com> |
net/mlx5e: Use FW limitation for max MPW WQEBBs Calculate the maximal count of MPW WQEBBs on SQ creation and store it there. Remove MLX5E_TX_MPW_MAX_NUM_DS and MLX5E_TX_MPW_MAX_WQEBBS. Update mlx5e_tx_mpwqe_is_full() and mlx5e_xdp_mpwqe_is_full(). Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c27bd171 |
|
17-Jan-2022 |
Aya Levin <ayal@nvidia.com> |
net/mlx5e: Read max WQEBBs on the SQ from firmware Prior to this patch, the maximal value for max WQEBBs (WQE Basic Blocks, where WQE is a Work Queue Element) on the TX side was assumed to be 16 (fixed value). All firmware versions to date comply with this. In order to be more flexible and resilient, read the corresponding capability from FW: max_wqe_sz_sq. This value describes the maximum WQE size given in bytes, thus the max number of WQEBBs is given by dividing it by the WQEBB byte size. The driver uses the minimum of 16 and the division result. This ensures synchronization between driver and firmware and avoids unexpected behavior. Store this value on the different SQs (Send Queues) for easy access. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
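A minimal sketch of the limit computation described above; the capability field name follows the commit text, while the helper itself is illustrative. The FW reports the maximum WQE size in bytes, which is converted to WQEBBs and then bounded by the legacy fixed maximum of 16 so the driver never exceeds either limit.

/* max_wqe_sz_sq is in bytes; MLX5_SEND_WQE_BB is the 64-byte WQEBB size. */
static u8 max_sq_wqebbs_sketch(struct mlx5_core_dev *mdev)
{
        return min_t(u8, 16,    /* legacy fixed maximum */
                     MLX5_CAP_GEN(mdev, max_wqe_sz_sq) / MLX5_SEND_WQE_BB);
}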
#
b8d91145 |
|
26-Jan-2022 |
Khalid Manaa <khalidm@nvidia.com> |
net/mlx5e: Fix wrong calculation of header index in HW_GRO The HW doesn't wrap the CQE.shampo.header_index field according to the headers buffer size; instead, it keeps increasing it until the u16 value overflows. Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the CQE header_index field to find the actual header index in the headers buffer. Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support") Signed-off-by: Khalid Manaa <khalidm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
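A hedged sketch of the masking fix described above (parameter names are illustrative): because HW lets the u16 header index run free, the handler wraps it by the power-of-two size of the headers buffer to find the actual slot.

static u16 shampo_header_slot_sketch(u16 cqe_header_index, u16 hd_buf_entries)
{
        /* hd_buf_entries is a power of two, so the mask is simply size - 1. */
        return cqe_header_index & (hd_buf_entries - 1);
}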
#
64509b05 |
|
13-Sep-2020 |
Ben Ben-Ishay <benishay@nvidia.com> |
net/mlx5e: Add data path for SHAMPO feature The header buffer is used to store the headers of the rx packets. The header buffer size is deduced from the WorkQueue size plus the restriction of max packets per WorkQueueElement. This commit adds the functionality for posting/updating memory for the header buffer during the posting/updating of WQEs. Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
3ff3874f |
|
10-Feb-2021 |
Tariq Toukan <tariqt@nvidia.com> |
net/mlx5e: Guarantee room for XSK wakeup NOP on async ICOSQ XSK wakeup flow triggers an IRQ by posting a NOP WQE and hitting the doorbell on the async ICOSQ. It maintains its state so that it doesn't issue another NOP WQE if it has an outstanding one already. For this flow to work properly, the NOP post must not fail. Make sure to reserve room for the NOP WQE in all WQE posts to the async ICOSQ. Fixes: 8d94b590f1e4 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one") Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support") Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
432119de |
|
12-Feb-2021 |
Aya Levin <ayal@nvidia.com> |
net/mlx5: Add cyc2time HW translation mode support The device timestamp can be in real time mode (cycles-to-time translation is offloaded into the hardware). With real time mode, the HW provides a timestamp that is already translated into nanoseconds. With this mode, the driver adjusts both the HW clock and the timecounter (to keep clock_info_page updated) using the callbacks adjfreq, adjtime and settime. HW clock modifications are done via MTUTC access reg commands. The driver is allowed to modify the HW real time clock only if the MCAM ptpcyc2realtime_modify capability is set. Add an MTUTC set function to be used for configuring the HW real time clock. Modify existing code to support both the internal timer (with conversion via timecounter_cyc2time()) and real time (no conversions). Align the signatures of the helpers converting from timestamp to nanoseconds. With that, when allocating a queue, assign the corresponding callback with respect to the capability. Adjust 1PPS timestamp calculation flows based on the timestamp mode. Cyc2time offload brings two major advantages: - Improve MTAE (Max Time Absolute Error) for HW TS by up to 160 ns over a 100% loaded CPU. - Faster data-path timestamp to nanoseconds, as translation is lock-less and done in HW. In real time mode, the timestamp format is 32 high bits of seconds and 32 low bits of nanoseconds. In some flows, the driver shall convert this format into a nanoseconds wall-clock value with the REAL_TIME_TO_NS macro. HW supports a single clock, and it is shared by all functions on a device. In case the real time clock is used, it is recommended to use a single GM for all of the device's functions. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
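A small sketch of the real-time timestamp format mentioned above; the macro form is illustrative. The 64-bit value carries seconds in the high 32 bits and nanoseconds in the low 32 bits, so converting it to a nanosecond wall clock is one lock-free multiply-add.

#define SKETCH_REAL_TIME_TO_NS(hi, lo) \
        ((u64)(hi) * NSEC_PER_SEC + (u64)(lo))

static u64 hw_real_time_to_ns(u64 hw_ts)
{
        u32 sec  = (u32)(hw_ts >> 32);          /* high 32 bits: seconds    */
        u32 nsec = (u32)(hw_ts & 0xffffffff);   /* low 32 bits: nanoseconds */

        return SKETCH_REAL_TIME_TO_NS(sec, nsec);
}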
#
b544011f |
|
12-Nov-2020 |
Moshe Shemesh <moshe@mellanox.com> |
net/mlx5e: Fix SWP offsets when vlan inserted by driver In case the WQE includes an inline header, the vlan is inserted by the driver even if vlan offload is set. On a geneve over vlan interface, where the software parser is used, the SWP offsets should be updated according to the added vlan. Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
cecaa6a7 |
|
01-Dec-2020 |
Eran Ben Elisha <eranbe@nvidia.com> |
net/mlx5e: Move MLX5E_RX_ERR_CQE macro The MLX5E_RX_ERR_CQE macro is used only in the data path; move it to the appropriate header file. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0b676aae |
|
01-Dec-2020 |
Eran Ben Elisha <eranbe@nvidia.com> |
net/mlx5e: Change skb fifo push/pop API to be used without SQ The skb fifo push/pop API used pre-defined attributes within the mlx5e_txqsq. In order to share the skb fifo API with other non-SQ use cases, change the API input to take the newly defined mlx5e_skb_fifo struct. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4d0b7ef9 |
|
01-Dec-2020 |
Aya Levin <ayal@nvidia.com> |
net/mlx5e: Allow CQ outside of channel context In order to be able to create a CQ outside of a channel context, remove the cq->channel direct pointer. This requires adding direct pointers to the channel statistics, netdevice, priv and to mlx5_core in order to support CQs that are a part of mlx5e_channel. In addition, parameters that were previously derived from the channel, like napi, NUMA node, channel stats and index, are now assembled in struct mlx5e_create_cq_param, which is given to mlx5e_open_cq() instead of the channel pointer. Generalizing mlx5e_open_cq() allows opening a CQ outside of channel context, which will be used in following patches in the patch-set. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5af75c74 |
|
01-Jul-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Enhanced TX MPWQE for SKBs This commit adds support for Enhanced TX MPWQE feature in the regular (SKB) data path. A MPWQE (multi-packet work queue element) can serve multiple packets, reducing the PCI bandwidth on control traffic. Two new stats (tx*_mpwqe_blks and tx*_mpwqe_pkts) are added. The feature is on by default and controlled by the skb_tx_mpwqe private flag. In a MPWQE, eseg is shared among all packets, so eseg-based offloads (IPSEC, GENEVE, checksum) run on a separate eseg that is compared to the eseg of the current MPWQE session to decide if the new packet can be added to the same session. MPWQE is not compatible with certain offloads and features, such as TLS offload, TSO, nonlinear SKBs. If such incompatible features are in use, the driver gracefully falls back to non-MPWQE. This change has no performance impact in TCP single stream test and XDP_TX single stream test. UDP pktgen, 64-byte packets, single stream, MPWQE off: Packet rate: 16.96 Mpps (±0.12 Mpps) -> 17.01 Mpps (±0.20 Mpps) Instructions per packet: 421 -> 429 Cycles per packet: 156 -> 161 Instructions per cycle: 2.70 -> 2.67 UDP pktgen, 64-byte packets, single stream, MPWQE on: Packet rate: 16.96 Mpps (±0.12 Mpps) -> 20.94 Mpps (±0.33 Mpps) Instructions per packet: 421 -> 329 Cycles per packet: 156 -> 123 Instructions per cycle: 2.70 -> 2.67 Enabling MPWQE can reduce PCI bandwidth: PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores: Inbound PCI utilization with MPWQE off: 80.3% Inbound PCI utilization with MPWQE on: 59.0% PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores: Inbound PCI utilization with MPWQE off: 65.4% Inbound PCI utilization with MPWQE on: 49.3% Enabling MPWQE can also reduce CPU load, increasing the packet rate in case of CPU bottleneck: PCI Gen2, pktgen at full rate on 24 CPU cores: Packet rate with MPWQE off: 37.5 Mpps Packet rate with MPWQE on: 49.0 Mpps PCI Gen3, pktgen at full rate on 24 CPU cores: Packet rate with MPWQE off: 57.0 Mpps Packet rate with MPWQE on: 66.8 Mpps Burst size in all pktgen tests is 32. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64) NIC: Mellanox ConnectX-6 Dx GCC 10.2.0 Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b39fe61e |
|
16-Apr-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Rename xmit-related structs to generalize them As preparation for the upcoming TX MPWQE support for SKBs, rename struct mlx5e_xdp_mpwqe to mlx5e_tx_mpwqe and move it above struct mlx5e_txqsq. This structure will be reused in the regular SQ and in the regular TX data path. Also rename mlx5e_xdp_xmit_data to mlx5e_xmit_data - it will be used in the upcoming TX MPWQE flow. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
530d5ce2 |
|
04-Jun-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Generalize TX MPWQE checks for full session As preparation for the upcoming TX MPWQE for SKBs, create a function (mlx5e_tx_mpwqe_is_full) to check whether an MPWQE session is full. This function will be shared by MPWQE code for XDP and for SKBs. Defines are renamed and moved to make them not XDP-specific. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
338c46c6 |
|
16-Apr-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Support multiple SKBs in a TX WQE TX MPWQE support for SKBs is coming in one of the following patches, and a single MPWQE can send multiple SKBs. This commit prepares the TX path code to handle such cases: 1. An additional FIFO for SKBs is added, just like the FIFO for DMA chunks. 2. struct mlx5e_tx_wqe_info will contain num_fifo_pkts. If a given WQE contains only one packet, num_fifo_pkts will be zero, and the SKB will be stored in mlx5e_tx_wqe_info, as usual. If num_fifo_pkts > 0, the SKB pointer will be NULL, and the SKBs will be stored in the FIFO. This change has no performance impact in TCP single stream test and XDP_TX single stream test. When compiled with a recent GCC, this change shows no visible performance impact on UDP pktgen (burst 32) single stream test either: Packet rate: 16.95 Mpps (±0.15 Mpps) -> 16.96 Mpps (±0.12 Mpps) Instructions per packet: 429 -> 421 Cycles per packet: 160 -> 156 Instructions per cycle: 2.69 -> 2.70 CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64) NIC: Mellanox ConnectX-6 Dx GCC 10.2.0 Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
97e3afd6 |
|
09-Jul-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Unify constants for WQE_EMPTY_DS_COUNT A constant for the number of DS in an empty WQE (i.e. a WQE without data segments) is needed in multiple places (normal TX data path, MPWQE in XDP), but currently we have a constant for XDP and an inline formula in normal TX. This patch introduces a common constant. Additionally, mlx5e_xdp_mpwqe_session_start is converted to use struct assignment, because the code nearby is touched. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8e4b53f6 |
|
14-Feb-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Refactor xmit functions A huge function mlx5e_sq_xmit was split into several to achieve multiple goals: 1. Reuse the code in IPoIB. 2. Better integrate with TLS, IPSEC, GENEVE and checksum offloads. Now it's possible to reserve space in the WQ before running eseg-based offloads, so: 2.1. It's not needed to copy cseg and eseg after mlx5e_fill_sq_frag_edge anymore. 2.2. mlx5e_txqsq_get_next_pi will be used instead of the legacy mlx5e_fill_sq_frag_edge for better code maintainability and reuse. 3. Prepare for the upcoming TX MPWQE for SKBs. It will intervene after mlx5e_sq_calc_wqe_attr to check if it's possible to use MPWQE, and the code flow will split into two paths: MPWQE and non-MPWQE. Two high-level functions are provided to send packets: * mlx5e_xmit is called by the networking stack, runs offloads and sends the packet. In one of the following patches, MPWQE support will be added to this flow. * mlx5e_sq_xmit_simple is called by the TLS offload, runs only the checksum offload and sends the packet. This change has no performance impact in TCP single stream test and XDP_TX single stream test. When compiled with a recent GCC, this change shows no visible performance impact on UDP pktgen (burst 32) single stream test either: Packet rate: 16.86 Mpps (±0.15 Mpps) -> 16.95 Mpps (±0.15 Mpps) Instructions per packet: 434 -> 429 Cycles per packet: 158 -> 160 Instructions per cycle: 2.75 -> 2.69 CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64) NIC: Mellanox ConnectX-6 Dx GCC 10.2.0 Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d02dfcd5 |
|
08-Sep-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Move mlx5e_tx_wqe_inline_mode to en_tx.c Move mlx5e_tx_wqe_inline_mode from en/txrx.h to en_tx.c as it's only used there. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
47c97e6b |
|
10-May-2020 |
Ron Diskin <rondi@mellanox.com> |
net/mlx5e: Fix multicast counter not up-to-date in "ip -s" Currently the FW does not generate events for counters other than error counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s uses) might run in atomic context, while the FW interface is non atomic. Thus, 'ip' is not allowed to issue FW commands, so it will only display cached counters in the driver. Add a SW counter (mcast_packets) in the driver to count rx multicast packets. The counter also counts broadcast packets, as we consider it a special case of multicast. Use the counter value when calling "ip -s"/"ifconfig". Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality") Signed-off-by: Ron Diskin <rondi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c8b838d1 |
|
27-Jul-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
net/mlx5: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5d0b8476 |
|
12-Jul-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Use indirect call wrappers for RX post WQEs functions Use the indirect call wrapper API macros for declaration and scope of the RX post WQEs functions. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b307f7f1 |
|
30-Apr-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Move exposure of datapath function to txrx header Move them from the generic header file "en.h", to the datapath header file "txrx.h". Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
de6c6ab7 |
|
30-Apr-2020 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Add helper to get the RQ WQE counter Add a helper which retrieves the RQ's WQE counter. Use this helper in the RX reporter diagnose callback.
$ devlink health diagnose pci/0000:00:0b.0 reporter rx
Common config:
  RQ:
    type: 2 stride size: 2048 size: 8
  CQ:
    stride size: 64 size: 1024
RQs:
  channel ix: 0 rqn: 2113 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
  CQ: cqn: 1032 HW status: 0
  channel ix: 1 rqn: 2118 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1
  CQ: cqn: 1036 HW status: 0
Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fc42d0de |
|
30-Apr-2020 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Add helper to get RQ WQE's head Add helper which retrieves the RQ WQE's head. Use this helper in RX reporter diagnose callback. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5d95c816 |
|
30-Apr-2020 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Move RQ helpers to txrx.h Use txrx.h to contain the helper functions regarding TX/RX. In the coming patches, I will add more RQ helpers. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b9961af7 |
|
30-Apr-2020 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Remove redundant RQ state query When a CQE error is received, the driver inspects the syndrome given by the firmware. RQ recovery is initiated only as a result of a fatal syndrome, i.e. a syndrome which sets the RQ into an error state. Hence there is no need to query the RQ state at the beginning of the recovery process. Add additional debug prints before recovering. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0419d8c9 |
|
16-Jun-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: kTLS, Add kTLS RX resync support Implement the RX resync procedure, using the TLS async resync API. The HW offload of TLS decryption in RX side might get out-of-sync due to out-of-order reception of packets. This requires SW intervention to update the HW context and get it back in-sync.
Performance:
CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off
NIC: ConnectX-6 Dx 100GbE dual port
Goodput (app-layer throughput) comparison:
+---------------+-------+-------+---------+
| # connections |   1   |   4   |    8    |
+---------------+-------+-------+---------+
| SW (Gbps)     |  7.26 | 24.70 |  50.30  |
+---------------+-------+-------+---------+
| HW (Gbps)     | 18.50 | 64.30 |  92.90  |
+---------------+-------+-------+---------+
| Speedup       | 2.55x | 2.56x | 1.85x * |
+---------------+-------+-------+---------+
* After linerate is reached, diff is observed in CPU util.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1182f365 |
|
28-May-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: kTLS, Add kTLS RX HW offload support Implement driver support for the kTLS RX HW offload feature. Resync support is added in a downstream patch. New offload contexts post their static/progress params WQEs over the per-channel async ICOSQ, protected under a spin-lock. The Channel/RQ is selected according to the socket's rxq index. Feature is OFF by default. Can be turned on by: $ ethtool -K <if> tls-hw-rx-offload on A new TLS-RX workqueue is used to allow asynchronous addition of steering rules, out of the NAPI context. It will be also used in a downstream patch in the resync procedure. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2d1b69ed |
|
25-Jun-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5: kTLS, Improve TLS params layout structures Add explicit WQE segment structures for the TLS static and progress params. According to the HW spec, TISN is not part of the progress params context, take it out of it. Rename the control segment tisn field as it could hold either a TIS or a TIR number. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5ffb4d85 |
|
30-Mar-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Calculate SQ stop room in a robust way Currently, different formulas are used to estimate the space that may be taken by WQEs in the SQ during a single packet transmit. This space is called stop room, and it's checked in the end of packet transmit to find out if the next packet could overflow the SQ. If it could, the driver tells the kernel to stop sending next packets. Many factors affect the stop room: 1. Padding with NOPs to avoid WQEs spanning over page boundaries. 2. Enabled and disabled offloads (TLS, upcoming MPWQE). 3. The maximum size of a WQE. The padding is performed before every WQE if it doesn't fit the current page. The current formula assumes that only one padding will be required per packet, and it doesn't take into account that the WQEs posted during the transmission of a single packet might exceed the page size in very rare circumstances. For example, to hit this condition with 4096-byte pages, TLS offload will have to interrupt an almost-full MPWQE session, be in the resync flow and try to transmit a near to maximum amount of data. To avoid SQ overflows in such rare cases after MPWQE is added, this patch introduces a more robust formula to estimate the stop room. The new formula uses the fact that a WQE of size X will not require more than X-1 WQEBBs of padding. More exact estimations are possible, but they result in much more complex and error-prone code for little gain. Before this patch, the TLS stop room included space for both INNOVA and ConnectX TLS offloads that couldn't run at the same time anyway, so this patch accounts only for the active one. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
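The robust estimate above reduces to a one-liner; a kernel-context sketch with an illustrative helper name. A WQE of X WQEBBs can be preceded by at most X - 1 NOP WQEBBs of page-boundary padding, so reserving 2 * X - 1 WQEBBs of stop room per such WQE can never overflow.

/* Stop room contributed by one WQE of the given size (in WQEBBs). */
static u16 stop_room_for_wqe_sketch(u16 wqe_size_in_wqebbs)
{
        /* worst case: (size - 1) NOP WQEBBs of padding + the WQE itself */
        return 2 * wqe_size_in_wqebbs - 1;
}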
#
28bff095 |
|
16-Dec-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Enhance ICOSQ WQE info fields The same WQE opcode might be used in different ICOSQ flows and WQE types. To have a better distinguishability, replace it with an enum that better indicates the WQE type and flow it is used for. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
05dfd570 |
|
09-Apr-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Take TX WQE info structures out of general EN header Into the txrx header file. The mlx5e_sq_wqe_info structure describes WQE info for the ICOSQ, rename it to better reflect this. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ec9cdca0 |
|
16-Apr-2020 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Unify reserving space for WQEs In our fast-path design, a WQE (Work Queue Element) must not cross the page boundary. To enforce that, for WQEs consisting of more than one BB (Basic Block), the driver checks the available contiguous space in the WQ in advance, and if it's not enough, it pads it with NOPs. This patch modifies the code that calculates the position of the next WQE, considering the padding, and prepares the WQE. This code is common for all SQ types. In this patch it's reorganized in a way that makes the usage pattern unified for all SQ types, and makes the implementations self-contained and look almost the same, preparing the repeating code for further attempts to deduplicate it. One place is left as is: the mlx5e_fill_sq_frag_edge call inside mlx5e_sq_xmit, because it is special in that it may also copy the WQE's cseg and eseg when reserving space. This will be eliminated in one of the following patches, and this place will be converted to the new approach, too. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fed0c6cf |
|
15-Nov-2019 |
Maxim Mikityanskiy <maximmi@mellanox.com> |
net/mlx5e: Fetch WQE: reuse code and enforce typing There are multiple functions mlx5{e,i}_*_fetch_wqe that contain the same code, that is repeated, because they operate on different SQ struct types. mlx5e_sq_fetch_wqe also returns void *, instead of the concrete WQE type. This commit generalizes the fetch WQE operation by putting this code into a single function. To simplify calls of the generic function in concrete use cases, macros are provided that substitute the right WQE size and cast the return type. Before this patch, fetch_wqe used to calculate pi itself, but the value was often known to the caller. This calculation is moved outside to eliminate this unnecessary step and prepare for the fill_frag_edge refactoring in the next patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f1b95753 |
|
09-Feb-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: TX, Generalise code and usage of error CQE dump Error CQE was dumped only for TXQ SQs. Generalise the function, and add usage for error completions on ICO SQs and XDP SQs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
82fe2996 |
|
17-Feb-2020 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Set of completion request bit should not clear other adjacent bits In notify HW (ring doorbell) flow, we set the bit to request a completion on the TX descriptor. When doing so, we should not unset other bits in the same byte. Currently, this does not fix a real issue, as we still don't have a flow where both MLX5_WQE_CTRL_CQ_UPDATE and any adjacent bit are set together. Fixes: 542578c67936 ("net/mlx5e: Move helper functions to a new txrx datapath header") Fixes: 864b2d715300 ("net/mlx5e: Generalize tx helper functions for different SQ types") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
39369fd5 |
|
11-Mar-2020 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset When resetting the RQ (moving RQ state from RST to RDY), the driver resets the WQ's SW metadata. In striding RQ mode, we maintain a field that reflects the actual expected WQ head (including in progress WQEs posted to the ICOSQ). It was mistakenly not reset together with the WQ. Fix this here. Fixes: 8276ea1353a4 ("net/mlx5e: Report and recover from CQE with error on RQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5ee090ed |
|
09-Dec-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5e: Reset RQ doorbell counter before moving RQ state from RST to RDY Initialize RQ doorbell counters to zero prior to moving an RQ from RST to RDY state. Per HW spec, when RQ is back to RDY state, the descriptor ID on the completion is reset. The doorbell record must comply. Fixes: 8276ea1353a4 ("net/mlx5e: Report and recover from CQE with error on RQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reported-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
84d1bb2b |
|
07-Oct-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: kTLS, Limit DUMP wqe size HW expects the data size in DUMP WQEs to be up to MTU. Make sure they are in range. We elevate the frag page refcount by 'n-1', in addition to the one obtained in tx_sync_info_get(), having an overall of 'n' references. We bulk the increments by using a single page_ref_add() call, to optimize performance. The refcounts are released one by one, by the corresponding completions. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
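A kernel-context sketch of the bulk refcount scheme described above; page_ref_add() and skb_frag_page() are stock helpers, while the wrapper function is illustrative. One reference is already held from tx_sync_info_get(), the remaining n - 1 are taken with a single atomic add, and each DUMP completion then drops one.

/* Elevate the frag page's refcount to 'n' before posting 'n' DUMP WQEs. */
static void dump_frag_take_refs_sketch(skb_frag_t *frag, int n_dumps)
{
        if (n_dumps > 1)
                page_ref_add(skb_frag_page(frag), n_dumps - 1);
}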
#
9b1fef2f |
|
01-Sep-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: kTLS, Size of a Dump WQE is fixed No Eth segment, so no dynamic inline headers. The size of a Dump WQE is fixed, use constants and remove unnecessary checks. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
500f36a4 |
|
16-Sep-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Tx, Zero-memset WQE info struct upon update Not all fields of WQE info are being written in the function, having some leftovers from previous rounds. Zero-memset it upon update. Particularly, not nullifying the wi->resync_dump_frag field will cause double free of the kTLS DUMPed frags. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b431302e |
|
30-Jun-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Tx, Soften inline mode VLAN dependencies If capable, use zero inline mode in TX WQE for non-VLAN packets. For VLAN ones, keep the enforcement of at least L2 inline mode, unless the WQE VLAN insertion offload cap is on. Performance: Tested single core packet rate of 64Bytes. NIC: ConnectX-5 CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz pktgen: Before: 12.46 Mpps After: 14.65 Mpps (+17.5%) XDP_TX: The MPWQE flow is not affected, as it already has this optimization. So we test with priv-flag xdp_tx_mpwqe: off. Before: 9.90 Mpps After: 10.20 Mpps (+3%) Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Tested-by: Noam Stolero <noams@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
68865419 |
|
11-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Tx, Strict the room needed for SQ edge NOPs We use NOPs to populate the WQ fragment edge if the WQE does not fit in frag, to avoid WQEs crossing a page boundary (or wrap-around the WQ). The upper bound on the needed number of NOPs is one WQEBB less than the largest possible WQE, for otherwise the WQE would certainly fit. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d2ead1f3 |
|
05-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Add kTLS TX HW offload support Add support for transmit side kernel-TLS acceleration. Offload the crypto encryption to HW. Per TLS connection: - Use a separate TIS to maintain the HW context. - Use a separate encryption key. - Maintain static and progress HW contexts by posting the proper WQEs at creation time, or upon resync. - Use a special DUMP opcode to replay the previous frags and sync the HW context. To make sure the SQ is able to serve an xmit request, increase SQ stop room to cover: - static params WQE, - progress params WQE, and - resync DUMP per frag. Currently supporting TLS 1.2, and key size 128bit. Tested over SimX simulator. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37badd15 |
|
05-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Introduce a fenced NOP WQE posting function Similar to the existing mlx5e_post_nop(), but marks a fence in the WQE control segment. Added as a separate new function to not hurt the performance of the common case. To be used in a downstream patch of the series. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01614d4f |
|
05-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Tx, Unconstify SQ stop room Use an SQ field for stop_room, and use the larger value only if TLS is supported. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd1b2259 |
|
05-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Tx, Make SQ WQE fetch function type generic Change mlx5e_sq_fetch_wqe to be agnostic to the Work Queue Element (WQE) type. Before this patch, it was specific for struct mlx5e_tx_wqe. In order to allow the change, the function now returns the generic void pointer, and gets the WQE size to do the zero memset. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
740114a8 |
|
05-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Tx, Enforce L4 inline copy when needed When ctrl->tisn field exists, this indicates an operation (HW offload) on the TCP payload. For such WQEs, inline the headers up to L4. This is in preparation for kTLS HW offload support, added in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
542578c6 |
|
05-Jul-2019 |
Tariq Toukan <tariqt@mellanox.com> |
net/mlx5e: Move helper functions to a new txrx datapath header Take datapath helper functions to a new header file en/txrx.h. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|