#
2e7d3b67 |
|
01-Feb-2024 |
Ivan Vecera <ivecera@redhat.com> |
net: atlantic: Fix DMA mapping for PTP hwts ring

Function aq_ring_hwts_rx_alloc() maps extra AQ_CFG_RXDS_DEF bytes for PTP HWTS ring but then generic aq_ring_free() does not take this into account. Create and use a specific function to free HWTS ring to fix this issue.

Trace:
[ 215.351607] ------------[ cut here ]------------
[ 215.351612] DMA-API: atlantic 0000:4b:00.0: device driver frees DMA memory with different size [device address=0x00000000fbdd0000] [map size=34816 bytes] [unmap size=32768 bytes]
[ 215.351635] WARNING: CPU: 33 PID: 10759 at kernel/dma/debug.c:988 check_unmap+0xa6f/0x2360
...
[ 215.581176] Call Trace:
[ 215.583632] <TASK>
[ 215.585745] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.590114] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.594497] ? debug_dma_free_coherent+0x196/0x210
[ 215.599305] ? check_unmap+0xa6f/0x2360
[ 215.603147] ? __warn+0xca/0x1d0
[ 215.606391] ? check_unmap+0xa6f/0x2360
[ 215.610237] ? report_bug+0x1ef/0x370
[ 215.613921] ? handle_bug+0x3c/0x70
[ 215.617423] ? exc_invalid_op+0x14/0x50
[ 215.621269] ? asm_exc_invalid_op+0x16/0x20
[ 215.625480] ? check_unmap+0xa6f/0x2360
[ 215.629331] ? mark_lock.part.0+0xca/0xa40
[ 215.633445] debug_dma_free_coherent+0x196/0x210
[ 215.638079] ? __pfx_debug_dma_free_coherent+0x10/0x10
[ 215.643242] ? slab_free_freelist_hook+0x11d/0x1d0
[ 215.648060] dma_free_attrs+0x6d/0x130
[ 215.651834] aq_ring_free+0x193/0x290 [atlantic]
[ 215.656487] aq_ptp_ring_free+0x67/0x110 [atlantic]
...
[ 216.127540] ---[ end trace 6467e5964dd2640b ]---
[ 216.132160] DMA-API: Mapped at:
[ 216.132162] debug_dma_alloc_coherent+0x66/0x2f0
[ 216.132165] dma_alloc_attrs+0xf5/0x1b0
[ 216.132168] aq_ring_hwts_rx_alloc+0x150/0x1f0 [atlantic]
[ 216.132193] aq_ptp_ring_alloc+0x1bb/0x540 [atlantic]
[ 216.132213] aq_nic_init+0x4a1/0x760 [atlantic]

Fixes: 94ad94558b0f ("net: aquantia: add PTP rings infrastructure")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201094752.883026-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
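The size mismatch reported in the 2e7d3b67 trace (map size 34816 bytes, unmap size 32768 bytes) can be reproduced as plain arithmetic. The following is a minimal user-space sketch, not driver code. All `example_` names are hypothetical. The ring length and descriptor size are assumptions chosen only so that their product matches the 32768 bytes in the trace, and the extra 2048 bytes is inferred from the difference between the two sizes in the warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the two totals come from the trace. */
#define EXAMPLE_RING_DESCRIPTORS 2048u /* assumed ring length */
#define EXAMPLE_DESC_SIZE        16u   /* assumed descriptor size in bytes */
#define EXAMPLE_EXTRA_BYTES      2048u /* inferred: 34816 - 32768 */

/* Size a generic free path would compute: descriptors only. */
static size_t example_generic_ring_bytes(void)
{
    return (size_t)EXAMPLE_RING_DESCRIPTORS * EXAMPLE_DESC_SIZE;
}

/* Size the HWTS allocation path maps: descriptors plus the extra bytes. */
static size_t example_hwts_ring_bytes(void)
{
    return example_generic_ring_bytes() + EXAMPLE_EXTRA_BYTES;
}

/* A coherent DMA buffer must be freed with the size it was allocated with. */
static int example_sizes_match(size_t mapped, size_t unmapped)
{
    return mapped == unmapped;
}
```

Under these assumptions the generic size and the HWTS size differ, which is the condition the DMA debug check warns about; a dedicated free helper that recomputes the HWTS size makes the two agree.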
#
b3cb7a83 |
|
13-Dec-2023 |
Igor Russkikh <irusskikh@marvell.com> |
net: atlantic: eliminate double free in error handling logic Driver has a logic leak in ring data allocation/free, where aq_ring_free could be called multiple times on same ring, if system is under stress and got memory allocation error. Ring pointer was used as an indicator of failure, but this is not correct since only ring data is allocated/deallocated. Ring itself is an array member. Changing ring allocation functions to return error code directly. This simplifies error handling and eliminates aq_ring_free on higher layer. Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Link: https://lore.kernel.org/r/20231213095044.23146-1-irusskikh@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
45638f01 |
|
17-Apr-2022 |
Taehee Yoo <ap420073@gmail.com> |
net: atlantic: Implement .ndo_xdp_xmit handler aq_xdp_xmit() is the callback function of .ndo_xdp_xmit. It internally calls aq_nic_xmit_xdpf() to send packet. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26efaef7 |
|
17-Apr-2022 |
Taehee Yoo <ap420073@gmail.com> |
net: atlantic: Implement xdp data plane

It supports XDP_PASS, XDP_DROP and multi buffer. The new function aq_nic_xmit_xdpf() is used to send packet with xdp_frame and internally it calls aq_nic_map_xdp().

AQC chip supports 32 multi-queues and 8 vectors (irq). There are two options:
1. under 8 cores and 4 tx queues per core.
2. under 4 cores and 8 tx queues per core.

Like ixgbe, these tx queues can be used only for XDP_TX, XDP_REDIRECT queue. If so, no tx_lock is needed. But this patchset doesn't use this strategy because the cost of getting the hardware tx queue index is too high. So, tx_lock is used in aq_nic_xmit_xdpf().

single-core, single queue, 80% cpu utilization.
  30.75% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
  10.35% [kernel] [k] aq_hw_read_reg <---------- here
  4.38% [kernel] [k] get_page_from_freelist

single-core, 8 queues, 100% cpu utilization, half PPS.
  45.56% [kernel] [k] aq_hw_read_reg <---------- here
  17.58% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
  4.72% [kernel] [k] hw_atl_b0_hw_ring_rx_receive

The new function __aq_ring_xdp_clean() is an xdp rx handler and it is called only when XDP is attached.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0d14657f |
|
17-Apr-2022 |
Taehee Yoo <ap420073@gmail.com> |
net: atlantic: Implement xdp control plane aq_xdp() is an xdp setup callback function for the Atlantic driver. When XDP is attached or detached, the device will be restarted because it uses different headroom, tailroom, and page order values. If XDP is enabled, it switches the default page order value from 0 to 2, because the default maximum frame size is still 2K and it needs additional area for headroom and tailroom. The total size (headroom + frame size + tailroom) is 2624. So, 1472 bytes will always be wasted for every frame. But when order-2 is used, these pages can be used 6 times with the flip strategy. It means only about 106 bytes per frame will be wasted. It also supports the xdp fragment feature. MTU can be 16K if the xdp prog supports xdp fragment. If not, MTU can not exceed 2K - ETH_HLEN - ETH_FCS. A static key is also added; it will be used to call the xdp_clean handler in ->poll(). The data plane implementation is contained in the following patch. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
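The waste figures quoted in 0d14657f (1472 bytes per frame at order 0, about 106 bytes per frame at order 2) follow from integer division. The sketch below works them out in user-space C. The 4096-byte base page size is an assumption, the 2624-byte span is taken from the commit text, and the `example_` names are hypothetical rather than driver symbols.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_PAGE_SIZE  4096u /* assumed base page size */
#define EXAMPLE_FRAME_SPAN 2624u /* headroom + frame + tailroom, per the commit */

/* Number of whole frame spans that fit in one order-N allocation. */
static size_t example_frames_per_alloc(unsigned int order)
{
    size_t bytes = (size_t)EXAMPLE_PAGE_SIZE << order;

    return bytes / EXAMPLE_FRAME_SPAN;
}

/* Unused bytes of the allocation, averaged over the frames it holds. */
static size_t example_waste_per_frame(unsigned int order)
{
    size_t bytes = (size_t)EXAMPLE_PAGE_SIZE << order;
    size_t frames = bytes / EXAMPLE_FRAME_SPAN;

    assert(frames > 0); /* the span must fit at least once */
    return (bytes - frames * EXAMPLE_FRAME_SPAN) / frames;
}
```

With these inputs, order 0 holds one frame and leaves 1472 bytes unused, while order 2 (16384 bytes) holds six frames and leaves 640 bytes in total, which rounds down to 106 bytes per frame.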
#
aa7e17a3 |
|
20-Jul-2020 |
Dmitry Bogdanov <dbogdanov@marvell.com> |
net: atlantic: additional per-queue stats This patch adds additional per-queue stats, these could be useful for debugging and diagnostics. Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7d8bb92 |
|
20-Jul-2020 |
Mark Starovoytov <mstarovo@pm.me> |
net: atlantic: use u64_stats_update_* to protect access to 64-bit stats This patch adds u64_stats_update_* usage to protect access to 64-bit stats, where necessary. This is necessary for per-ring stats, because they are updated by the driver directly, so there is a possibility for a partial read. Other stats require no additional protection, e.g.: * all MACSec stats are fetched directly from HW (under semaphore); * nic/ndev stats (aq_stats_s) are fetched directly from FW (under mutex). Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
508f2e3d |
|
20-Jul-2020 |
Mark Starovoytov <mstarovo@pm.me> |
net: atlantic: split rx and tx per-queue stats This patch splits rx and tx per-queue stats. This change simplifies the follow-up introduction of PTP stats and u64_stats_update_* usage. Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
15beab0a |
|
14-Feb-2020 |
Dmitry Bezrukov <dbezrukov@marvell.com> |
net: atlantic: checksum compat issue Yet another checksum offload compatibility issue was found. The known issue is that AQC HW marks tcp packets with 0xFFFF checksum as invalid (1). This is worked around in the driver by passing all the suspicious packets up to the stack for further csum validation. Another HW problem (2) is that it hides the invalid csum of LRO aggregated packets inside of the individual descriptors. That was worked around by a forced scan of all LRO descriptors for checksum errors. However, the scan logic was shared by both LRO and multi-descriptor packets (jumbos), and this causes the issue. We have to drop LRO packets with the detected bad checksum because of (2), but we have to pass jumbo packets to the stack because of (1). When using a Windows tcp partner with jumbo frames but with LSO disabled, the driver discards such frames as bad checksummed. But only LRO frames should be dropped, not jumbos. In such configurations a tcp stream may see drops and stalls. (1) 76f254d4afe2 ("net: aquantia: tcp checksum 0xffff being handled incorrectly") (2) d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum") Fixes: d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum") Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
822cd114 |
|
07-Nov-2019 |
Igor Russkikh <irusskikh@marvell.com> |
net: atlantic: implement UDP GSO offload

atlantic hardware does support UDP hardware segmentation offload. This allows user to specify one large contiguous buffer with data which then will be split automagically into multiple UDP packets of specified size. Bulk sending of large UDP streams lowers CPU usage and increases bandwidth.

We did estimations both with udpgso_bench_tx test tool and with modified iperf3 measurement tool (4 streams, multithread, 200b packet size) over AQC<->AQC 10G link. Flow control is disabled to prevent RX side impact on measurements.

No UDP GSO: iperf3 -c 10.0.1.2 -u -b0 -l 200 -P4 --multithread
UDP GSO:    iperf3 -c 10.0.1.2 -u -b0 -l 12600 --udp-lso 200 -P4 --multithread

Mode        CPU   iperf speed  Line speed  Packets per second
-------------------------------------------------------------
NO UDP GSO  350%  3.07 Gbps    3.8 Gbps    1,919,419
SW UDP GSO  200%  5.55 Gbps    6.4 Gbps    3,286,144
HW UDP GSO  90%   6.80 Gbps    8.4 Gbps    4,273,117

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04a18399 |
|
22-Oct-2019 |
Egor Pomozov <epomozov@marvell.com> |
net: aquantia: implement data PTP datapath Here we do alloc/free IRQs for PTP rings. We also implement processing of PTP packets on TX and RX sides. Signed-off-by: Egor Pomozov <epomozov@marvell.com> Co-developed-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com> Signed-off-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com> Co-developed-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94ad9455 |
|
22-Oct-2019 |
Egor Pomozov <epomozov@marvell.com> |
net: aquantia: add PTP rings infrastructure Add implementations of PTP rings alloc/free. PTP design on this device uses two separate rings on a separate traffic class for traffic rx/tx. Third ring (hwts) is not a traffic ring, but is used only to receive timestamps of the transmitted packets. Signed-off-by: Egor Pomozov <epomozov@marvell.com> Co-developed-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com> Signed-off-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com> Co-developed-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3ed7c5c |
|
25-Jun-2019 |
Igor Russkikh <Igor.Russkikh@aquantia.com> |
net: aquantia: adding fields and device features for vlan offload Updating features and vlan_features with vlan HW offload. Added vlan_tag fields to rx/tx ring_buff to track vlan related data. Tested-by: Nikita Danilov <ndanilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75a6faf6 |
|
01-Jun-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 422 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 101 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190113.822954939@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
46f4c29d |
|
23-Mar-2019 |
Igor Russkikh <Igor.Russkikh@aquantia.com> |
net: aquantia: optimize rx performance by page reuse strategy We introduce an internal aq_rxpage wrapper over a regular page where an extra field is tracked: the rxpage offset inside of the allocated page. This offset allows one page to be reused for multiple packets. When needed (for example with large frames processing), the allocated pageorder can be customized. This gives even larger page reuse efficiency. page_ref_count is used to track page users. If during rx refill the underlying page has users, we increase pg_off by the rx frame size, thus the top half of the page is reused. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
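The refill decision described in 46f4c29d can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the driver's code: `struct example_rxpage`, its `refcount` field (a stand-in for `page_ref_count()`), and the 4096-byte page size are all hypothetical, and resetting the offset to zero when the page has no other users is an assumption that the commit text does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_PAGE_SIZE 4096u /* assumed base page size */

/* Hypothetical stand-in for the aq_rxpage wrapper described in the commit. */
struct example_rxpage {
    unsigned int order;    /* allocation order of the backing page */
    size_t pg_off;         /* offset of the current rx buffer in the page */
    unsigned int refcount; /* stand-in for page_ref_count() */
};

/*
 * Decide whether the existing page can back one more rx buffer.
 * Returns true with pg_off pointing at the slot to use, or false when
 * the caller would have to allocate a fresh page.
 */
static bool example_try_reuse(struct example_rxpage *p, size_t frame_size)
{
    size_t total = (size_t)EXAMPLE_PAGE_SIZE << p->order;

    assert(frame_size > 0 && frame_size <= total);

    if (p->refcount <= 1) {
        /* Assumption: no other users, so start again from offset 0. */
        p->pg_off = 0;
        return true;
    }

    /* Page still has users: step past the slot they may still hold. */
    p->pg_off += frame_size;
    if (p->pg_off + frame_size <= total) {
        p->refcount++; /* the new buffer takes its own reference */
        return true;
    }
    return false;
}
```

For an order-0 page and a 2048-byte frame, the first reuse moves the offset to the top half of the page; a second attempt while the page is still referenced no longer fits and reports that a new page is needed.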
#
b647d398 |
|
20-Mar-2018 |
Igor Russkikh <igor.russkikh@aquantia.com> |
net: aquantia: Add tx clean budget and valid budget handling logic We should report the full budget to napi only when we have more work to do. Before this fix, on any tx queue cleanup we forced napi to poll again. That's a waste of cpu resources and caused a storm of napi polls when there was at least one tx on each interrupt. With this fix we report the full budget only when there is more work to do on TX or, as before, when the rx budget was fully consumed. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
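The budget rule stated in b647d398 can be written as a small decision function. This is a hedged sketch of the rule as the commit message words it, not the driver's poll routine; `example_poll_work_done` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Value a poll routine would report back to NAPI.
 * rx_done:      rx packets processed in this poll
 * tx_more_work: true if tx cleanup stopped with descriptors still pending
 * budget:       budget handed in by NAPI
 */
static int example_poll_work_done(int rx_done, bool tx_more_work, int budget)
{
    assert(budget > 0 && rx_done >= 0 && rx_done <= budget);

    /* Ask to be polled again only when work really remains. */
    if (tx_more_work || rx_done == budget)
        return budget;

    /* Otherwise report the actual rx count so polling can stop. */
    return rx_done;
}
```

With a budget of 64, a poll that cleaned tx completely and received 10 packets reports 10, while the same poll with tx work left over reports 64 and is scheduled again.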
#
db550615 |
|
15-Jan-2018 |
Igor Russkikh <igor.russkikh@aquantia.com> |
net: aquantia: Eliminate aq_nic structure abstraction aq_nic_s was hidden in aq_nic_internal.h, that made it difficult to access nic fields and structures from other modules. This change moves aq_nic_s struct into aq_nic.h and thus makes it available to other driver modules, mainly pci module and hw related module. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78f5193d |
|
15-Jan-2018 |
Igor Russkikh <igor.russkikh@aquantia.com> |
net: aquantia: Cleanup status flags accesses Usage of aq_obj_s structure is a noop; here we remove it, replacing it with direct access to the flags field. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c7545689 |
|
25-Sep-2017 |
Pavel Belous <pavel.belous@aquantia.com> |
atlantic: fix iommu errors Call skb_frag_dma_map multiple times if tx length is greater than device max and avoid processing tx ring until entire packet has been sent. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
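The splitting that c7545689 describes, mapping a long tx fragment in several pieces no larger than the device maximum, comes down to chunk arithmetic. The sketch below shows only that arithmetic in user-space C; it makes no DMA calls, and the `example_` names and the 4096-byte limit used in the note are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of mappings needed to cover len bytes, at most max_chunk each. */
static size_t example_chunk_count(size_t len, size_t max_chunk)
{
    assert(max_chunk > 0);
    return (len + max_chunk - 1) / max_chunk;
}

/* Length of the index-th mapping (0-based) of a len-byte fragment. */
static size_t example_chunk_len(size_t len, size_t index, size_t max_chunk)
{
    size_t offset = index * max_chunk;
    size_t left;

    assert(max_chunk > 0 && offset < len);
    left = len - offset;
    return left < max_chunk ? left : max_chunk;
}
```

For a 10000-byte fragment and an assumed 4096-byte limit, this yields three mappings of 4096, 4096 and 1808 bytes.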
#
3aec6412 |
|
25-Sep-2017 |
Igor Russkikh <igor.russkikh@aquantia.com> |
aquantia: Fix Tx queue hangups

Driver did a poor job in managing its Tx queues: sometimes it could stop tx queues due to a link down condition in aq_nic_xmit, but never woke them up again. That led to a total suspend of the Tx path.

This patch fixes this and improves generic queue management:
- introduces queue restart counter
- uses generic netif_ interface to disable and enable tx path
- refactors link up/down condition and introduces dmesg log event when link changes.
- introduces new constant for minimum descriptors count required for queue wakeup

Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a54df682 |
|
03-Aug-2017 |
Pavel Belous <pavel.belous@aquantia.com> |
aquantia: Switch to use napi_gro_receive Add support for GRO (generic receive offload) for aQuantia Atlantic driver. This results in a performance improvement when GRO is enabled. Signed-off-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
386aff88 |
|
23-Mar-2017 |
Pavel Belous <pavel.belous@aquantia.com> |
net:ethernet:aquantia: Fix for LSO with IPv6. Fix Context Command bit: L3 type = "0" for IPv4, "1" for IPv6. Fixes: bab6de8fd180 ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.") Signed-off-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e399553d |
|
20-Feb-2017 |
Pavel Belous <pavel.belous@aquantia.com> |
net: ethernet: aquantia: Copying tx buffers is not needed. This fix removes copying of tx buffers. Now we use ring->buff_ring directly. Signed-off-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb36bedf |
|
17-Feb-2017 |
Lino Sanfilippo <LinoSanfilippo@gmx.de> |
net: aquantia: remove function aq_ring_tx_deinit Both functions aq_ring_tx_deinit() and aq_ring_tx_clean() are almost identical aside from an additional check in the latter. Move that check from the function into its caller and replace aq_ring_tx_deinit() with aq_ring_tx_clean(). By doing this also adjust the function's return value from int to void since it can never fail. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Tested-by: Pavel Belous <pavel.belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
018423e9 |
|
23-Jan-2017 |
David VomLehn <vomlehn@texas.net> |
net: ethernet: aquantia: Add ring support code Add code to support the transmit and receive ring buffers. Signed-off-by: Alexander Loktionov <Alexander.Loktionov@aquantia.com> Signed-off-by: Dmitrii Tarakanov <Dmitrii.Tarakanov@aquantia.com> Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: Dmitry Bezrukov <Dmitry.Bezrukov@aquantia.com> Signed-off-by: David M. VomLehn <vomlehn@texas.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|