History log of /linux-master/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
Revision Date Author Comments
# 2e7d3b67 01-Feb-2024 Ivan Vecera <ivecera@redhat.com>

net: atlantic: Fix DMA mapping for PTP hwts ring

Function aq_ring_hwts_rx_alloc() maps extra AQ_CFG_RXDS_DEF bytes
for PTP HWTS ring but then generic aq_ring_free() does not take this
into account.
Create and use a specific function to free HWTS ring to fix this
issue.

Trace:
[ 215.351607] ------------[ cut here ]------------
[ 215.351612] DMA-API: atlantic 0000:4b:00.0: device driver frees DMA memory with different size [device address=0x00000000fbdd0000] [map size=34816 bytes] [unmap size=32768 bytes]
[ 215.351635] WARNING: CPU: 33 PID: 10759 at kernel/dma/debug.c:988 check_unmap+0xa6f/0x2360
...
[ 215.581176] Call Trace:
[ 215.583632] <TASK>
[ 215.585745] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.590114] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.594497] ? debug_dma_free_coherent+0x196/0x210
[ 215.599305] ? check_unmap+0xa6f/0x2360
[ 215.603147] ? __warn+0xca/0x1d0
[ 215.606391] ? check_unmap+0xa6f/0x2360
[ 215.610237] ? report_bug+0x1ef/0x370
[ 215.613921] ? handle_bug+0x3c/0x70
[ 215.617423] ? exc_invalid_op+0x14/0x50
[ 215.621269] ? asm_exc_invalid_op+0x16/0x20
[ 215.625480] ? check_unmap+0xa6f/0x2360
[ 215.629331] ? mark_lock.part.0+0xca/0xa40
[ 215.633445] debug_dma_free_coherent+0x196/0x210
[ 215.638079] ? __pfx_debug_dma_free_coherent+0x10/0x10
[ 215.643242] ? slab_free_freelist_hook+0x11d/0x1d0
[ 215.648060] dma_free_attrs+0x6d/0x130
[ 215.651834] aq_ring_free+0x193/0x290 [atlantic]
[ 215.656487] aq_ptp_ring_free+0x67/0x110 [atlantic]
...
[ 216.127540] ---[ end trace 6467e5964dd2640b ]---
[ 216.132160] DMA-API: Mapped at:
[ 216.132162] debug_dma_alloc_coherent+0x66/0x2f0
[ 216.132165] dma_alloc_attrs+0xf5/0x1b0
[ 216.132168] aq_ring_hwts_rx_alloc+0x150/0x1f0 [atlantic]
[ 216.132193] aq_ptp_ring_alloc+0x1bb/0x540 [atlantic]
[ 216.132213] aq_nic_init+0x4a1/0x760 [atlantic]

Fixes: 94ad94558b0f ("net: aquantia: add PTP rings infrastructure")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201094752.883026-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b3cb7a83 13-Dec-2023 Igor Russkikh <irusskikh@marvell.com>

net: atlantic: eliminate double free in error handling logic

Driver has a logic leak in ring data allocation/free,
where aq_ring_free could be called multiple times on same ring,
if system is under stress and got memory allocation error.

Ring pointer was used as an indicator of failure, but this is
not correct since only ring data is allocated/deallocated.
Ring itself is an array member.

Changing ring allocation functions to return error code directly.
This simplifies error handling and eliminates aq_ring_free
on higher layer.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20231213095044.23146-1-irusskikh@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 45638f01 17-Apr-2022 Taehee Yoo <ap420073@gmail.com>

net: atlantic: Implement .ndo_xdp_xmit handler

aq_xdp_xmit() is the callback function of .ndo_xdp_xmit.
It internally calls aq_nic_xmit_xdpf() to send packet.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26efaef7 17-Apr-2022 Taehee Yoo <ap420073@gmail.com>

net: atlantic: Implement xdp data plane

It supports XDP_PASS, XDP_DROP and multi buffer.

The new function aq_nic_xmit_xdpf() is used to send packet with
xdp_frame and internally it calls aq_nic_map_xdp().

AQC chip supports 32 multi-queues and 8 vectors(irq).
there are two option
1. under 8 cores and 4 tx queues per core.
2. under 4 cores and 8 tx queues per core.

Like ixgbe, these tx queues can be used only for XDP_TX, XDP_REDIRECT
queue. If so, no tx_lock is needed.
But this patchset doesn't use this strategy because getting hardware tx
queue index cost is too high.
So, tx_lock is used in the aq_nic_xmit_xdpf().

single-core, single queue, 80% cpu utilization.

30.75% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
10.35% [kernel] [k] aq_hw_read_reg <---------- here
4.38% [kernel] [k] get_page_from_freelist

single-core, 8 queues, 100% cpu utilization, half PPS.

45.56% [kernel] [k] aq_hw_read_reg <---------- here
17.58% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
4.72% [kernel] [k] hw_atl_b0_hw_ring_rx_receive

The new function __aq_ring_xdp_clean() is a xdp rx handler and this is
called only when XDP is attached.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d14657f 17-Apr-2022 Taehee Yoo <ap420073@gmail.com>

net: atlantic: Implement xdp control plane

aq_xdp() is a xdp setup callback function for Atlantic driver.
When XDP is attached or detached, the device will be restarted because
it uses different headroom, tailroom, and page order value.

If XDP enabled, it switches default page order value from 0 to 2.
Because the default maximum frame size is still 2K and it needs
additional area for headroom and tailroom.
The total size(headroom + frame size + tailroom) is 2624.
So, 1472Bytes will be always wasted for every frame.
But when order-2 is used, these pages can be used 6 times
with flip strategy.
It means only about 106Bytes per frame will be wasted.

Also, It supports xdp fragment feature.
MTU can be 16K if xdp prog supports xdp fragment.
If not, MTU can not exceed 2K - ETH_HLEN - ETH_FCS.

And a static key is added and It will be used to call the xdp_clean
handler in ->poll(). data plane implementation will be contained
the followed patch.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa7e17a3 20-Jul-2020 Dmitry Bogdanov <dbogdanov@marvell.com>

net: atlantic: additional per-queue stats

This patch adds additional per-queue stats, these could
be useful for debugging and diagnostics.

Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d7d8bb92 20-Jul-2020 Mark Starovoytov <mstarovo@pm.me>

net: atlantic: use u64_stats_update_* to protect access to 64-bit stats

This patch adds u64_stats_update_* usage to protect access to 64-bit stats,
where necessary.

This is necessary for per-ring stats, because they are updated by the
driver directly, so there is a possibility for a partial read.

Other stats require no additional protection, e.g.:
* all MACSec stats are fetched directly from HW (under semaphore);
* nic/ndev stats (aq_stats_s) are fetched directly from FW (under mutex).

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 508f2e3d 20-Jul-2020 Mark Starovoytov <mstarovo@pm.me>

net: atlantic: split rx and tx per-queue stats

This patch splits rx and tx per-queue stats.
This change simplifies the follow-up introduction of PTP stats and
u64_stats_update_* usage.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 15beab0a 14-Feb-2020 Dmitry Bezrukov <dbezrukov@marvell.com>

net: atlantic: checksum compat issue

Yet another checksum offload compatibility issue was found.

The known issue is that AQC HW marks tcp packets with 0xFFFF checksum
as invalid (1). This is workarounded in driver, passing all the suspicious
packets up to the stack for further csum validation.

Another HW problem (2) is that it hides invalid csum of LRO aggregated
packets inside of the individual descriptors. That was workarounded
by forced scan of all LRO descriptors for checksum errors.

However the scan logic was joint for both LRO and multi-descriptor
packets (jumbos). And this causes the issue.

We have to drop LRO packets with the detected bad checksum
because of (2), but we have to pass jumbo packets to stack because of (1).

When using windows tcp partner with jumbo frames but with LSO disabled
driver discards such frames as bad checksummed. But only LRO frames
should be dropped, not jumbos.

On such a configurations tcp stream have a chance of drops and stucks.

(1) 76f254d4afe2 ("net: aquantia: tcp checksum 0xffff being handled incorrectly")
(2) d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")

Fixes: d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")
Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 822cd114 07-Nov-2019 Igor Russkikh <irusskikh@marvell.com>

net: atlantic: implement UDP GSO offload

atlantic hardware does support UDP hardware segmentation offload.
This allows user to specify one large contiguous buffer with data
which then will be split automagically into multiple UDP packets
of specified size.

Bulk sending of large UDP streams lowers CPU usage and increases
bandwidth.

We did estimations both with udpgso_bench_tx test tool and with modified
iperf3 measurement tool (4 streams, multithread, 200b packet size)
over AQC<->AQC 10G link. Flow control is disabled to prevent RX side
impact on measurements.

No UDP GSO:
iperf3 -c 10.0.1.2 -u -b0 -l 200 -P4 --multithread
UDP GSO:
iperf3 -c 10.0.1.2 -u -b0 -l 12600 --udp-lso 200 -P4 --multithread

Mode CPU iperf speed Line speed Packets per second
-------------------------------------------------------------
NO UDP GSO 350% 3.07 Gbps 3.8 Gbps 1,919,419
SW UDP GSO 200% 5.55 Gbps 6.4 Gbps 3,286,144
HW UDP GSO 90% 6.80 Gbps 8.4 Gbps 4,273,117

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04a18399 22-Oct-2019 Egor Pomozov <epomozov@marvell.com>

net: aquantia: implement data PTP datapath

Here we do alloc/free IRQs for PTP rings.
We also implement processing of PTP packets on TX and RX sides.

Signed-off-by: Egor Pomozov <epomozov@marvell.com>
Co-developed-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Signed-off-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Co-developed-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 94ad9455 22-Oct-2019 Egor Pomozov <epomozov@marvell.com>

net: aquantia: add PTP rings infrastructure

Add implementations of PTP rings alloc/free.

PTP desing on this device uses two separate rings on a separate traffic
class for traffic rx/tx.

Third ring (hwts) is not a traffic ring, but is used only to receive timestamps
of the transmitted packets.

Signed-off-by: Egor Pomozov <epomozov@marvell.com>
Co-developed-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Signed-off-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Co-developed-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d3ed7c5c 25-Jun-2019 Igor Russkikh <Igor.Russkikh@aquantia.com>

net: aquantia: adding fields and device features for vlan offload

Updating features and vlan_features with vlan HW offload.
Added vlan_tag fields to rx/tx ring_buff to track vlan related data.

Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75a6faf6 01-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 422

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 101 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190113.822954939@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 46f4c29d 23-Mar-2019 Igor Russkikh <Igor.Russkikh@aquantia.com>

net: aquantia: optimize rx performance by page reuse strategy

We introduce internal aq_rxpage wrapper over regular page
where extra field is tracked: rxpage offset inside of allocated page.

This offset allows to reuse one page for multiple packets.
When needed (for example with large frames processing), allocated
pageorder could be customized. This gives even larger page reuse
efficiency.

page_ref_count is used to track page users. If during rx refill
underlying page has users, we increase pg_off by rx frame size
thus the top half of the page is reused.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b647d398 20-Mar-2018 Igor Russkikh <igor.russkikh@aquantia.com>

net: aquantia: Add tx clean budget and valid budget handling logic

We should report to napi full budget only when we have more job to do.
Before this fix, on any tx queue cleanup we forced napi to do poll again.
Thats a waste of cpu resources and caused storming with napi polls when
there was at least one tx on each interrupt.

With this fix we report full budget only when there is more job on TX
to do. Or, as before, when rx budget was fully consumed.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db550615 15-Jan-2018 Igor Russkikh <igor.russkikh@aquantia.com>

net: aquantia: Eliminate aq_nic structure abstraction

aq_nic_s was hidden in aq_nic_internal.h, that made it difficult to access
nic fields and structures from other modules.
This change moves aq_nic_s struct into aq_nic.h and thus makes it available
to other driver modules, mainly pci module and hw related module.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 78f5193d 15-Jan-2018 Igor Russkikh <igor.russkikh@aquantia.com>

net: aquantia: Cleanup status flags accesses

Usage of aq_obj_s structure is noop, here we remove it
replacing access to flags filed directly.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7545689 25-Sep-2017 Pavel Belous <pavel.belous@aquantia.com>

atlantic: fix iommu errors

Call skb_frag_dma_map multiple times if tx length is greater than
device max and avoid processing tx ring until entire packet has been
sent.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3aec6412 25-Sep-2017 Igor Russkikh <igor.russkikh@aquantia.com>

aquantia: Fix Tx queue hangups

Driver did a poor job in managing its Tx queues: Sometimes it could stop
tx queues due to link down condition in aq_nic_xmit - but never waked up
them. That led to Tx path total suspend.
This patch fixes this and improves generic queue management:
- introduces queue restart counter
- uses generic netif_ interface to disable and enable tx path
- refactors link up/down condition and introduces dmesg log event when
link changes.
- introduces new constant for minimum descriptors count required for queue
wakeup

Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a54df682 03-Aug-2017 Pavel Belous <pavel.belous@aquantia.com>

aquantia: Switch to use napi_gro_receive

Add support for GRO (generic receive offload) for aQuantia Atlantic driver.
This results in a perfomance improvement when GRO is enabled.

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 386aff88 23-Mar-2017 Pavel Belous <pavel.belous@aquantia.com>

net:ethernet:aquantia: Fix for LSO with IPv6.

Fix Context Command bit: L3 type = "0" for IPv4, "1" for IPv6.

Fixes: bab6de8fd180 ("net: ethernet: aquantia:
Atlantic A0 and B0 specific functions.")

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e399553d 20-Feb-2017 Pavel Belous <pavel.belous@aquantia.com>

net: ethernet: aquantia: Copying tx buffers is not needed.

This fix removes copying of tx biffers.
Now we use ring->buff_fing directly.

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb36bedf 17-Feb-2017 Lino Sanfilippo <LinoSanfilippo@gmx.de>

net: aquantia: remove function aq_ring_tx_deinit

Both functions aq_ring_rx_deinit() and aq_ring_tx_clean() are almost
identical aside from an additional check in the latter.
Move that check from the function into its caller and replace
aq_ring_rx_deinit() with aq_ring_rx_deinit().

By doing this also adjust the functions return value from int to void
since it can never fail.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Tested-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 018423e9 23-Jan-2017 David VomLehn <vomlehn@texas.net>

net: ethernet: aquantia: Add ring support code

Add code to support the transmit and receive ring buffers.

Signed-off-by: Alexander Loktionov <Alexander.Loktionov@aquantia.com>
Signed-off-by: Dmitrii Tarakanov <Dmitrii.Tarakanov@aquantia.com>
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: Dmitry Bezrukov <Dmitry.Bezrukov@aquantia.com>
Signed-off-by: David M. VomLehn <vomlehn@texas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>