History log of /linux-master/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
Revision Date Author Comments
# 2e7d3b67 01-Feb-2024 Ivan Vecera <ivecera@redhat.com>

net: atlantic: Fix DMA mapping for PTP hwts ring

Function aq_ring_hwts_rx_alloc() maps extra AQ_CFG_RXDS_DEF bytes
for PTP HWTS ring but then generic aq_ring_free() does not take this
into account.
Create and use a specific function to free HWTS ring to fix this
issue.

Trace:
[ 215.351607] ------------[ cut here ]------------
[ 215.351612] DMA-API: atlantic 0000:4b:00.0: device driver frees DMA memory with different size [device address=0x00000000fbdd0000] [map size=34816 bytes] [unmap size=32768 bytes]
[ 215.351635] WARNING: CPU: 33 PID: 10759 at kernel/dma/debug.c:988 check_unmap+0xa6f/0x2360
...
[ 215.581176] Call Trace:
[ 215.583632] <TASK>
[ 215.585745] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.590114] ? show_trace_log_lvl+0x1c4/0x2df
[ 215.594497] ? debug_dma_free_coherent+0x196/0x210
[ 215.599305] ? check_unmap+0xa6f/0x2360
[ 215.603147] ? __warn+0xca/0x1d0
[ 215.606391] ? check_unmap+0xa6f/0x2360
[ 215.610237] ? report_bug+0x1ef/0x370
[ 215.613921] ? handle_bug+0x3c/0x70
[ 215.617423] ? exc_invalid_op+0x14/0x50
[ 215.621269] ? asm_exc_invalid_op+0x16/0x20
[ 215.625480] ? check_unmap+0xa6f/0x2360
[ 215.629331] ? mark_lock.part.0+0xca/0xa40
[ 215.633445] debug_dma_free_coherent+0x196/0x210
[ 215.638079] ? __pfx_debug_dma_free_coherent+0x10/0x10
[ 215.643242] ? slab_free_freelist_hook+0x11d/0x1d0
[ 215.648060] dma_free_attrs+0x6d/0x130
[ 215.651834] aq_ring_free+0x193/0x290 [atlantic]
[ 215.656487] aq_ptp_ring_free+0x67/0x110 [atlantic]
...
[ 216.127540] ---[ end trace 6467e5964dd2640b ]---
[ 216.132160] DMA-API: Mapped at:
[ 216.132162] debug_dma_alloc_coherent+0x66/0x2f0
[ 216.132165] dma_alloc_attrs+0xf5/0x1b0
[ 216.132168] aq_ring_hwts_rx_alloc+0x150/0x1f0 [atlantic]
[ 216.132193] aq_ptp_ring_alloc+0x1bb/0x540 [atlantic]
[ 216.132213] aq_nic_init+0x4a1/0x760 [atlantic]

Fixes: 94ad94558b0f ("net: aquantia: add PTP rings infrastructure")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240201094752.883026-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b3cb7a83 13-Dec-2023 Igor Russkikh <irusskikh@marvell.com>

net: atlantic: eliminate double free in error handling logic

Driver has a logic leak in ring data allocation/free,
where aq_ring_free could be called multiple times on same ring,
if system is under stress and got memory allocation error.

Ring pointer was used as an indicator of failure, but this is
not correct since only ring data is allocated/deallocated.
Ring itself is an array member.

Changing ring allocation functions to return error code directly.
This simplifies error handling and eliminates aq_ring_free
on higher layer.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20231213095044.23146-1-irusskikh@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7bb26ea7 13-Dec-2023 Igor Russkikh <irusskikh@marvell.com>

net: atlantic: fix double free in ring reinit logic

Driver has a logic leak in ring data allocation/free,
where double free may happen in aq_ring_free if system is under
stress and driver init/deinit is happening.

The probability is higher to get this during suspend/resume cycle.

Verification was done simulating same conditions with

stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000
while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done

Fixed by explicitly clearing pointers to NULL on deallocation

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/netdev/CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com/
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20231213094044.22988-1-irusskikh@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# cbe860be 04-Dec-2023 Daniil Maximov <daniil31415it@gmail.com>

net: atlantic: Fix NULL dereference of skb pointer in

If is_ptp_ring == true in the loop of __aq_ring_xdp_clean function,
then a timestamp is stored from a packet in a field of skb object,
which is not allocated at the moment of the call (skb == NULL).

Generalize aq_ptp_extract_ts and other affected functions so they don't
work with struct sk_buff*, but with struct skb_shared_hwtstamps*.

Found by Linux Verification Center (linuxtesting.org) with SVACE

Fixes: 26efaef759a1 ("net: atlantic: Implement xdp data plane")
Signed-off-by: Daniil Maximov <daniil31415it@gmail.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20231204085810.1681386-1-daniil31415it@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# b51f4113 10-May-2023 Yunsheng Lin <linyunsheng@huawei.com>

net: introduce and use skb_frag_fill_page_desc()

Most users use __skb_frag_set_page()/skb_frag_off_set()/
skb_frag_size_set() to fill the page desc for a skb frag.

Introduce skb_frag_fill_page_desc() to do that.

net/bpf/test_run.c does not call skb_frag_off_set() to
set the offset, "copy_from_user(page_address(page), ...)"
and 'shinfo' being part of the 'data' kzalloced in
bpf_test_init() suggest that it is assuming offset to be
initialized as zero, so call skb_frag_fill_page_desc()
with offset being zero for this case.

Also, skb_frag_set_page() is not used anymore, so remove
it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37d01039 15-Mar-2023 Toke Høiland-Jørgensen <toke@redhat.com>

net: atlantic: Fix crash when XDP is enabled but no program is loaded

The aq_xdp_run_prog() function falls back to the XDP_ABORTED action
handler (using a goto) if the operations for any of the other actions fail.
The XDP_ABORTED handler in turn calls the bpf_warn_invalid_xdp_action()
tracepoint. However, the function also jumps into the XDP_PASS helper if no
XDP program is loaded on the device, which means the XDP_ABORTED handler
can be run with a NULL program pointer. This results in a NULL pointer
deref because the tracepoint dereferences the 'prog' pointer passed to it.

This situation can happen in multiple ways:
- If a packet arrives between the removal of the program from the interface
and the static_branch_dec() in aq_xdp_setup()
- If there are multiple devices using the same driver in the system and
one of them has an XDP program loaded and the other does not.

Fix this by refactoring the aq_xdp_run_prog() function to remove the 'goto
pass' handling if there is no XDP program loaded. Instead, factor out the
skb building in a separate small helper function.

Fixes: 26efaef759a1 ("net: atlantic: Implement xdp data plane")
Reported-by: Freysteinn Alfredsson <Freysteinn.Alfredsson@kau.se>
Tested-by: Freysteinn Alfredsson <Freysteinn.Alfredsson@kau.se>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20230315125539.103319-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 068c38ad 26-Oct-2022 Thomas Gleixner <tglx@linutronix.de>

net: Remove the obsolte u64_stats_fetch_*_irq() users (drivers).

Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.

Convert to the regular interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 45638f01 17-Apr-2022 Taehee Yoo <ap420073@gmail.com>

net: atlantic: Implement .ndo_xdp_xmit handler

aq_xdp_xmit() is the callback function of .ndo_xdp_xmit.
It internally calls aq_nic_xmit_xdpf() to send packet.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26efaef7 17-Apr-2022 Taehee Yoo <ap420073@gmail.com>

net: atlantic: Implement xdp data plane

It supports XDP_PASS, XDP_DROP and multi buffer.

The new function aq_nic_xmit_xdpf() is used to send packet with
xdp_frame and internally it calls aq_nic_map_xdp().

AQC chip supports 32 multi-queues and 8 vectors(irq).
there are two option
1. under 8 cores and 4 tx queues per core.
2. under 4 cores and 8 tx queues per core.

Like ixgbe, these tx queues can be used only for XDP_TX, XDP_REDIRECT
queue. If so, no tx_lock is needed.
But this patchset doesn't use this strategy because getting hardware tx
queue index cost is too high.
So, tx_lock is used in the aq_nic_xmit_xdpf().

single-core, single queue, 80% cpu utilization.

30.75% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
10.35% [kernel] [k] aq_hw_read_reg <---------- here
4.38% [kernel] [k] get_page_from_freelist

single-core, 8 queues, 100% cpu utilization, half PPS.

45.56% [kernel] [k] aq_hw_read_reg <---------- here
17.58% bpf_prog_xxx_xdp_prog_tx [k] bpf_prog_xxx_xdp_prog_tx
4.72% [kernel] [k] hw_atl_b0_hw_ring_rx_receive

The new function __aq_ring_xdp_clean() is a xdp rx handler and this is
called only when XDP is attached.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d14657f 17-Apr-2022 Taehee Yoo <ap420073@gmail.com>

net: atlantic: Implement xdp control plane

aq_xdp() is a xdp setup callback function for Atlantic driver.
When XDP is attached or detached, the device will be restarted because
it uses different headroom, tailroom, and page order value.

If XDP enabled, it switches default page order value from 0 to 2.
Because the default maximum frame size is still 2K and it needs
additional area for headroom and tailroom.
The total size(headroom + frame size + tailroom) is 2624.
So, 1472Bytes will be always wasted for every frame.
But when order-2 is used, these pages can be used 6 times
with flip strategy.
It means only about 106Bytes per frame will be wasted.

Also, It supports xdp fragment feature.
MTU can be 16K if xdp prog supports xdp fragment.
If not, MTU can not exceed 2K - ETH_HLEN - ETH_FCS.

And a static key is added and It will be used to call the xdp_clean
handler in ->poll(). data plane implementation will be contained
the followed patch.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6aecbba1 09-May-2022 Grant Grundler <grundler@chromium.org>

net: atlantic: add check for MAX_SKB_FRAGS

Enforce that the CPU can not get stuck in an infinite loop.

Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 79784d77 09-May-2022 Grant Grundler <grundler@chromium.org>

net: atlantic: reduce scope of is_rsc_complete

Don't defer handling the err case outside the loop. That's pointless.

And since is_rsc_complete is only used inside this loop, declare
it inside the loop to reduce it's scope.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62e0ae0f 09-May-2022 Grant Grundler <grundler@chromium.org>

net: atlantic: fix "frag[0] not initialized"

In aq_ring_rx_clean(), if buff->is_eop is not set AND
buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to
buff->len and skb_add_rx_frag(xxx, *0*, ...) is not called.

The loop following this code starts calling skb_add_rx_frag() starting
with i=1 and thus frag[0] is never initialized. Since i is initialized
to zero at the top of the primary loop, we can just reference and
post-increment i instead of hardcoding the 0 when calling
skb_add_rx_frag() the first time.

Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5f501532 26-Dec-2021 Zekun Shen <bruceshenzk@gmail.com>

atlantic: Fix buff_ring OOB in aq_ring_rx_clean

The function obtain the next buffer without boundary check.
We should return with I/O error code.

The bug is found by fuzzing and the crash report is attached.
It is an OOB bug although reported as use-after-free.

[ 4.804724] BUG: KASAN: use-after-free in aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[ 4.805661] Read of size 4 at addr ffff888034fe93a8 by task ksoftirqd/0/9
[ 4.806505]
[ 4.806703] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G W 5.6.0 #34
[ 4.809030] Call Trace:
[ 4.809343] dump_stack+0x76/0xa0
[ 4.809755] print_address_description.constprop.0+0x16/0x200
[ 4.810455] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[ 4.811234] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[ 4.813183] __kasan_report.cold+0x37/0x7c
[ 4.813715] ? aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[ 4.814393] kasan_report+0xe/0x20
[ 4.814837] aq_ring_rx_clean+0x1e88/0x2730 [atlantic]
[ 4.815499] ? hw_atl_b0_hw_ring_rx_receive+0x9a5/0xb90 [atlantic]
[ 4.816290] aq_vec_poll+0x179/0x5d0 [atlantic]
[ 4.816870] ? _GLOBAL__sub_I_65535_1_aq_pci_func_init+0x20/0x20 [atlantic]
[ 4.817746] ? __next_timer_interrupt+0xba/0xf0
[ 4.818322] net_rx_action+0x363/0xbd0
[ 4.818803] ? call_timer_fn+0x240/0x240
[ 4.819302] ? __switch_to_asm+0x40/0x70
[ 4.819809] ? napi_busy_loop+0x520/0x520
[ 4.820324] __do_softirq+0x18c/0x634
[ 4.820797] ? takeover_tasklets+0x5f0/0x5f0
[ 4.821343] run_ksoftirqd+0x15/0x20
[ 4.821804] smpboot_thread_fn+0x2f1/0x6b0
[ 4.822331] ? smpboot_unregister_percpu_thread+0x160/0x160
[ 4.823041] ? __kthread_parkme+0x80/0x100
[ 4.823571] ? smpboot_unregister_percpu_thread+0x160/0x160
[ 4.824301] kthread+0x2b5/0x3b0
[ 4.824723] ? kthread_create_on_node+0xd0/0xd0
[ 4.825304] ret_from_fork+0x35/0x40

Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6a405f6c 18-Nov-2021 Zekun Shen <bruceshenzk@gmail.com>

atlantic: fix double-free in aq_ring_tx_clean

We found this bug while fuzzing the device driver. Using and freeing
the dangling pointer buff->skb would cause use-after-free and
double-free.

This bug is triggerable with compromised/malfunctioning devices. We
found the bug with QEMU emulation and tested the patch by emulation.
We did NOT test on a real device.

Attached is the bug report.

BUG: KASAN: double-free or invalid-free in consume_skb+0x6c/0x1c0

Call Trace:
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x16/0x200
? consume_skb+0x6c/0x1c0
kasan_report_invalid_free+0x61/0xa0
? consume_skb+0x6c/0x1c0
__kasan_slab_free+0x15e/0x170
? consume_skb+0x6c/0x1c0
kfree+0x8c/0x230
consume_skb+0x6c/0x1c0
aq_ring_tx_clean+0x5c2/0xa80 [atlantic]
aq_vec_poll+0x309/0x5d0 [atlantic]
? _sub_I_65535_1+0x20/0x20 [atlantic]
? __next_timer_interrupt+0xba/0xf0
net_rx_action+0x363/0xbd0
? call_timer_fn+0x240/0x240
? __switch_to_asm+0x34/0x70
? napi_busy_loop+0x520/0x520
? net_tx_action+0x379/0x720
__do_softirq+0x18c/0x634
? takeover_tasklets+0x5f0/0x5f0
run_ksoftirqd+0x15/0x20
smpboot_thread_fn+0x2f1/0x6b0
? smpboot_unregister_percpu_thread+0x160/0x160
? __kthread_parkme+0x80/0x100
? smpboot_unregister_percpu_thread+0x160/0x160
kthread+0x2b5/0x3b0
? kthread_create_on_node+0xd0/0xd0
ret_from_fork+0x22/0x40

Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9bd2702d 23-Nov-2020 Lincoln Ramsay <lincoln.ramsay@opengear.com>

aquantia: Remove the build_skb path

When performing IPv6 forwarding, there is an expectation that SKBs
will have some headroom. When forwarding a packet from the aquantia
driver, this does not always happen, triggering a kernel warning.

aq_ring.c has this code (edited slightly for brevity):

if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
} else {
skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);

There is a significant difference between the SKB produced by these
2 code paths. When napi_alloc_skb creates an SKB, there is a certain
amount of headroom reserved. However, this is not done in the
build_skb codepath.

As the hardware buffer that build_skb is built around does not
handle the presence of the SKB header, this code path is being
removed and the napi_alloc_skb path will always be used. This code
path does have to copy the packet header into the SKB, but it adds
the packet data as a frag.

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Lincoln Ramsay <lincoln.ramsay@opengear.com>
Link: https://lore.kernel.org/r/MWHPR1001MB23184F3EAFA413E0D1910EC9E8FC0@MWHPR1001MB2318.namprd10.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# aa7e17a3 20-Jul-2020 Dmitry Bogdanov <dbogdanov@marvell.com>

net: atlantic: additional per-queue stats

This patch adds additional per-queue stats, these could
be useful for debugging and diagnostics.

Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d7d8bb92 20-Jul-2020 Mark Starovoytov <mstarovo@pm.me>

net: atlantic: use u64_stats_update_* to protect access to 64-bit stats

This patch adds u64_stats_update_* usage to protect access to 64-bit stats,
where necessary.

This is necessary for per-ring stats, because they are updated by the
driver directly, so there is a possibility for a partial read.

Other stats require no additional protection, e.g.:
* all MACSec stats are fetched directly from HW (under semaphore);
* nic/ndev stats (aq_stats_s) are fetched directly from FW (under mutex).

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 508f2e3d 20-Jul-2020 Mark Starovoytov <mstarovo@pm.me>

net: atlantic: split rx and tx per-queue stats

This patch splits rx and tx per-queue stats.
This change simplifies the follow-up introduction of PTP stats and
u64_stats_update_* usage.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4378b882 26-Jun-2020 Igor Russkikh <irusskikh@marvell.com>

net: atlantic: put ptp code under IS_REACHABLE check

A1 requires additional processing for both egress and ingress to support
PTP.
And it makes sense to get rid of this processing altogether (via ifdef),
if PTP clock is disabled globally.

This patch puts the PTP code under the corresponding IS_REACHABLE check.

Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 586616cb 26-Jun-2020 Mark Starovoytov <mstarovo@pm.me>

net: atlantic: fix typo in aq_ring_tx_clean

This patch fixes a typo in aq_ring_tx_clean.
stats is a union, so the typo doesn't cause any issues, but it's a typo
nonetheless.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a83fe6b6 22-May-2020 Dmitry Bezrukov <dbezrukov@marvell.com>

net: atlantic: QoS implementation: multi-TC support

This patch adds multi-TC support.

PTP is automatically disabled when the user enables more than 2 TCs,
otherwise traffic on TC2 won't quite work, because it's reserved for PTP.

Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Co-developed-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4980919 14-Feb-2020 Pavel Belous <pbelous@marvell.com>

net: atlantic: fix use after free kasan warn

skb->len is used to calculate statistics after xmit invocation.

Under a stress load it may happen that skb will be xmited,
rx interrupt will come and skb will be freed, all before xmit function
is even returned.

Eventually, skb->len will access unallocated area.

Moving stats calculation into tx_clean routine.

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Reported-by: Christophe Vu-Brugier <cvubrugier@fastmail.fm>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Pavel Belous <pbelous@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 15beab0a 14-Feb-2020 Dmitry Bezrukov <dbezrukov@marvell.com>

net: atlantic: checksum compat issue

Yet another checksum offload compatibility issue was found.

The known issue is that AQC HW marks tcp packets with 0xFFFF checksum
as invalid (1). This is workarounded in driver, passing all the suspicious
packets up to the stack for further csum validation.

Another HW problem (2) is that it hides invalid csum of LRO aggregated
packets inside of the individual descriptors. That was workarounded
by forced scan of all LRO descriptors for checksum errors.

However the scan logic was joint for both LRO and multi-descriptor
packets (jumbos). And this causes the issue.

We have to drop LRO packets with the detected bad checksum
because of (2), but we have to pass jumbo packets to stack because of (1).

When using windows tcp partner with jumbo frames but with LSO disabled
driver discards such frames as bad checksummed. But only LRO frames
should be dropped, not jumbos.

On such a configurations tcp stream have a chance of drops and stucks.

(1) 76f254d4afe2 ("net: aquantia: tcp checksum 0xffff being handled incorrectly")
(2) d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")

Fixes: d08b9a0a3ebd ("net: aquantia: do not pass lro session with invalid tcp checksum")
Signed-off-by: Dmitry Bezrukov <dbezrukov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7b0c342f 07-Nov-2019 Nikita Danilov <ndanilov@marvell.com>

net: atlantic: code style cleanup

Thats a pure checkpatck walkthrough the code with no functional
changes. Reverse christmas tree, spacing, etc.

Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04a18399 22-Oct-2019 Egor Pomozov <epomozov@marvell.com>

net: aquantia: implement data PTP datapath

Here we do alloc/free IRQs for PTP rings.
We also implement processing of PTP packets on TX and RX sides.

Signed-off-by: Egor Pomozov <epomozov@marvell.com>
Co-developed-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Signed-off-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Co-developed-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 94ad9455 22-Oct-2019 Egor Pomozov <epomozov@marvell.com>

net: aquantia: add PTP rings infrastructure

Add implementations of PTP rings alloc/free.

PTP desing on this device uses two separate rings on a separate traffic
class for traffic rx/tx.

Third ring (hwts) is not a traffic ring, but is used only to receive timestamps
of the transmitted packets.

Signed-off-by: Egor Pomozov <epomozov@marvell.com>
Co-developed-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Signed-off-by: Sergey Samoilenko <sergey.samoilenko@aquantia.com>
Co-developed-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d08b9a0a 11-Oct-2019 Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>

net: aquantia: do not pass lro session with invalid tcp checksum

Individual descriptors on LRO TCP session should be checked
for CRC errors. It was discovered that HW recalculates
L4 checksums on LRO session and does not break it up on bad L4
csum.

Thus, driver should aggregate HW LRO L4 statuses from all individual
buffers of LRO session and drop packet if one of the buffers has bad
L4 checksum.

Fixes: f38f1ee8aeb2 ("net: aquantia: check rx csum for all packets in LRO session")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 880b3ca5 25-Jun-2019 Igor Russkikh <Igor.Russkikh@aquantia.com>

net: aquantia: vlan offloads logic in datapath

Update datapath by adding logic related to hardware assisted
vlan strip/insert behaviour.

Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75a6faf6 01-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 422

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 101 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190113.822954939@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# f38f1ee8 25-May-2019 Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>

net: aquantia: check rx csum for all packets in LRO session

Atlantic hardware does not aggregate nor breaks LRO sessions
with bad csum packets. This means driver should take care of that.

If in LRO session there is a non-first descriptor with invalid
checksum (L2/L3/L4), the driver must account this information
in csum application logic.

Fixes: 018423e90bee8 ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 31bafc49 25-May-2019 Igor Russkikh <Igor.Russkikh@aquantia.com>

net: aquantia: tx clean budget logic error

In case no other traffic happening on the ring, full tx cleanup
may not be completed. That may cause socket buffer to overflow
and tx traffic to stuck until next activity on the ring happens.

This is due to logic error in budget variable decrementor.
Variable is compared with zero, and then post decremented,
causing it to become MAX_INT. Solution is remove decrementor
from the `for` statement and rewrite it in a clear way.

Fixes: b647d3980948e ("net: aquantia: Add tx clean budget and valid budget handling logic")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c43f1255 22-Apr-2019 Stanislav Fomichev <sdf@google.com>

net: pass net_device argument to the eth_get_headlen

Update all users of eth_get_headlen to pass network device, fetch
network namespace from it and pass it down to the flow dissector.
This commit is a noop until administrator inserts BPF flow dissector
program.

Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


# 9773ef18 23-Mar-2019 Igor Russkikh <Igor.Russkikh@aquantia.com>

net: aquantia: Introduce rx refill threshold value

Before that, we've refilled ring even on single descriptor move.
Under high packet load that caused page allocation logic to be triggered
too often. That made overall ring processing slower.

Moreover, with page buffer reuse implemented, we should give a chance
higher networking levels to process received packets faster, release
the pages they consumed and therefore give a higher chance for these
pages to be reused.

RX ring is now refilled only when AQ_CFG_RX_REFILL_THRES or more
descriptors were processed (32 by default). Under regular traffic this
gives quite enough time for packet to be consumed and page to be reused.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 46f4c29d 23-Mar-2019 Igor Russkikh <Igor.Russkikh@aquantia.com>

net: aquantia: optimize rx performance by page reuse strategy

We introduce internal aq_rxpage wrapper over regular page
where extra field is tracked: rxpage offset inside of allocated page.

This offset allows to reuse one page for multiple packets.
When needed (for example with large frames processing), allocated
pageorder could be customized. This gives even larger page reuse
efficiency.

page_ref_count is used to track page users. If during rx refill
underlying page has users, we increase pg_off by rx frame size
thus the top half of the page is reused.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7e2698c4 23-Mar-2019 Igor Russkikh <Igor.Russkikh@aquantia.com>

net: aquantia: optimize rx path using larger preallocated skb len

Atlantic driver used 14 bytes preallocated skb size. That made L3 protocol
processing inefficient because pskb_pull had to fetch all the L3/L4 headers
from extra fragments.

Specially on UDP flows that caused extra packet drops because CPU was
overloaded with pskb_pull.

This patch uses eth_get_headlen for skb preallocation.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7faaa0c 16-Mar-2019 Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>

net: aquantia: fix rx checksum offload for UDP/TCP over IPv6

TCP/UDP checksum validity was propagated to skb
only if IP checksum is valid.
But for IPv6 there is no validity as there is no checksum in IPv6.
This patch propagates TCP/UDP checksum validity regardless of IP checksum.

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad703c2b 09-Nov-2018 Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>

net: aquantia: invalid checksumm offload implementation

Packets with marked invalid IP/UDP/TCP checksums were considered as good
by the driver. The error was in a logic, processing offload bits in
RX descriptor.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d26ed6b0 15-Sep-2018 Friedemann Gerold <f.gerold@b-c-s.de>

net: aquantia: memory corruption on jumbo frames

This patch fixes skb_shared area, which will be corrupted
upon reception of 4K jumbo packets.

Originally build_skb usage purpose was to reuse page for skb to eliminate
needs of extra fragments. But that logic does not take into account that
skb_shared_info should be reserved at the end of skb data area.

In case packet data consumes all the page (4K), skb_shinfo location
overflows the page. As a consequence, __build_skb zeroed shinfo data above
the allocated page, corrupting next page.

The issue is rarely seen in real life because jumbo are normally larger
than 4K and that causes another code path to trigger.
But it 100% reproducible with simple scapy packet, like:

sendp(IP(dst="192.168.100.3") / TCP(dport=443) \
/ Raw(RandString(size=(4096-40))), iface="enp1s0")

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")

Reported-by: Friedemann Gerold <f.gerold@b-c-s.de>
Reported-by: Michael Rauch <michael@rauch.be>
Signed-off-by: Friedemann Gerold <f.gerold@b-c-s.de>
Tested-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6f9dbadc 15-Sep-2018 Friedemann Gerold <f.gerold@b-c-s.de>

net: aquantia: memory corruption on jumbo frames

This patch fixes skb_shared area, which will be corrupted
upon reception of 4K jumbo packets.

Originally build_skb usage purpose was to reuse page for skb to eliminate
needs of extra fragments. But that logic does not take into account that
skb_shared_info should be reserved at the end of skb data area.

In case packet data consumes all the page (4K), skb_shinfo location
overflows the page. As a consequence, __build_skb zeroed shinfo data above
the allocated page, corrupting next page.

The issue is rarely seen in real life because jumbo are normally larger
than 4K and that causes another code path to trigger.
But it 100% reproducible with simple scapy packet, like:

sendp(IP(dst="192.168.100.3") / TCP(dport=443) \
/ Raw(RandString(size=(4096-40))), iface="enp1s0")

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Reported-by: Friedemann Gerold <f.gerold@b-c-s.de>
Reported-by: Michael Rauch <michael@rauch.be>
Signed-off-by: Friedemann Gerold <f.gerold@b-c-s.de>
Tested-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9157848 09-Sep-2018 Nikita Danilov <nikita.danilov@aquantia.com>

net: aquantia: whitespace changes

Removed extra spaces, corrected alignment.

Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b647d398 20-Mar-2018 Igor Russkikh <igor.russkikh@aquantia.com>

net: aquantia: Add tx clean budget and valid budget handling logic

We should report to napi full budget only when we have more job to do.
Before this fix, on any tx queue cleanup we forced napi to do poll again.
Thats a waste of cpu resources and caused storming with napi polls when
there was at least one tx on each interrupt.

With this fix we report full budget only when there is more job on TX
to do. Or, as before, when rx budget was fully consumed.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9ec03bf6 15-Jan-2018 Igor Russkikh <igor.russkikh@aquantia.com>

net: aquantia: Fix internal stats calculation on rx

skb len should be fetched before gro_receive - otherwise we may get
wrong or even outdated skb data.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 453f85d4 15-Nov-2017 Mel Gorman <mgorman@techsingularity.net>

mm: remove __GFP_COLD

As the page free path makes no distinction between cache hot and cold
pages, there is no real useful ordering of pages in the free list that
allocation requests can take advantage of. Juding from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other sites instead of actually measuring the impact. Remove the
__GFP_COLD parameter which simplifies a number of paths in the page
allocator.

This is potentially controversial but bear in mind that the size of the
per-cpu pagelists versus modern cache sizes means that the whole per-cpu
list can often fit in the L3 cache. Hence, there is only a potential
benefit for microbenchmarks that alloc/free pages in a tight loop. It's
even worse when THP is taken into account which has little or no chance
of getting a cache-hot page as the per-cpu list is bypassed and the
zeroing of multiple pages will thrash the cache anyway.

The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path. A page fault microbenchmark was
tested but it showed no sigificant difference which is not surprising
given that the __GFP_COLD branches are a miniscule percentage of the
fault path.

Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c7545689 25-Sep-2017 Pavel Belous <pavel.belous@aquantia.com>

atlantic: fix iommu errors

Call skb_frag_dma_map multiple times if tx length is greater than
device max and avoid processing tx ring until entire packet has been
sent.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3aec6412 25-Sep-2017 Igor Russkikh <igor.russkikh@aquantia.com>

aquantia: Fix Tx queue hangups

Driver did a poor job in managing its Tx queues: Sometimes it could stop
tx queues due to link down condition in aq_nic_xmit - but never waked up
them. That led to Tx path total suspend.
This patch fixes this and improves generic queue management:
- introduces queue restart counter
- uses generic netif_ interface to disable and enable tx path
- refactors link up/down condition and introduces dmesg log event when
link changes.
- introduces new constant for minimum descriptors count required for queue
wakeup

Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 278175ab 28-Aug-2017 Pavel Belous <pavel.belous@aquantia.com>

net:ethernet:aquantia: Extra spinlocks removed.

This patch removes datapath spinlocks which does not perform any
useful work.

Fixes: 6e70637f9f1e ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a54df682 03-Aug-2017 Pavel Belous <pavel.belous@aquantia.com>

aquantia: Switch to use napi_gro_receive

Add support for GRO (generic receive offload) for aQuantia Atlantic driver.
This results in a perfomance improvement when GRO is enabled.

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 219f1d79 18-May-2017 Davide Caratti <dcaratti@redhat.com>

sk_buff: remove support for csum_bad in sk_buff

This bit was introduced with commit 5a21232983aa ("net: Support for
csum_bad in skbuff") to reduce the stack workload when processing RX
packets carrying a wrong Internet Checksum. Up to now, only one driver and
GRO core are setting it.

Suggested-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9f0dd8c3 23-Mar-2017 Pavel Belous <pavel.belous@aquantia.com>

net:ethernet:aquantia: Missing spinlock initialization.

Fix for missing initialization aq_ring header.lock spinlock.

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e399553d 20-Feb-2017 Pavel Belous <pavel.belous@aquantia.com>

net: ethernet: aquantia: Copying tx buffers is not needed.

This fix removes copying of tx biffers.
Now we use ring->buff_fing directly.

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89b64388 20-Feb-2017 Pavel Belous <pavel.belous@aquantia.com>

net: ethernet: aquantia: Fixed memory allocation if AQ_CFG_RX_FRAME_MAX > 1 page.

We should allocate the number of pages based on the config parameter
AQ_CFG_RX_FRAME_MAX.

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 99e55827 20-Feb-2017 Pavel Belous <pavel.belous@aquantia.com>

net: ethernet: aquantia: Removed extra assignment for skb->dev.

This assignment is not needed.

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb36bedf 17-Feb-2017 Lino Sanfilippo <LinoSanfilippo@gmx.de>

net: aquantia: remove function aq_ring_tx_deinit

Both functions aq_ring_rx_deinit() and aq_ring_tx_clean() are almost
identical aside from an additional check in the latter.
Move that check from the function into its caller and replace
aq_ring_rx_deinit() with aq_ring_rx_deinit().

By doing this also adjust the functions return value from int to void
since it can never fail.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Tested-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff1176f6 01-Feb-2017 Dan Carpenter <dan.carpenter@oracle.com>

ethernet: aquantia: fix dma_mapping_error test

dma_mapping_error() returns 1 if there is an error and 0 if not.

Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f81e5ca9 27-Jan-2017 Colin Ian King <colin.king@canonical.com>

net: ethernet: aquantia: remove another redundant err check

The check on err < 0 is redundant and can be removed. Detected
by CoverityScan, CID#1398318 ("Logically Dead Code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 018423e9 23-Jan-2017 David VomLehn <vomlehn@texas.net>

net: ethernet: aquantia: Add ring support code

Add code to support the transmit and receive ring buffers.

Signed-off-by: Alexander Loktionov <Alexander.Loktionov@aquantia.com>
Signed-off-by: Dmitrii Tarakanov <Dmitrii.Tarakanov@aquantia.com>
Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com>
Signed-off-by: Dmitry Bezrukov <Dmitry.Bezrukov@aquantia.com>
Signed-off-by: David M. VomLehn <vomlehn@texas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>