History log of /linux-master/drivers/net/ethernet/freescale/enetc/enetc.h
Revision Date Author Comments
# dd8e215e 22-Sep-2023 Kees Cook <keescook@chromium.org>

net: enetc: Annotate struct enetc_int_vector with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct enetc_int_vector.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230922172858.3822653-5-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 92272ec4 02-Aug-2023 Jakub Kicinski <kuba@kernel.org>

eth: add missing xdp.h includes in drivers

Handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case
so add explicit includes.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>


# 5353599a 29-May-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: refactor enetc_setup_tc_taprio() to have a switch/case for cmd

Make enetc_setup_tc_taprio() more amenable to future extensions, like
reporting statistics.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82714539 18-Apr-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: only commit preemptible TCs to hardware when MM TX is active

This was left as TODO in commit 01e23b2b3bad ("net: enetc: add support
for preemptible traffic classes") since it's relatively complicated.

Where this makes a difference is with a configuration as follows:

ethtool --set-mm eno0 pmac-enabled on tx-enabled on verify-enabled on

Preemptible packets should only be sent when the MAC Merge TX direction
becomes active (i.o.w. when the verification process succeeds, aka when
the link partner confirms it can process preemptible traffic). But the
tc qdisc with the preemptible traffic classes is offloaded completely
asynchronously w.r.t. the MM becoming active.

The ENETC manual does suggest that this should be handled in the driver:
"On startup, software should wait for the verification process to
complete (MMCSR[VSTS]=011) before initiating traffic".

Adding the necessary logic allows future selftests to uphold the claim
that an inactive or disabled MAC Merge layer should never send data
packets through the pMAC.

This change moves enetc_set_ptcfpr() from enetc.c to enetc_ethtool.c,
where its only caller is now - enetc_mm_commit_preemptible_tcs().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 01e23b2b 11-Apr-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: add support for preemptible traffic classes

PFs which support the MAC Merge layer also have a set of 8 registers
called "Port traffic class N frame preemption register (PTC0FPR - PTC7FPR)".
Through these, a traffic class (group of TX rings of same dequeue
priority) can be mapped to the eMAC or to the pMAC.

There's nothing particularly spectacular here. We should probably only
commit the preemptible TCs to hardware once the MAC Merge layer became
active, but unlike Felix, we don't have an IRQ that notifies us of that.
We'd have to sleep for up to verifyTime (127 ms) to wait for a
resolution coming from the verification state machine; not only from the
ndo_setup_tc() code path, but also from enetc_mm_link_state_update().
Since it's relatively complicated and has a relatively small benefit,
I'm not doing it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c7b9e808 06-Feb-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: add support for MAC Merge layer

Add PF driver support for viewing and changing the MAC Merge sublayer
parameters, and seeing the verification state machine's current state.
The verification handshake with the link partner is driven by hardware.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230206094531.444988-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 800db2d1 02-Feb-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: ensure we always have a minimum number of TXQs for stack

Currently it can happen that an mqprio qdisc is installed with num_tc 8,
and this will reserve 8 (out of 8) TXQs for the network stack. Then we
can attach an XDP program, and this will crop 2 TXQs, leaving just 6 for
mqprio. That's not what the user requested, and we should fail it.

On the other hand, if mqprio isn't requested, we still give the 8 TXQs
to the network stack (with hashing among a single traffic class), but
then, cropping 2 TXQs for XDP is fine, because the user didn't
explicitly ask for any number of TXQs, so no expectations are violated.

Simply put, the logic that mqprio should impose a minimum number of TXQs
for the network never existed. Let's say (more or less arbitrarily) that
without mqprio, the driver expects a minimum number of TXQs equal to the
number of CPUs (on NXP LS1028A, that is either 1, or 2). And with mqprio,
mqprio gives the minimum required number of TXQs.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 12717dec 19-Jan-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: implement software lockstep for port MAC registers

Currently the enetc driver duplicates its writes to the PM0 registers
also to PM1, but it doesn't do this consistently - for example we write
to ENETC_PM0_MAXFRM but not to ENETC_PM1_MAXFRM.

Create enetc_port_mac_wr() which writes both the PM0 and PM1 register
with the same value (if frame preemption is supported on this port).
Also create enetc_port_mac_rd() which reads from PM0 - the assumption
being that PM1 contains just the same value.

This will be necessary when we enable the MAC Merge layer properly, and
the pMAC becomes operational.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 94557a9a 19-Jan-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: detect frame preemption hardware capability

Similar to other TSN features, query the Station Interface capability
register to see whether preemption is supported on this port or not.
On LS1028A, preemption is available on ports 0 and 2, but not on 1
and 3.

This will allow us in the future to write the pMAC registers only on the
ENETC ports where a pMAC actually exists.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 766338c7 17-Jan-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: rename "xdp" and "dev" in enetc_setup_bpf()

Follow the convention from this driver, which is to name "struct
net_device *" as "ndev", and the convention from other drivers, to name
"struct netdev_bpf *" as "bpf".

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f3ce29e1 17-Jan-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: split ring resource allocation from assignment

We have a few instances in the enetc driver where the ring resources
(BD ring iomem, software BD ring, software TSO headers, basically
everything except RX buffers) need to be reallocated. For example, when
RX timestamping is enabled, the RX BD format changes to an extended one
(twice as large).

Currently, this is done using a simplistic enetc_close() -> enetc_open()
procedure. But this is quite crude, since it also invokes phylink_stop()
-> phylink_start(), the link is lost, and a few seconds need to pass for
autoneg to complete again.

In fact it's bad also due to the improper (yolo) error checking. In case
we fail to allocate new resources, we've already freed the old ones, so
the interface is more or less stuck.

To avoid that, we need a system where reconfiguration is possible in a
way in which resources are allocated upfront. This means that there will
be a higher memory usage temporarily, but the assignment of resources to
rings can be done when both the old and new resources are still available.

Introduce a struct enetc_bdr_resource which holds the resources for a
ring, be it RX or TX. This structure duplicates a lot of fields from
struct enetc_bdr (and access to the same fields in the ring structure
was left duplicated, to not change cache characteristics in the fast
path).

When enetc_alloc_tx_resources() runs, it returns an array of resource
elements (one per TX ring), in addition to the existing priv->tx_res.
To populate priv->tx_res with that array, one must call
enetc_assign_tx_resources(), and this also frees the old resources.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 59cc773a 04-Jan-2023 Lorenzo Bianconi <lorenzo@kernel.org>

net: ethernet: enetc: get rid of xdp_redirect_sg counter

Remove xdp_redirect_sg counter and the related ethtool entry since it is
no longer used.

Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 290b5fe0 22-Nov-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: preserve TX ring priority across reconfiguration

In the blamed commit, a rudimentary reallocation procedure for RX buffer
descriptors was implemented, for the situation when their format changes
between normal (no PTP) and extended (PTP).

enetc_hwtstamp_set() calls enetc_close() and enetc_open() in a sequence,
and this sequence loses information which was previously configured in
the TX BDR Mode Register, specifically via the enetc_set_bdr_prio() call.
The TX ring priority is configured by tc-mqprio and tc-taprio, and
affects important things for TSN such as the TX time of packets. The
issue manifests itself most visibly by the fact that isochron --txtime
reports premature packet transmissions when PTP is first enabled on an
enetc interface.

Save the TX ring priority in a new field in struct enetc_bdr (occupies a
2 byte hole on arm64) in order to make this survive a ring reconfiguration.

Fixes: 434cebabd3a2 ("enetc: Add dynamic allocation of extended Rx BD rings")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/r/20221122130936.1704151-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# dfc7175d 27-Sep-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: offload per-tc max SDU from tc-taprio

The driver currently sets the PTCMSDUR register statically to the max
MTU supported by the interface. Keep this logic if tc-taprio is absent
or if the max_sdu for a traffic class is 0, and follow the requested max
SDU size otherwise.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 715bf261 27-Sep-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: cache accesses to &priv->si->hw

The &priv->si->hw construct dereferences 2 pointers and makes lines
longer than they need to be, in turn making the code harder to read.

Replace &priv->si->hw accesses with a "hw" variable when there are 2 or
more accesses within a function that dereference this. This includes
loops, since &priv->si->hw is a loop invariant.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5641c751 16-Sep-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: deny offload of tc-based TSN features on VF interfaces

TSN features on the ENETC (taprio, cbs, gate, police) are configured
through a mix of command BD ring messages and port registers:
enetc_port_rd(), enetc_port_wr().

Port registers are a region of the ENETC memory map which are only
accessible from the PCIe Physical Function. They are not accessible from
the Virtual Functions.

Moreover, attempting to access these registers crashes the kernel:

$ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs
pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001
fsl_enetc_vf 0000:00:01.0: Adding to iommu group 15
fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002)
fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0
$ tc qdisc replace dev eno0vf0 root taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
sched-entry S 0x7f 900000 sched-entry S 0x80 100000 flags 0x2
Unable to handle kernel paging request at virtual address ffff800009551a08
Internal error: Oops: 96000007 [#1] PREEMPT SMP
pc : enetc_setup_tc_taprio+0x170/0x47c
lr : enetc_setup_tc_taprio+0x16c/0x47c
Call trace:
enetc_setup_tc_taprio+0x170/0x47c
enetc_setup_tc+0x38/0x2dc
taprio_change+0x43c/0x970
taprio_init+0x188/0x1e0
qdisc_create+0x114/0x470
tc_modify_qdisc+0x1fc/0x6c0
rtnetlink_rcv_msg+0x12c/0x390

Split enetc_setup_tc() into separate functions for the PF and for the
VF drivers. Also remove enetc_qos.o from being included into
enetc-vf.ko, since it serves absolutely no purpose there.

Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220916133209.3351399-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fed38e64 16-Sep-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: move enetc_set_psfp() out of the common enetc_set_features()

The VF netdev driver shouldn't respond to changes in the NETIF_F_HW_TC
flag; only PFs should. Moreover, TSN-specific code should go to
enetc_qos.c, which should not be included in the VF driver.

Fixes: 79e499829f3f ("net: enetc: add hw tc hw offload features for PSPF capability")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220916133209.3351399-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 285e8ded 10-May-2022 Po Liu <Po.Liu@nxp.com>

net: enetc: count the tc-taprio window drops

The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on
PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet
which will never fit in its allotted time slot for its traffic class:
either block the entire port due to head-of-line blocking, or drop the
packet and set a bit in the writeback format of the transmit buffer
descriptor, allowing other packets to be sent.

We obviously choose the second option in the driver, but we do not
detect the drop condition, so from the perspective of the network stack,
the packet is sent and no error counter is incremented.

This change checks the writeback of the TX BD when tc-taprio is enabled,
and increments a specific ethtool statistics counter and a generic
"tx_dropped" counter in ndo_get_stats64.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0cc11cdb 09-Feb-2022 Po Liu <po.liu@nxp.com>

net:enetc: command BD ring data memory alloc as one function alone

Separate the CBDR data memory alloc standalone. It is convenient for
other part loading, for example the ENETC QOS part.

Reported-and-suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Po Liu <po.liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fb8629e2 07-Oct-2021 Ioana Ciornei <ioana.ciornei@nxp.com>

net: enetc: add support for software TSO

This patch adds support for driver level TSO in the enetc driver using
the TSO API.

Beside using the usual tso_build_hdr(), tso_build_data() this specific
implementation also has to compute the checksum, both IP and L4, for
each resulted segment. This is because the ENETC controller does not
support Tx checksum offload which is needed in order to take advantage
of TSO.

With the workaround for the ENETC MDIO erratum in place the Tx path of
the driver is forced to lock/unlock for each skb sent. This is why, even
though we are computing the checksum by hand we see the following
improvement in TCP termination on the LS1028A SoC, on a single A72 core
running at 1.3GHz:

before: 1.63 Gbits/sec
after: 2.34 Gbits/sec

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87614b93 16-Apr-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: create a common enetc_pf_to_port helper

Even though ENETC interfaces are exposed as individual PCIe PFs with
their own driver instances, the ENETC is still fundamentally a
multi-port Ethernet controller, and some parts of the IP take a port
number (as can be seen in the PSFP implementation).

Create a common helper that can be used outside of the TSN code for
retrieving the ENETC port number based on the PF number. This is only
correct for LS1028A, the only Linux-capable instantiation of ENETC thus
far.

Note that ENETC port 3 is PF 6. The TSN code did not care about this
because ENETC port 3 does not support TSN, so the wrong mapping done by
enetc_get_port for PF 6 could have never been hit.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7eab503b 16-Apr-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: use dedicated TX rings for XDP

It is possible for one CPU to perform TX hashing (see netdev_pick_tx)
between the 8 ENETC TX rings, and the TX hashing to select TX queue 1.

At the same time, it is possible for the other CPU to already use TX
ring 1 for XDP (either XDP_TX or XDP_REDIRECT). Since there is no mutual
exclusion between XDP and the network stack, we run into an issue
because the ENETC TX procedure is not reentrant.

The obvious approach would be to just make XDP take the lock of the
network stack's TX queue corresponding to the ring it's about to enqueue
in.

For XDP_REDIRECT, this is quite straightforward, a lock at the beginning
and end of enetc_xdp_xmit() should do the trick.

But for XDP_TX, it's a bit more complicated. For one, we do TX batching
all by ourselves for frames with the XDP_TX verdict. This is something
we would like to keep the way it is, for performance reasons. But
batching means that the network stack's lock should be kept from the
first enqueued XDP_TX frame and until we ring the doorbell. That is
mostly fine, except for cases when in the same NAPI loop we have mixed
XDP_TX and XDP_REDIRECT frames. So if enetc_xdp_xmit() gets called while
we are holding the lock from the RX NAPI, then bam, deadlock. The naive
answer could be 'just flush the XDP_TX frames first, then release the
network stack's TX queue lock, then call xdp_do_flush_map()'. But even
xdp_do_redirect() is capable of flushing the batched XDP_REDIRECT
frames, so unless we unlock/relock the TX queue around xdp_do_redirect(),
there simply isn't any clean way to protect XDP_TX from concurrent
network stack .ndo_start_xmit() on another CPU.

So we need to take a different approach, and that is to reserve two
rings for the sole use of XDP. We leave TX rings
0..ndev->real_num_tx_queues-1 to be handled by the network stack, and we
pick them from the end of the priv->tx_ring array.

We make an effort to keep the mapping done by enetc_alloc_msix() which
decides which CPU handles the TX completions of which TX ring in its
NAPI poll. So the XDP TX ring of CPU 0 is handled by TX ring 6, and the
XDP TX ring of CPU 1 is handled by TX ring 7.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee3e875f 16-Apr-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: increase TX ring size

Now that commit d6a2829e82cf ("net: enetc: increase RX ring default
size") has increased the RX ring size, it is quite easy to congest the
TX rings when the traffic is predominantly XDP_TX, as the RX ring is
quite a bit larger than the TX one.

Since we bit the bullet and did the expensive thing already (larger RX
rings consume more memory pages), it seems quite foolish to keep the TX
rings small. So make them equally sized with TX.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7294380c 12-Apr-2021 Yangbo Lu <yangbo.lu@nxp.com>

enetc: support PTP Sync packet one-step timestamping

This patch is to add support for PTP Sync packet one-step timestamping.
Since ENETC single-step register has to be configured dynamically per
packet for correctionField offeset and UDP checksum update, current
one-step timestamping packet has to be sent only when the last one
completes transmitting on hardware. So, on the TX, this patch handles
one-step timestamping packet as below:

- Trasmit packet immediately if no other one in transfer, or queue to
skb queue if there is already one in transfer.
The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
lock and to send one skb in skb queue if has.

And the configuration for one-step timestamping on ENETC before
transmitting is,

- Set one-step timestamping flag in extension BD.
- Write 30 bits current timestamp in tstamp field of extension BD.
- Update PTP Sync packet originTimestamp field with current timestamp.
- Configure single-step register for correctionField offeset and UDP
checksum update.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f768e751 12-Apr-2021 Yangbo Lu <yangbo.lu@nxp.com>

enetc: mark TX timestamp type per skb

Mark TX timestamp type per skb on skb->cb[0], instead of
global variable for all skbs. This is a preparation for
one step timestamp support.

For one-step timestamping enablement, there will be both
one-step and two-step PTP messages to transfer. And a skb
queue is needed for one-step PTP messages making sure
start to send current message only after the last one
completed on hardware. (ENETC single-step register has to
be dynamically configured per message.) So, marking TX
timestamp type per skb is required.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d2b68cc 31-Mar-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: add support for XDP_REDIRECT

The driver implementation of the XDP_REDIRECT action reuses parts from
XDP_TX, most notably the enetc_xdp_tx function which transmits an array
of TX software BDs. Only this time, the buffers don't have DMA mappings,
we need to create them.

When a BPF program reaches the XDP_REDIRECT verdict for a frame, we can
employ the same buffer reuse strategy as for the normal processing path
and for XDP_PASS: we can flip to the other page half and seed that to
the RX ring.

Note that scatter/gather support is there, but disabled due to lack of
multi-buffer support in XDP (which is added by this series):
https://patchwork.kernel.org/project/netdevbpf/cover/cover.1616179034.git.lorenzo@kernel.org/

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d6a2829e 31-Mar-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: increase RX ring default size

As explained in the XDP_TX patch, when receiving a burst of frames with
the XDP_TX verdict, there is a momentary dip in the number of available
RX buffers. The system will eventually recover as TX completions will
start kicking in and refilling our RX BD ring again. But until that
happens, we need to survive with as few out-of-buffer discards as
possible.

This increases the memory footprint of the driver in order to avoid
discards at 2.5Gbps line rate 64B packet sizes, the maximum speed
available for testing on 1 port on NXP LS1028A.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ed2bc80 31-Mar-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: add support for XDP_TX

For reflecting packets back into the interface they came from, we create
an array of TX software BDs derived from the RX software BDs. Therefore,
we need to extend the TX software BD structure to contain most of the
stuff that's already present in the RX software BD structure, for
reasons that will become evident in a moment.

For a frame with the XDP_TX verdict, we don't reuse any buffer right
away as we do for XDP_DROP (the same page half) or XDP_PASS (the other
page half, same as the skb code path).

Because the buffer transfers ownership from the RX ring to the TX ring,
reusing any page half right away is very dangerous. So what we can do is
we can recycle the same page half as soon as TX is complete.

The code path is:
enetc_poll
-> enetc_clean_rx_ring_xdp
-> enetc_xdp_tx
-> enetc_refill_rx_ring
(time passes, another MSI interrupt is raised)
enetc_poll
-> enetc_clean_tx_ring
-> enetc_recycle_xdp_tx_buff

But that creates a problem, because there is a potentially large time
window between enetc_xdp_tx and enetc_recycle_xdp_tx_buff, period in
which we'll have less and less RX buffers.

Basically, when the ship starts sinking, the knee-jerk reaction is to
let enetc_refill_rx_ring do what it does for the standard skb code path
(refill every 16 consumed buffers), but that turns out to be very
inefficient. The problem is that we have no rx_swbd->page at our
disposal from the enetc_reuse_page path, so enetc_refill_rx_ring would
have to call enetc_new_page for every buffer that we refill (if we
choose to refill at this early stage). Very inefficient, it only makes
the problem worse, because page allocation is an expensive process, and
CPU time is exactly what we're lacking.

Additionally, there is an even bigger problem: if we let
enetc_refill_rx_ring top up the ring's buffers again from the RX path,
remember that the buffers sent to transmission haven't disappeared
anywhere. They will be eventually sent, and processed in
enetc_clean_tx_ring, and an attempt will be made to recycle them.
But surprise, the RX ring is already full of new buffers, because we
were premature in deciding that we should refill. So not only we took
the expensive decision of allocating new pages, but now we must throw
away perfectly good and reusable buffers.

So what we do is we implement an elastic refill mechanism, which keeps
track of the number of in-flight XDP_TX buffer descriptors. We top up
the RX ring only up to the total ring capacity minus the number of BDs
that are in flight (because we know that those BDs will return to us
eventually).

The enetc driver manages 1 RX ring per CPU, and the default TX ring
management is the same. So we do XDP_TX towards the TX ring of the same
index, because it is affined to the same CPU. This will probably not
produce great results when we have a tc-taprio/tc-mqprio qdisc on the
interface, because in that case, the number of TX rings might be
greater, but I didn't add any checks for that yet (mostly because I
didn't know what checks to add).

It should also be noted that we need to change the DMA mapping direction
for RX buffers, since they may now be reflected into the TX ring of the
same device. We choose to use DMA_BIDIRECTIONAL instead of unmapping and
remapping as DMA_TO_DEVICE, because performance is better this way.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d1b15102 31-Mar-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: add support for XDP_DROP and XDP_PASS

For the RX ring, enetc uses an allocation scheme based on pages split
into two buffers, which is already very efficient in terms of preventing
reallocations / maximizing reuse, so I see no reason why I would change
that.

+--------+--------+--------+--------+--------+--------+--------+
| | | | | | | |
| half B | half B | half B | half B | half B | half B | half B |
| | | | | | | |
+--------+--------+--------+--------+--------+--------+--------+
| | | | | | | |
| half A | half A | half A | half A | half A | half A | half A | RX ring
| | | | | | | |
+--------+--------+--------+--------+--------+--------+--------+
^ ^
| |
next_to_clean next_to_alloc
next_to_use

+--------+--------+--------+--------+--------+
| | | | | |
| half B | half B | half B | half B | half B |
| | | | | |
+--------+--------+--------+--------+--------+--------+--------+
| | | | | | | |
| half B | half B | half A | half A | half A | half A | half A | RX ring
| | | | | | | |
+--------+--------+--------+--------+--------+--------+--------+
| | | ^ ^
| half A | half A | | |
| | | next_to_clean next_to_use
+--------+--------+
^
|
next_to_alloc

then when enetc_refill_rx_ring is called, whose purpose is to advance
next_to_use, it sees that it can take buffers up to next_to_alloc, and
it says "oh, hey, rx_swbd->page isn't NULL, I don't need to allocate
one!".

The only problem is that for default PAGE_SIZE values of 4096, buffer
sizes are 2048 bytes. While this is enough for normal skb allocations at
an MTU of 1500 bytes, for XDP it isn't, because the XDP headroom is 256
bytes, and including skb_shared_info and alignment, we end up being able
to make use of only 1472 bytes, which is insufficient for the default
MTU.

To solve that problem, we implement scatter/gather processing in the
driver, because we would really like to keep the existing allocation
scheme. A packet of 1500 bytes is received in a buffer of 1472 bytes and
another one of 28 bytes.

Because the headroom required by XDP is different (and much larger) than
the one required by the network stack, whenever a BPF program is added
or deleted on the port, we drain the existing RX buffers and seed new
ones with the required headroom. We also keep the required headroom in
rx_ring->buffer_offset.

The simplest way to implement XDP_PASS, where an skb must be created, is
to create an xdp_buff based on the next_to_clean RX BDs, but not clear
those BDs from the RX ring yet, just keep the original index at which
the BDs for this frame started. Then, if the verdict is XDP_PASS,
instead of converting the xdb_buff to an skb, we replay a call to
enetc_build_skb (just as in the normal enetc_clean_rx_ring case),
starting from the original BD index.

We would also like to be minimally invasive to the regular RX data path,
and not check whether there is a BPF program attached to the ring on
every packet. So we create a separate RX ring processing function for
XDP.

Because we only install/remove the BPF program while the interface is
down, we forgo the rcu_read_lock() in enetc_clean_rx_ring, since there
shouldn't be any circumstance in which we are processing packets and
there is a potentially freed BPF program attached to the RX ring.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d504498d 31-Mar-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: add a dedicated is_eof bit in the TX software BD

In the transmit path, if we have a scatter/gather frame, it is put into
multiple software buffer descriptors, the last of which has the skb
pointer populated (which is necessary for rearming the TX MSI vector and
for collecting the two-step TX timestamp from the TX confirmation path).

At the moment, this is sufficient, but with XDP_TX, we'll need to
service TX software buffer descriptors that don't have an skb pointer,
however they might be final nonetheless. So add a dedicated bit for
final software BDs that we populate and check explicitly. Also, we keep
looking just for an skb when doing TX timestamping, because we don't
want/need that for XDP.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7f071a45 10-Mar-2021 Vladimir Oltean <olteanv@gmail.com>

net: enetc: use enum enetc_active_offloads

The active_offloads variable of enetc_ndev_priv has an enum type, use it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c027aa92 10-Mar-2021 Vladimir Oltean <olteanv@gmail.com>

net: enetc: simplify callers of enetc_rxbd_next

When we iterate through the BDs in the RX ring, the software producer
index (which is already passed by value to enetc_rxbd_next) lags behind,
and we end up with this funny looking "++i == rx_ring->bd_count" check
so that we drag it after us.

Let's pass the software producer index "i" by reference, so that
enetc_rxbd_next can increment it by itself (mod rx_ring->bd_count),
especially since enetc_rxbd_next has to increment the index anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5b4daa7f 10-Mar-2021 Vladimir Oltean <olteanv@gmail.com>

net: enetc: pass bd_count as an argument to enetc_setup_cbdr

It makes no sense from an API perspective to first initialize some
portion of struct enetc_cbdr outside enetc_setup_cbdr, then leave that
function to initialize the rest. enetc_setup_cbdr should be able to
perform all initialization given a zero-initialized struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0bfde022 10-Mar-2021 Vladimir Oltean <olteanv@gmail.com>

net: enetc: squash clear_cbdr and free_cbdr into teardown_cbdr

All call sites call enetc_clear_cbdr and enetc_free_cbdr one after
another, so let's combine the two functions into a single method named
enetc_teardown_cbdr which does both, and in the same order.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 27f9025d 10-Mar-2021 Vladimir Oltean <olteanv@gmail.com>

net: enetc: save the mode register address inside struct enetc_cbdr

enetc_clear_cbdr depends on struct enetc_hw because it must disable the
ring through a register write. We'd like to remove that dependency, so
let's do what's already done with the producer and consumer indices,
which is to save the iomem address in a variable kept in struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24be14e3 10-Mar-2021 Vladimir Oltean <olteanv@gmail.com>

net: enetc: squash enetc_alloc_cbdr and enetc_setup_cbdr

enetc_alloc_cbdr and enetc_setup_cbdr are always called one after
another, so we can simplify the callers and make enetc_setup_cbdr do
everything that's needed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01121ab7 10-Mar-2021 Vladimir Oltean <olteanv@gmail.com>

net: enetc: save the DMA device for enetc_free_cbdr

We shouldn't need to pass the struct device *dev to enetc CBDR APIs over
and over again, so save this inside struct enetc_cbdr::dma_dev and avoid
calling it from the enetc_free_cbdr functions.

This breaks the dependency of the cbdr API from struct enetc_si (the
station interface).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3222b5b6 01-Mar-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: initialize RFS/RSS memories for unused ports too

Michael reports that since linux-next-20210211, the AER messages for ECC
errors have started reappearing, and this time they can be reliably
reproduced with the first ping on one of his LS1028A boards.

$ ping 1[ 33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
72.16.0.1
PING [ 33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
172.16.0.1 (172.16.0.1): 56 data bytes
64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms

$ devmem 0x1f8010e10 32
0xC0000006

It isn't clear why this is necessary, but it seems that for the errors
to go away, we must clear the entire RFS and RSS memory, not just for
the ports in use.

Sadly the code is structured in such a way that we can't have unified
logic for the used and unused ports. For the minimal initialization of
an unused port, we need just to enable and ioremap the PF memory space,
and a control buffer descriptor ring. Unused ports must then free the
CBDR because the driver will exit, but used ports can not pick up from
where that code path left, since the CBDR API does not reinitialize a
ring when setting it up, so its producer and consumer indices are out of
sync between the software and hardware state. So a separate
enetc_init_unused_port function was created, and it gets called right
after the PF memory space is enabled.

Fixes: 07bf34a50e32 ("net: enetc: initialize the RFS and RSS memories")
Reported-by: Michael Walle <michael@walle.cc>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c646d10d 01-Mar-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: enetc: don't overwrite the RSS indirection table when initializing

After the blamed patch, all RX traffic gets hashed to CPU 0 because the
hashing indirection table set up in:

enetc_pf_probe
-> enetc_alloc_si_resources
-> enetc_configure_si
-> enetc_setup_default_rss_table

is overwritten later in:

enetc_pf_probe
-> enetc_init_port_rss_memory

which zero-initializes the entire port RSS table in order to avoid ECC errors.

The trouble really is that enetc_init_port_rss_memory really neads
enetc_alloc_si_resources to be called, because it depends upon
enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
thing could have been better thought out, it has nothing to do in a
function called "alloc_si_resources", especially since its counterpart,
"free_si_resources", does nothing to unwind the configuration of the SI.

The point is, we need to pull out enetc_configure_si out of
enetc_alloc_resources, and move it after enetc_init_port_rss_memory.
This allows us to set up the default RSS indirection table after
initializing the memory.

Fixes: 07bf34a50e32 ("net: enetc: initialize the RFS and RSS memories")
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82728b91 03-Nov-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Remove Tx checksumming offload code

Tx checksumming has been defeatured and completely removed
from the h/w reference manual. Made a little cleanup for the
TSE case as this is complementary code.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201103140213.3294-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 71b77a7a 06-Oct-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Migrate to PHYLINK and PCS_LYNX

This is a methodical transition of the driver from phylib
to phylink, following the guidelines from sfp-phylink.rst.
The MAC register configurations based on interface mode
were moved from the probing path to the mac_config() hook.
MAC enable and disable commands (enabling Rx and Tx paths
at MAC level) were also extracted and assigned to their
corresponding phylink hooks.
As part of the migration to phylink, the serdes configuration
from the driver was offloaded to the PCS_LYNX module,
introduced in commit 0da4c3d393e4 ("net: phy: add Lynx PCS module"),
the PCS_LYNX module being a mandatory component required to
make the enetc driver work with phylink.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ae0e6a5d 21-Jul-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Add adaptive interrupt coalescing

Use the generic dynamic interrupt moderation (dim)
framework to implement adaptive interrupt coalescing
on Rx. With the per-packet interrupt scheme, a high
interrupt rate has been noted for moderate traffic flows
leading to high CPU utilization. The 'dim' scheme
implemented by the current patch addresses this issue
improving CPU utilization while using minimal coalescing
time thresholds in order to preserve a good latency.
On the Tx side use an optimal time threshold value by
default. This value has been optimized for Tx TCP
streams at a rate of around 85kpps on a 1G link,
at which rate half of the Tx ring size (128) gets filled
in 1500 usecs. Scaling this down to 2.5G links yields
the current value of 600 usecs, which is conservative
and gives good enough results for 1G links too (see
next).

Below are some measurement results for before and after
this patch (and related dependencies) basically, for a
2 ARM Cortex-A72 @1.3Ghz CPUs system (32 KB L1 data cache),
using 60secs log netperf TCP stream tests @ 1Gbit link
(maximum throughput):

1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI
thread on the same CPU:
CPU utilization int rate (ints/sec)
Before: 50%-60% (over 50%) 92k
After: 13%-22% 3.5k-12k
Comment: Major CPU utilization improvement for a single flow
Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a single
CPU. Usually settles under 16% for longer tests.

2) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency):
Total CPU utilization Total int rate (ints/sec)
Before: ~80% (spikes to 90%) ~100k
After: 60% (more steady) ~4k
Comment: Important improvement for this load test, while the
ping test outcome does not show any notable
difference compared to before.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91571081 21-Jul-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Add interrupt coalescing support

Enable programming of the interrupt coalescing registers
and allow manual configuration of the coalescing time
thresholds via ethtool. Packet thresholds have been fixed
to predetermined values as there's no point in making them
run-time configurable, also anticipating the dynamic interrupt
moderation (DIM) algorithm which uses fixed packet thresholds
as well. If the interface is up when the operation mode of
traffic interrupt events is changed by the user (i.e. switching
from default per-packet interrupts to coalesced interrupts),
the traffic needs to be paused in the process.
This patch also prepares the ground for introducing DIM on Rx.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 058d9cfa 21-Jul-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Drop redundant ____cacheline_aligned_in_smp

'struct enetc_bdr' is already '____cacheline_aligned_in_smp'.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02293dd4 21-Jul-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Refine buffer descriptor ring sizes

It's time to differentiate between Rx and Tx ring sizes.
Not only Tx rings are processed differently than Rx rings,
but their default number also differs - i.e. up to 8 Tx rings
per device (8 traffic classes) vs. 2 Rx rings (one per CPU).
So let's set Tx rings sizes to half the size of the Rx rings
for now, to be conservative.
The default ring sizes were decreased as well (to the next
lower power of 2), to reduce the memory footprint, buffering
etc., since the measurements I've made so far show that the
rings are very unlikely to get full.
This change also anticipates the introduction of the
dynamic interrupt moderation (dim) algorithm which operates
on maximum packet thresholds of 256 packets for Rx and 128
packets for Tx.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 888ae5a3 30-Apr-2020 Po Liu <Po.Liu@nxp.com>

net: enetc: add tc flower psfp offload driver

This patch is to add tc flower offload for the enetc IEEE 802.1Qci(PSFP)
function. There are four main feature parts to implement the flow
policing and filtering for ingress flow with IEEE 802.1Qci features.
They are stream identify(this is defined in the P802.1cb exactly but
needed for 802.1Qci), stream filtering, stream gate and flow metering.
Each function block includes many entries by index to assign parameters.
So for one frame would be filtered by stream identify first, then
flow into stream filter block by the same handle between stream identify
and stream filtering. Then flow into stream gate control which assigned
by the stream filtering entry. And then policing by the gate and limited
by the max sdu in the filter block(optional). At last, policing by the
flow metering block, index choosing at the fitering block.
So you can see that each entry of block may link to many upper entries
since they can be assigned same index means more streams want to share
the same feature in the stream filtering or stream gate or flow
metering.
To implement such features, each stream filtered by source/destination
mac address, some stream maybe also plus the vlan id value would be
treated as one flow chain. This would be identified by the chain_index
which already in the tc filter concept. Driver would maintain this chain
and also with gate modules. The stream filter entry create by the gate
index and flow meter(optional) entry id and also one priority value.
Offloading only transfer the gate action and flow filtering parameters.
Driver would create (or search same gate id and flow meter id and
priority) one stream filter entry to set to the hardware. So stream
filtering do not need transfer by the action offloading.
This architecture is same with tc filter and actions relationship. tc
filter maintain the list for each flow feature by keys. And actions
maintain by the action list.

Below showing a example commands by tc:
> tc qdisc add dev eth0 ingress
> ip link set eth0 address 10:00:80:00:00:00
> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 10 \
sched-entry open 200000000 1 8000000 \
sched-entry close 100000000 -1 -1

Command means to set the dst_mac 10:00:80:00:00:00 to index 11 of stream
identify module. Then setting the gate index 10 of stream gate module.
Keep the gate open for 200ms and limit the traffic volume to 8MB in this
sched-entry. Then direct the frames to the ingress queue 1.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 79e49982 30-Apr-2020 Po Liu <Po.Liu@nxp.com>

net: enetc: add hw tc hw offload features for PSPF capability

This patch is to let ethtool enable/disable the tc flower offload
features. Hardware ENETC has the feature of PSFP which is for per-stream
policing. When enable the tc hw offloading feature, driver would enable
the IEEE 802.1Qci feature. It is only set the register enable bit for
this feature not enable for any entry of per stream filtering and stream
gate or stream identify but get how much capabilities for each feature.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 434cebab 10-Mar-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Add dynamic allocation of extended Rx BD rings

Hardware timestamping support (PTP) on Rx requires extended
buffer descriptors, double the size of normal Rx descriptors.
On the current controller revision only the timestamping offload
requires extended Rx descriptors.
Since Rx timestamping can be turned on/off at runtime, make Rx ring
allocation configurable at runtime too. As a result, the static
config option FSL_ENETC_HW_TIMESTAMPING can be dropped and the
extended descriptors can be used only when Rx timestamping gets
activated.
The extension has the same size as the base descriptor, making
the descriptor iterators easy to update for the extended case.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 714239ac 10-Mar-2020 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Clean up Rx BD iteration

Improve maintainability of the code iterating the Rx buffer
descriptors to prepare it to support iterating extended Rx BD
descriptors as well.
Don't increment by one the h/w descriptor pointers explicitly,
provide an iterator that takes care of the h/w details.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc5b48b5 24-Feb-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

freescale: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d08c9ec 01-Jan-2020 Po Liu <po.liu@nxp.com>

enetc: add support time specific departure base on the qos etf

ENETC implement time specific departure capability, which enables
the user to specify when a frame can be transmitted. When this
capability is enabled, the device will delay the transmission of
the frame so that it can be transmitted at the precisely specified time.
The delay departure time up to 0.5 seconds in the future. If the
departure time in the transmit BD has not yet been reached, based
on the current time, the packet will not be transmitted.

This driver was loaded by Qos driver ETF. User could load it by tc
commands. Here are the example commands:

tc qdisc add dev eth0 root handle 1: mqprio \
num_tc 8 map 0 1 2 3 4 5 6 7 hw 1
tc qdisc replace dev eth0 parent 1:8 etf \
clockid CLOCK_TAI delta 30000 offload

These example try to set queue mapping first and then set queue 7
with 30us ahead dequeue time.

Then user send test frame should set SO_TXTIME feature for socket.

There are also some limitations for this feature in hardware:
- Transmit checksum offloads and time specific departure operation
are mutually exclusive.
- Time Aware Shaper feature (Qbv) offload and time specific departure
operation are mutually exclusive.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c431047c 24-Nov-2019 Po Liu <po.liu@nxp.com>

enetc: add support Credit Based Shaper(CBS) for hardware offload

The ENETC hardware support the Credit Based Shaper(CBS) which part
of the IEEE-802.1Qav. The CBS driver was loaded by the sch_cbs
interface when set in the QOS in the kernel.

Here is an example command to set 20Mbits bandwidth in 1Gbits port
for taffic class 7:

tc qdisc add dev eth0 root handle 1: mqprio \
num_tc 8 map 0 1 2 3 4 5 6 7 hw 1

tc qdisc replace dev eth0 parent 1:8 cbs \
locredit -1470 hicredit 30 \
sendslope -980000 idleslope 20000 offload 1

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e47cb41 14-Nov-2019 Po Liu <po.liu@nxp.com>

enetc: update TSN Qbv PSPEED set according to adjust link speed

ENETC has a register PSPEED to indicate the link speed of hardware.
It is need to update accordingly. PSPEED field needs to be updated
with the port speed for QBV scheduling purposes. Or else there is
chance for gate slot not free by frame taking the MAC if PSPEED and
phy speed not match. So update PSPEED when link adjust. This is
implement by the adjust_link.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 34c6adf1 14-Nov-2019 Po Liu <po.liu@nxp.com>

enetc: Configure the Time-Aware Scheduler via tc-taprio offload

ENETC supports in hardware for time-based egress shaping according
to IEEE 802.1Qbv. This patch implement the Qbv enablement by the
hardware offload method qdisc tc-taprio method.
Also update cbdr writeback to up level since control bd ring may
writeback data to control bd ring.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cbe9e835 27-May-2019 Camelia Groza <camelia.groza@nxp.com>

enetc: Enable TC offloading with mqprio

Add support to configure multiple prioritized TX traffic
classes with mqprio.

Configure one BD ring per TC for the moment, one netdev
queue per TC.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 41514737 22-May-2019 Y.b. Lu <yangbo.lu@nxp.com>

enetc: add get_ts_info interface for ethtool

This patch is to add get_ts_info interface for ethtool
to support getting timestamping capability.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d3982312 22-May-2019 Y.b. Lu <yangbo.lu@nxp.com>

enetc: add hardware timestamping support

This patch is to add hardware timestamping support
for ENETC. On Rx, timestamping is enabled for all
frames. On Tx, we only instruct the hardware to
timestamp the frames marked accordingly by the stack.

Because the RX BD ring dynamic allocation has not been
supported and it is too expensive to use extended RX BDs
if timestamping is not used, a Kconfig option is used to
enable extended RX BDs in order to support hardware
timestamping. This option will be removed once RX BD
ring dynamic allocation is implemented.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d382563f 22-Jan-2019 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Add RFS and RSS support

A ternary match table is used for RFS. If multiple entries in the table
match, the entry with the lowest numerical values index is chosen as the
matching entry. Entries in the table are identified using an index
which takes a value from 0 to PRFSCAPR[NUM_RFS]-1 when accessed by the
PSI (PF).
Portions of the RFS table can be assigned to each SI by the PSI (PF)
driver in PSIaRFSCFGR. Assignments are cumulative, the entries assigned
to SIn start after those assigned to SIn-1. The total assignments to
all SIs must be equal to or less than the number available to the port
as found in PRFSCAPR.

For RSS, the Toeplitz hash function used requires two inputs, a 40B
random secret key that is supplied through the PRSSKR0-9 registers as well
as the relevant pieces of the packet header (n-tuple). The 6 LSB bits of
the hash function result will then be used as a pointer to obtain the tag
referenced in the 64 entry indirection table. The result will provide a
winning group which will be used to help route the received packet.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# beb74ac8 22-Jan-2019 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Add vf to pf messaging support

VSIs (VFs) may send a message to the PSI (PF) for general notification
or to gain access to hardware resources which requires host inspection.
These messages may vary in size and are handled as a partition copy
between two memory regions owned by the respective participants.
The PSI will respond with fail or success and a 16-bit message code.
The patch implements the vf to pf messaging mechanism above and, as the
first application making use of this support, it enables the VF to
configure its own primary MAC address.

Signed-off-by: Catalin Horghidan <catalin.horghidan@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d4fd0404 22-Jan-2019 Claudiu Manoil <claudiu.manoil@nxp.com>

enetc: Introduce basic PF and VF ENETC ethernet drivers

ENETC is a multi-port virtualized Ethernet controller supporting GbE
designs and Time-Sensitive Networking (TSN) functionality.
ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
Endpoint (RCIE). As such, it contains multiple physical (PF) and
virtual (VF) PCIe functions, discoverable by standard PCI Express.

Introduce basic PF and VF ENETC ethernet drivers. The PF has access to
the ENETC Port registers and resources and makes the required privileged
configurations for the underlying VF devices. Common functionality is
controlled through so called System Interface (SI) register blocks, PFs
and VFs own a SI each. Though SI register blocks are almost identical,
there are a few privileged SI level controls that are accessible only to
PFs, and so the distinction is made between PF SIs (PSI) and VF SIs (VSI).
As such, the bulk of the code, including datapath processing, basic h/w
offload support and generic pci related configuration, is shared between
the 2 drivers and is factored out in common source files (i.e. enetc.c).

Major functionalities included (for both drivers):
MSI-X support for Rx and Tx processing, assignment of Rx/Tx BD ring pairs
to MSI-X entries, multi-queue support, Rx S/G (Rx frame fragmentation) and
jumbo frame (up to 9600B) support, Rx paged allocation and reuse, Tx S/G
support (NETIF_F_SG), Rx and Tx checksum offload, PF MAC filtering and
initial control ring support, VLAN extraction/ insertion, PF Rx VLAN
CTAG filtering, VF mac address config support, VF VLAN isolation support,
etc.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>