History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
Revision Date Author Comments
# 7f525acb 14-Feb-2024 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Support per-mdev queue counter

Each queue counter object counts some events (in hardware) for the RQs
that are attached to it, like events of packet drops due to no receive
WQE (rx_out_of_buffer).

Each RQ can be attached to a queue counter only within the same vhca. To
still cover all RQs with these counters, we create multiple instances,
one per vhca.

The result that's shown to the user is now the sum of all instances.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f25e7b82 09-Feb-2024 Joe Damato <jdamato@fastly.com>

net/mlx5e: link NAPI instances to queues and IRQs

Make mlx5 compatible with the newly added netlink queue GET APIs.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240209202312.30181-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 90502d43 08-Feb-2024 Rahul Rameshbabu <rrameshbabu@nvidia.com>

net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context

The NAPI poll context is a softirq context. Do not use normal spinlock API
in this context to prevent concurrency issues.

Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CC: Vadim Fedorenko <vadfed@meta.com>


# 3876638b 22-Nov-2023 Rahul Rameshbabu <rrameshbabu@nvidia.com>

net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context

Indirection (*) is of lower precedence than postfix increment (++). Logic
in napi_poll context would cause an out-of-bound read by first increment
the pointer address by byte address space and then dereference the value.
Rather, the intended logic was to dereference first and then increment the
underlying value.

Fixes: 92214be5979c ("net/mlx5e: Update doorbell for port timestamping CQ before the software counter")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3fbf6120 07-Jan-2024 Jakub Kicinski <kuba@kernel.org>

Revert "mlx5 updates 2023-12-20"

Revert "net/mlx5: Implement management PF Ethernet profile"
This reverts commit 22c4640698a1d47606b5a4264a584e8046641784.
Revert "net/mlx5: Enable SD feature"
This reverts commit c88c49ac9c18fb7c3fa431126de1d8f8f555e912.
Revert "net/mlx5e: Block TLS device offload on combined SD netdev"
This reverts commit 83a59ce0057b7753d7fbece194b89622c663b2a6.
Revert "net/mlx5e: Support per-mdev queue counter"
This reverts commit d72baceb92539a178d2610b0e9ceb75706a75b55.
Revert "net/mlx5e: Support cross-vhca RSS"
This reverts commit c73a3ab8fa6e93a783bd563938d7cf00d62d5d34.
Revert "net/mlx5e: Let channels be SD-aware"
This reverts commit e4f9686bdee7b4dd89e0ed63cd03606e4bda4ced.
Revert "net/mlx5e: Create EN core HW resources for all secondary devices"
This reverts commit c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1.
Revert "net/mlx5e: Create single netdev per SD group"
This reverts commit e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad.
Revert "net/mlx5: SD, Add informative prints in kernel log"
This reverts commit c82d360325112ccc512fc11a3b68cdcdf04a1478.
Revert "net/mlx5: SD, Implement steering for primary and secondaries"
This reverts commit 605fcce33b2d1beb0139b6e5913fa0b2062116b2.
Revert "net/mlx5: SD, Implement devcom communication and primary election"
This reverts commit a45af9a96740873db9a4b5bb493ce2ad81ccb4d5.
Revert "net/mlx5: SD, Implement basic query and instantiation"
This reverts commit 63b9ce944c0e26c44c42cdd5095c2e9851c1a8ff.
Revert "net/mlx5: SD, Introduce SD lib"
This reverts commit 4a04a31f49320d078b8078e1da4b0e2faca5dfa3.
Revert "net/mlx5: Fix query of sd_group field"
This reverts commit e04984a37398b3f4f5a79c993b94c6b1224184cc.
Revert "net/mlx5e: Use the correct lag ports number when creating TISes"
This reverts commit a7e7b40c4bc115dbf2a2bb453d7bbb2e0ea99703.

There are some unanswered questions on the list, and we don't
have any docs. Given the lack of replies so far and the fact
that v6.8 merge window has started - let's revert this and
revisit for v6.9.

Link: https://lore.kernel.org/all/20231221005721.186607-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d72baceb 08-Aug-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Support per-mdev queue counter

Each queue counter object counts some events (in hardware) for the RQs
that are attached to it, like events of packet drops due to no receive
WQE (rx_out_of_buffer).

Each RQ can be attached to a queue counter only within the same vhca. To
still cover all RQs with these counters, we create multiple instances,
one per vhca.

The result that's shown to the user is now the sum of all instances.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# db52aa6d 04-Aug-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Decouple CQ from priv

Make CQ struct and methods independent of "priv", use more basic
arguments instead.
This will ease the transition to netdev with multiple mdevs.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b25bd37c 06-Aug-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5: Move TISes from priv to mdev HW resources

The transport interface send (TIS) object is responsible for performing
all transport related operations of the transmit side. Messages from
Send Queues get segmented and transmitted by the TIS including all
transport required implications, e.g. in the case of large send offload,
the TIS is responsible for the segmentation.

These are stateless objects and can be used by multiple netdevs (e.g.
representors) who share the same core device.

Providing the TISes as a service from the core layer to the netdev layer
reduces the number of replecated TIS objects (in case of multiple
netdevs), and will ease the transition to netdev with multiple mdevs.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 92214be5 14-Nov-2023 Rahul Rameshbabu <rrameshbabu@nvidia.com>

net/mlx5e: Update doorbell for port timestamping CQ before the software counter

Previously, mlx5e_ptp_poll_ts_cq would update the device doorbell with the
incremented consumer index after the relevant software counters in the
kernel were updated. In the mlx5e_sq_xmit_wqe context, this would lead to
either overrunning the device CQ or exceeding the expected software buffer
size in the device CQ if the device CQ size was greater than the software
buffer size. Update the relevant software counter only after updating the
device CQ consumer index in the port timestamping napi_poll context.

Log:
mlx5_core 0000:08:00.0: cq_err_event_notifier:517:(pid 0): CQ error on CQN 0x487, syndrome 0x1
mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000487 event=0x04

Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 53b836a4 08-Aug-2023 Rahul Rameshbabu <rrameshbabu@nvidia.com>

net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ

A new check for the tx devlink health reporter is introduced for
determining when the PTP port timestamping SQ is considered unhealthy. If
there are enough CQEs considered never to be delivered, the space that can
be utilized on the SQ decreases significantly, impacting performance and
usability of the SQ. The health reporter is triggered when the number of
likely never delivered port timestamping CQEs that utilize the space of the
PTP SQ is greater than 93.75% of the total capacity of the SQ. A devlink
health reporter recover method is also provided for this specific TX error
context that restarts the PTP SQ.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3178308a 02-May-2023 Rahul Rameshbabu <rrameshbabu@nvidia.com>

net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs

Use a map structure for associating CQEs containing port timestamping
information with the appropriate skb. Track order of WQEs submitted using a
FIFO. Check if the corresponding port timestamping CQEs from the lookup
values in the FIFO are considered dropped due to time elapsed. Return the
lookup value to a freelist after consuming the skb. Reuse the freed lookup
in future WQE submission iterations.

The map structure uses an integer identifier for the key and returns an skb
corresponding to that identifier. Embed the integer identifier in the WQE
submitted to the WQ for the transmit path when the SQ is a PTP (port
timestamping) SQ. The embedded identifier can then be queried using a field
in the CQE of the corresponding port timestamping CQ. In the port
timestamping napi_poll context, the identifier is queried from the CQE
polled from CQ and used to lookup the corresponding skb from the WQE submit
path. The skb reference is removed from map and then embedded with the port
HW timestamp information from the CQE and eventually consumed.

The metadata freelist FIFO is an array containing integer identifiers that
can be pushed and popped in the FIFO. The purpose of this structure is
bookkeeping what identifier values can safely be used in a subsequent WQE
submission and should not contain identifiers that have still not been
reaped by processing a corresponding CQE completion on the port
timestamping CQ.

The ts_cqe_pending_list structure is a combination of an array and linked
list. The array is pre-populated with the nodes that will be added and
removed from the head of the linked list. Each node contains the unique
identifier value associated with the values submitted in the WQEs and
retrieved in the port timestamping CQEs. When a WQE is submitted, the node
in the array corresponding to the identifier popped from the metadata
freelist is added to the end of the CQE pending list and is marked as
"in-use". The node is removed from the linked list under two conditions.
The first condition is that the corresponding port timestamping CQE is
polled in the PTP napi_poll context. The second condition is that more than
a second has elapsed since the DMA timestamp value corresponding to the WQE
submission. When the first condition occurs, the "in-use" bit in the linked
list node is cleared, and the resources corresponding to the WQE submission
are then released. The second condition, however, indicates that the port
timestamping CQE will likely never be delivered. It's not impossible for
the device to post a CQE after an infinite amount of time though highly
improbable. In order to be resilient to this improbable case, resources
related to the corresponding WQE submission are still kept, the identifier
value is not returned to the freelist, and the "in-use" bit is cleared on
the node to indicate that it's no longer part of the linked list of "likely
to be delivered" port timestamping CQE identifiers. A count for the number
of port timestamping CQEs considered highly likely to never be delivered by
the device is maintained. This count gets decremented in the unlikely event
a port timestamping CQE considered unlikely to ever be delivered is polled
in the PTP napi_poll context.

Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d543b649 29-Jun-2023 Zhengchao Shao <shaozhengchao@huawei.com>

net/mlx5e: fix memory leak in mlx5e_ptp_open

When kvzalloc_node or kvzalloc failed in mlx5e_ptp_open, the memory
pointed by "c" or "cparams" is not freed, which can lead to a memory
leak. Fix by freeing the array in the error path.

Fixes: 145e5637d941 ("net/mlx5e: Add TX PTP port object support")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7aa50380 21-Feb-2023 Rahul Rameshbabu <rrameshbabu@nvidia.com>

net/mlx5e: Fix SQ wake logic in ptp napi_poll context

Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken
up. Before change, the ptp tx sq may never wake up if the ptp tx ts skb
fifo is full when mlx5e_poll_tx_cq checks if the queue should be woken up.

Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# cf1cccae 30-Jan-2023 Gal Pressman <gal@nvidia.com>

net/mlx5e: Rename misleading skb_pc/cc references in ptp code

The 'skb_pc/cc' naming is misleading as the values hold the
producer/consumer indices (masked values), not the counters. Rename to
'skb_pi/ci'.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Adham Faris <afaris@nvidia.com>


# 3a50cf1e 02-Feb-2023 Vadim Fedorenko <vadfed@meta.com>

mlx5: fix possible ptp queue fifo use-after-free

Fifo indexes are not checked during pop operations and it leads to
potential use-after-free when poping from empty queue. Such case was
possible during re-sync action. WARN_ON_ONCE covers future cases.

There were out-of-order cqe spotted which lead to drain of the queue and
use-after-free because of lack of fifo pointers check. Special check and
counter are added to avoid resync operation if SKB could not exist in the
fifo because of OOO cqe (skb_id must be between consumer and producer
index).

Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port timestamp")
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e435941b 02-Feb-2023 Vadim Fedorenko <vadfed@meta.com>

mlx5: fix skb leak while fifo resync and push

During ptp resync operation SKBs were poped from the fifo but were never
freed neither by napi_consume nor by dev_kfree_skb_any. Add call to
napi_consume_skb to properly free SKBs.

Another leak was happening because mlx5e_skb_fifo_has_room() had an error
in the check. Comparing free running counters works well unless C promotes
the types to something wider than the counter. In this case counters are
u16 but the result of the substraction is promouted to int and it causes
wrong result (negative value) of the check when producer have already
overlapped but consumer haven't yet. Explicit cast to u16 fixes the issue.

Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port timestamp")
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 79efecb4 30-Aug-2022 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Trigger NAPI after activating an SQ

If an SQ is deactivated and reactivated again, some packets could be
sent after MLX5E_SQ_STATE_ENABLED is cleared, but before
netif_tx_stop_queue, meaning that NAPI might miss some completions. In
order to handle them, make sure to trigger NAPI after SQ activation in
all cases where it can be relevant. Regular SQs, XDP SQs and XSK SQs are
good. Missing cases added: after recovery, after activating HTB SQs and
after activating PTP SQs.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b48b89f9 27-Sep-2022 Jakub Kicinski <kuba@kernel.org>

net: drop the weight argument from netif_napi_add

We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing
to newcomers, drop the weight argument, those who really
need to tweak the weight can use netif_napi_add_weight().

Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 81a0b241 31-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Drop priv argument of ptp function in en_fs

Both mlx5e_ptp_alloc_rx_fs and mlx5e_ptp_free_rx_fs only
make use of two priv member, pass them directly instead.
This will help dropping priv from all en_fs file.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 4e0ecc17 25-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Decouple fs_tt_redirect from en.h

Make flow steering files fs_tt_redirect.c/h independent of en.h
such that it goes through the flow steering API only.

Make error reports be via mlx5_core API instead of netdev_err API, this
to ensure a safe decoupling from en.h, and prevent redundant argument
passing.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f52f2fae 10-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Introduce flow steering API

Move mlx5e_flow_steering struct to fs_en.c to make it private.
Introduce flow_steering API and let other files go through it.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# af8bbf73 09-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer

Make mlx5e_flow_steering member of mlx5e_priv a pointer.
Add dynamic allocation respectively.

Allocate fs for all profiles when initializing profile,
symmetrically deallocate at profile cleanup.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 58a51894 04-Jul-2022 Aya Levin <ayal@nvidia.com>

net/mlx5e: Add resiliency for PTP TX port timestamp

PTP TX port timestamp relies on receiving 2 CQEs for each outgoing
packet (WQE). The regular CQE has a less accurate timestamp than the
wire CQE. On link change, the wire CQE may get lost. Let the driver
detect and restore the relation between the CQEs, and re-sync after
timeout.

Add resiliency for this as follows: add id (producer counter)
into the WQE's metadata. This id will be received in the wire
CQE (in wqe_counter field). On handling the wire CQE, if there is no
match, replay the PTP application with the time-stamp from the regular
CQE and restore the sync between the CQEs and their SKBs. This patch
adds 2 ptp counters:
1) ptp_cq0_resync_event: number of times a mismatch was detected between
the regular CQE and the wire CQE.
2) ptp_cq0_resync_cqe: total amount of missing wire CQEs.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2e642afb 15-Apr-2022 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Disable softirq in mlx5e_activate_rq to avoid race condition

When the driver activates the channels, it assumes NAPI isn't running
yet. mlx5e_activate_rq posts a NOP WQE to ICOSQ to trigger a hardware
interrupt and start NAPI, which will run mlx5e_alloc_rx_mpwqe and post
UMR WQEs to ICOSQ to be able to receive packets with striding RQ.

Unfortunately, a race condition is possible if NAPI is triggered by
something else (for example, TX) at a bad timing, before
mlx5e_activate_rq finishes. In this case, mlx5e_alloc_rx_mpwqe may post
UMR WQEs to ICOSQ, and with the bad timing, the wqe_info of the first
UMR may be overwritten by the wqe_info of the NOP posted by
mlx5e_activate_rq.

The consequence is that icosq->db.wqe_info[0].num_wqebbs will be changed
from MLX5E_UMR_WQEBBS to 1, disrupting the integrity of the array-based
linked list in wqe_info[]. mlx5e_poll_ico_cq will hang in an infinite
loop after processing wqe_info[0], because after the corruption, the
next item to be processed will be wqe_info[1], which is filled with
zeros, and `sqcc += wi->num_wqebbs` will never move further.

This commit fixes this race condition by using async_icosq to post the
NOP and trigger the interrupt. async_icosq is always protected with a
spinlock, eliminating the race condition.

Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reported-by: Karsten Nielsen <karsten@foo-bar.dk>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c27bd171 17-Jan-2022 Aya Levin <ayal@nvidia.com>

net/mlx5e: Read max WQEBBs on the SQ from firmware

Prior to this patch the maximal value for max WQEBBs (WQE Basic Blocks,
where WQE is a Work Queue Element) on the TX side was assumed to be 16
(fixed value). All firmware versions till today comply to this. In order
to be more flexible and resilient, read from FW the corresponding:
max_wqe_sz_sq. This value describes the maximum WQE size given in bytes,
thus max WQEBBs is given by the division in WQEBB's byte size. The
driver uses the top between 16 and the division result. This ensures
synchronization between driver and firmware and avoids unexpected
behavior. Store this value on the different SQs (Send Queues) for easy
access.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9536923d 19-May-2021 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Remove unused tstamp SQ field

Remove tstamp pointer in mlx5e_txqsq as it's no longer used after
commit 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section").

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6c72cb05 04-Dec-2021 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Use bitmap field for profile features

Use a features bitmap field in mlx5e_profile to declare profile support
state of the different features. Let it replace the existing
rx_ptp_support boolean. It will be extended to cover more features in a
downstream patch.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 83fec3f1 12-Oct-2021 Aharon Landau <aharonl@nvidia.com>

RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key

In mlx5_core and vdpa there is no use of mlx5_core_mkey members except
for the key itself.

As preparation for moving mlx5_core_mkey to mlx5_ib, the occurrences of
struct mlx5_core_mkey in all modules except for mlx5_ib are replaced by
a u32 key.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# dd1979cf 29-Aug-2021 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Fix the presented RQ index in PTP stats

PTP-RQ counters title format contains PTP-RQ identifier, which is
mistakenly not passed to sprinft().
This leads to unexpected garbage values instead.
This patch fixes it.

Before applying the patch:
ethtool -S eth3 | grep ptp_rq
ptp_rq15_packets: 0
ptp_rq8_bytes: 0
ptp_rq6_csum_complete: 0
ptp_rq14_csum_complete_tail: 0
ptp_rq3_csum_complete_tail_slow : 0
ptp_rq9_csum_unnecessary: 0
ptp_rq1_csum_unnecessary_inner: 0
ptp_rq7_csum_none: 0
ptp_rq10_xdp_drop: 0
ptp_rq9_xdp_redirect: 0
ptp_rq13_lro_packets: 0
ptp_rq12_lro_bytes: 0
ptp_rq10_ecn_mark: 0
ptp_rq9_removed_vlan_packets: 0
ptp_rq5_wqe_err: 0
ptp_rq8_mpwqe_filler_cqes: 0
ptp_rq2_mpwqe_filler_strides: 0
ptp_rq5_oversize_pkts_sw_drop: 0
ptp_rq6_buff_alloc_err: 0
ptp_rq15_cqe_compress_blks: 0
ptp_rq2_cqe_compress_pkts: 0
ptp_rq2_cache_reuse: 0
ptp_rq12_cache_full: 0
ptp_rq11_cache_empty: 256
ptp_rq12_cache_busy: 0
ptp_rq11_cache_waive: 0
ptp_rq12_congst_umr: 0
ptp_rq11_arfs_err: 0
ptp_rq9_recover: 0

After applying the patch:
ethtool -S eth3 | grep ptp_rq
ptp_rq0_packets: 0
ptp_rq0_bytes: 0
ptp_rq0_csum_complete: 0
ptp_rq0_csum_complete_tail: 0
ptp_rq0_csum_complete_tail_slow : 0
ptp_rq0_csum_unnecessary: 0
ptp_rq0_csum_unnecessary_inner: 0
ptp_rq0_csum_none: 0
ptp_rq0_xdp_drop: 0
ptp_rq0_xdp_redirect: 0
ptp_rq0_lro_packets: 0
ptp_rq0_lro_bytes: 0
ptp_rq0_ecn_mark: 0
ptp_rq0_removed_vlan_packets: 0
ptp_rq0_wqe_err: 0
ptp_rq0_mpwqe_filler_cqes: 0
ptp_rq0_mpwqe_filler_strides: 0
ptp_rq0_oversize_pkts_sw_drop: 0
ptp_rq0_buff_alloc_err: 0
ptp_rq0_cqe_compress_blks: 0
ptp_rq0_cqe_compress_pkts: 0
ptp_rq0_cache_reuse: 0
ptp_rq0_cache_full: 0
ptp_rq0_cache_empty: 256
ptp_rq0_cache_busy: 0
ptp_rq0_cache_waive: 0
ptp_rq0_congst_umr: 0
ptp_rq0_arfs_err: 0
ptp_rq0_recover: 0

Fixes: a28359e922c6 ("net/mlx5e: Add PTP-RX statistics")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 86d747a3 06-Jul-2021 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Abstract MQPRIO params

Abstract the MQPRIO params into a struct.
Use a getter for DCB mode num_tcs.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d443c6f6 02-Jul-2021 Maor Gottlieb <maorg@nvidia.com>

net/mlx5e: Rename traffic type enums

Rename traffic type enums as part of the preparation for moving
the traffic type logic to a separate file.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 43ec0f41 09-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Hide all implementation details of mlx5e_rx_res

This commit moves all implementation details of struct mlx5e_rx_res
under en/rx_res.c. All access to RX resources is now done using methods.
Encapsulating RX resources into an object allows for better
manageability, because all the implementation details are now in a
single place, and external code can use only a limited set of API
methods to init/teardown the whole thing, reconfigure RSS and LRO
parameters, connect TIRs to flow steering and activate/deactivate TIRs.

mlx5e_rx_res is self-contained and doesn't depend on struct mlx5e_priv
or include en.h.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0570c1c9 05-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Take RQT out of TIR and group RX resources

RQT is not part of TIR, as multiple TIRs may point to the same RQT, as
it happens with indir_tir and inner_indir_tir. These instances of a TIR
don't use the embedded RQT.

This commit takes RQT out of TIR, making them independent. The RQTs are
placed into struct mlx5e_rx_res, and items in that struct are regrouped
by functionality: RSS, channels and PTP.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3f22d6c7 05-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Move RX resources to a separate struct

This commit moves RQTs and TIRs to a separate struct that is allocated
dynamically in profiles that support these RX resources (all profiles,
except IPoIB PKey). It also allows to remove rqt_enabled flags, as RQTs
are always enabled in profiles that support RX resources.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 678b1ae1 22-Jun-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Fix page allocation failure for ptp-RQ over SF

Set the correct pci-device pointer to the ptp-RQ. This allows access to
dma_mask and avoids allocation request with wrong pci-device.

Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a759f845 30-Jun-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Consider PTP-RQ when setting RX VLAN stripping

Add PTP-RQ to the loop when setting rx-vlan-offload feature via ethtool.
On PTP-RQ's creation, set rx-vlan-offload into its parameters.

Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a6ee6f5f 19-Apr-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Fix select queue to consider SKBTX_HW_TSTAMP

Steering packets to PTP-SQ should be done only if the SKB has
SKBTX_HW_TSTAMP set in the tx_flags. While here, take the function into
a header and inline it.
Set the whole condition to select the PTP-SQ to unlikely.

Fixes: 24c22dd0918b ("net/mlx5e: Add states to PTP channel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 69cc4185 28-Jan-2021 Maxim Mikityanskiy <maximmi@mellanox.com>

net/mlx5e: Use mlx5e_safe_switch_channels when channels are closed

This commit uses new functionality of mlx5e_safe_switch_channels
introduced by the previous commit to reduce the amount of repeating
similar code all over the driver.

It's very common in mlx5e to call mlx5e_safe_switch_channels when the
channels are open, but assign parameters and run hardware commands
manually when the channels are closed.

After the previous commit it's no longer needed to do such manual things
every time, so this commit removes unneeded code and relies on the new
functionality of mlx5e_safe_switch_channels. Some of the places are
refactored and simplified, where more complex flows are used to change
configuration on the fly, without recreating the channels (the logic is
rewritten in a more robust way, with a reset required by default and a
list of exceptions).

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 960fbfe2 20-Jan-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Allow coexistence of CQE compression and HW TS PTP

Update setting HW time-stamp to allow coexistence with CQE compression.
Turn on RX PTP indication and try to reopen the channels. On success,
coexistence with CQE compression is enabled. Otherwise, fall-back to
turning off CQE compression.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e5fe4946 15-Feb-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Add PTP Flow Steering support

When opening PTP channel with MLX5E_PTP_STATE_RX set, add the
corresponding flow steering rules. Capture UDP packets with destination
port 319 and L2 packets with ethertype 0x88F7 and steer them into the RQ
of the PTP channel.
Add API that manages the flow steering rules to be used in the following
patches via safe_reopen_channels mechanism.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3adb60b6 25-Feb-2021 Aya Levin <ayal@nvidia.com>

net:mlx5e: Add PTP-TIR and PTP-RQT

Add PTP-TIR and initiate its RQT to allow PTP-RQ to integrate into the
safe-reopen flow on configuration change. Add rx_ptp_support flag on a
profile and turn it on for ETH driver. With this flag set, create a
redirect-RQT for PTP-RQ.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a28359e9 07-Mar-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Add PTP-RX statistics

Like PTP-TX, once the PTP-RX is opened, corresponding statistics appear.
Add indication that PTP-RX was ever opened: rx_ptp_opened. If any of the
PTP RX or TX were opened, display the PTP channel's statistics.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a099da8f 07-Mar-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Add RQ to PTP channel

Enhance PTP channel to allow PTP without disabling CQE compression. Add
RQ, TIR and PTP_RX_STATE to PTP channel. When this bit is set, PTP
channel manages its RQ, and PTP traffic is directed to the PTP-RQ which
is not affected by compression.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 24c22dd0 11-Jan-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Add states to PTP channel

Add PTP TX state to PTP channel, which indicates the corresponding SQ is
available. Further patches in the set extend PTP channel to include RQ.
The PTP channel state will be used for separation and coexistence of RX
and TX PTP. Enhance conditions to verify the TX PTP state is set.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e569cbd7 17-Jan-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Cleanup PTP

Reduce scope of mlx5e_ptp_params, move to its c file. Remove unneeded
variables from mlx5e_ptp_open and state bitmap from PTP channel. In
addition, remove channel index from PTP channel since it is set to a
hard coded value, use define instead.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b0d35de4 07-Mar-2021 Aya Levin <ayal@nvidia.com>

net/mlx5e: Generalize PTP implementation

Following patches in the set add support for RX PTP. Rename PTP prefix
from %s/port_ptp/ptp/g to include RX PTP too.

In addition rename indication (used in statistics context) that PTP-SQ
was opened: %s/port_ptp_opened/tx_ptp_opened/g. This will simplify adding
indication that PTP-RQ was opened.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 183532b7 02-Mar-2021 Aya Levin <ayal@nvidia.com>

net/mlx5: Add helper to set time-stamp translator on a queue

Translation method on the time-stamp is set by the capabilities. Avoid
code duplication by using a helper to set ptp_cyc2time callback on a
queue.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 89564920 10-Mar-2021 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Restrict usage of mlx5e_priv in params logic functions

Do not use generic struct mlx5e_priv as a parameter to param
functions, as it is too generic. All calculations of the channel's
param should be mainly based on struct mlx5_core_dev and
struct mlx5e_params. Additional info can be explicitly passed.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c276aae8 26-Jan-2021 Roi Dayan <roid@nvidia.com>

net/mlx5: Move mlx5e hw resources into a sub object

This is to separate between resources attributes and other
attributes we will want to use.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 432119de 12-Feb-2021 Aya Levin <ayal@nvidia.com>

net/mlx5: Add cyc2time HW translation mode support

Device timestamp can be in real time mode (cycles to time translation is
offloaded into the Hardware). With real time mode, HW provides timestamp
which is already translated into nanoseconds.

With this mode, driver adjusts both the HW and timecounter (to keep
clock_info_page updated) using callbacks: adjfreq, adjtime and settime.
HW clock modifications are done via MTUTC access reg commands. Driver is
allowed to modify HW real time clock only if MCAM ptpcyc2realtime_modify
capability is set.

Add MTUTC set function to be used for configuring the HW real time
clock. Modify existing code to support both internal timer (with
conversion via timecounter_cyc2time() and real time (no conversions).

Align the signatures of the helpers converting from timestamp to
nanoseconds. With that, when allocating a queue assign the corresponding
callback with respect to the capability.

Adjust 1PPS timestamp calculation flows based on the timestamp mode.

Cyc2time offload brings two major advantages:
- Improve MTAE (Max Time Absolute Error) for HW TS by up to 160 ns over a
100% loaded CPU.
- Faster data-path timestamp to nanoseconds, as translation is
lock-less and done in HW.

On real time mode, timestamp format is 32 high bits of seconds and 32
low bits of nanoseconds. On some flows, driver shall convert this format
into nanoseconds wall-clock with REAL_TIME_TO_NS macro.

HW supports a single clock, and it is shared by all functions on a
device. In case real time clock is used, it is recommended to use
a single GM to all device's functions.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7637e499 30-Dec-2019 Tariq Toukan <tariqt@mellanox.com>

net/mlx5e: Enable napi in channel's activation stage

The channel's napi is first needed upon activation, not creation.
Minimize its enabled scope by moving it from the channel's open/close
stage into the activate/deactivate stage.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 214baf22 19-Jan-2021 Maxim Mikityanskiy <maximmi@mellanox.com>

net/mlx5e: Support HTB offload

This commit adds support for HTB offload in the mlx5e driver.

Performance:

NIC: Mellanox ConnectX-6 Dx
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24 cores with HT)

100 Gbit/s line rate, 500 UDP streams @ ~200 Mbit/s each
48 traffic classes, flower used for steering
No shaping (rate limits set to 4 Gbit/s per TC) - checking for max
throughput.

Baseline: 98.7 Gbps, 8.25 Mpps
HTB: 6.7 Gbps, 0.56 Mpps
HTB offload: 95.6 Gbps, 8.00 Mpps

Limitations:

1. 256 leaf nodes, 3 levels of depth.

2. Granularity for ceil is 1 Mbit/s. Rates are converted to weights, and
the bandwidth is split among the siblings according to these weights.
Other parameters for classes are not supported.

Ethtool statistics support for QoS SQs are also added. The counters are
called qos_txN_*, where N is the QoS queue number (starting from 0, the
numeration is separate from the normal SQs), and * is the counter name
(the counters are the same as for the normal SQs).

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1880bc4e 01-Dec-2020 Eran Ben Elisha <eranbe@nvidia.com>

net/mlx5e: Add TX port timestamp support

Transmitted packet timestamping accuracy can be improved when using
timestamp from the port, instead of packet CQE creation timestamp, as
it better reflects the actual time of a packet's transmit.

TX port timestamping is supported starting from ConnectX6-DX hardware.
Although at the original completion, only CQE timestamp can be attached,
we are able to get TX port timestamping via an additional completion over
a special CQ associated with the SQ (in addition to the regular CQ).

Driver to ignore the original packet completion timestamp, and report
back the timestamp of the special CQ completion. If the absolute timestamp

diff between the two completions is greater than 1 / 128 second, ignore
the TX port timestamp as it has a jitter which is too big.
No skb will be generate out of the extra completion.

Allocate additional CQ per ptpsq, to receive the TX port timestamp.

Driver to hold an skb FIFO in order to map between transmitted skb to
the two expected completions. When using ptpsq, hold double refcount on
the skb, to gaurantee it will not get released before both completions
arrive.

Expose dedicated counters of the ptp additional CQ and connect it to the
TX health reporter.

This patch improves TX Hardware timestamping offset to be less than 40ns
at a 100Gbps line rate, compared to 600ns before.

With that, making our HW compliant with G.8273.2 class C, and allow Linux
systems to be deployed in the 5G telco edge, where this standard is a
must.

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 145e5637 01-Dec-2020 Eran Ben Elisha <eranbe@nvidia.com>

net/mlx5e: Add TX PTP port object support

Add TX PTP port object support for better TX timestamping accuracy.
Currently, driver supports CQE based TX port timestamp. Device
also offers TX port timestamp, which has less jitter and better
reflects the actual time of a packet's transmit.

Define new driver layout called ptpsq, on which driver will create
SQs that will support TX port timestamp for their transmitted packets.
Driver to identify PTP TX skbs and steer them to these dedicated SQs
as part of the select queue ndo.

Driver to hold ptpsq per TC and report them at
netif_set_real_num_tx_queues().

Add support for all needed functionality in order to xmit and poll
completions received via ptpsq.

Add ptpsq to the TX reporter recover, diagnose and dump methods.

Creation of ptpsqs is disabled by default, and can be enabled via
tx_port_ts private flag.

This patch steer all timestamp related packets to a ptpsq, but it
does not open the port timestamp support for it. The support will
be added in the following patch.

Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>