History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
Revision Date Author Comments
# 7f525acb 14-Feb-2024 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Support per-mdev queue counter

Each queue counter object counts some events (in hardware) for the RQs
that are attached to it, like events of packet drops due to no receive
WQE (rx_out_of_buffer).

Each RQ can be attached to a queue counter only within the same vhca. To
still cover all RQs with these counters, we create multiple instances,
one per vhca.

The result that's shown to the user is now the sum of all instances.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 40e6ad91 14-Feb-2024 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Support cross-vhca RSS

Implement driver support for the HW feature that allows RX steering of
one device to target other device's RQs.

In SD multi-pf netdev mode, we set the secondaries into silent mode,
disconnecting them from the network. This feature is then used to steer
traffic from the primary to the secondaries.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d76fdd31 10-Nov-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Fix peer flow lists handling

The cited change refactored mlx5e_tc_del_fdb_peer_flow() to only clear DUP
flag when list of peer flows has become empty. However, if any concurrent
user holds a reference to a peer flow (for example, the neighbor update
workqueue task is updating peer flow's parent encap entry concurrently),
then the flow will not be removed from the peer list and, consecutively,
DUP flag will remain set. Since mlx5e_tc_del_fdb_peers_flow() calls
mlx5e_tc_del_fdb_peer_flow() for every possible peer index the algorithm
will try to remove the flow from eswitch instances that it has never peered
with causing either NULL pointer dereference when trying to remove the flow
peer list head of peer_index that was never initialized or a warning if the
list debug config is enabled[0].

Fix the issue by always removing the peer flow from the list even when not
releasing the last reference to it.

[0]:

[ 3102.985806] ------------[ cut here ]------------
[ 3102.986223] list_del corruption, ffff888139110698->next is NULL
[ 3102.986757] WARNING: CPU: 2 PID: 22109 at lib/list_debug.c:53 __list_del_entry_valid_or_report+0x4f/0xc0
[ 3102.987561] Modules linked in: act_ct nf_flow_table bonding act_tunnel_key act_mirred act_skbedit vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa openvswitch nsh xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcg
ss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core [last unloaded: bonding]
[ 3102.991113] CPU: 2 PID: 22109 Comm: revalidator28 Not tainted 6.6.0-rc6+ #3
[ 3102.991695] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 3102.992605] RIP: 0010:__list_del_entry_valid_or_report+0x4f/0xc0
[ 3102.993122] Code: 39 c2 74 56 48 8b 32 48 39 fe 75 62 48 8b 51 08 48 39 f2 75 73 b8 01 00 00 00 c3 48 89 fe 48 c7 c7 48 fd 0a 82 e8 41 0b ad ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 fd 0a 82 e8 2d 0b ad ff 0f 0b
[ 3102.994615] RSP: 0018:ffff8881383e7710 EFLAGS: 00010286
[ 3102.995078] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
[ 3102.995670] RDX: 0000000000000001 RSI: ffff88885f89b640 RDI: ffff88885f89b640
[ 3102.997188] DEL flow 00000000be367878 on port 0
[ 3102.998594] RBP: dead000000000122 R08: 0000000000000000 R09: c0000000ffffdfff
[ 3102.999604] R10: 0000000000000008 R11: ffff8881383e7598 R12: dead000000000100
[ 3103.000198] R13: 0000000000000002 R14: ffff888139110000 R15: ffff888101901240
[ 3103.000790] FS: 00007f424cde4700(0000) GS:ffff88885f880000(0000) knlGS:0000000000000000
[ 3103.001486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3103.001986] CR2: 00007fd42e8dcb70 CR3: 000000011e68a003 CR4: 0000000000370ea0
[ 3103.002596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3103.003190] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3103.003787] Call Trace:
[ 3103.004055] <TASK>
[ 3103.004297] ? __warn+0x7d/0x130
[ 3103.004623] ? __list_del_entry_valid_or_report+0x4f/0xc0
[ 3103.005094] ? report_bug+0xf1/0x1c0
[ 3103.005439] ? console_unlock+0x4a/0xd0
[ 3103.005806] ? handle_bug+0x3f/0x70
[ 3103.006149] ? exc_invalid_op+0x13/0x60
[ 3103.006531] ? asm_exc_invalid_op+0x16/0x20
[ 3103.007430] ? __list_del_entry_valid_or_report+0x4f/0xc0
[ 3103.007910] mlx5e_tc_del_fdb_peers_flow+0xcf/0x240 [mlx5_core]
[ 3103.008463] mlx5e_tc_del_flow+0x46/0x270 [mlx5_core]
[ 3103.008944] mlx5e_flow_put+0x26/0x50 [mlx5_core]
[ 3103.009401] mlx5e_delete_flower+0x25f/0x380 [mlx5_core]
[ 3103.009901] tc_setup_cb_destroy+0xab/0x180
[ 3103.010292] fl_hw_destroy_filter+0x99/0xc0 [cls_flower]
[ 3103.010779] __fl_delete+0x2d4/0x2f0 [cls_flower]
[ 3103.011207] fl_delete+0x36/0x80 [cls_flower]
[ 3103.011614] tc_del_tfilter+0x56f/0x750
[ 3103.011982] rtnetlink_rcv_msg+0xff/0x3a0
[ 3103.012362] ? netlink_ack+0x1c7/0x4e0
[ 3103.012719] ? rtnl_calcit.isra.44+0x130/0x130
[ 3103.013134] netlink_rcv_skb+0x54/0x100
[ 3103.013533] netlink_unicast+0x1ca/0x2b0
[ 3103.013902] netlink_sendmsg+0x361/0x4d0
[ 3103.014269] __sock_sendmsg+0x38/0x60
[ 3103.014643] ____sys_sendmsg+0x1f2/0x200
[ 3103.015018] ? copy_msghdr_from_user+0x72/0xa0
[ 3103.015265] ___sys_sendmsg+0x87/0xd0
[ 3103.016608] ? copy_msghdr_from_user+0x72/0xa0
[ 3103.017014] ? ___sys_recvmsg+0x9b/0xd0
[ 3103.017381] ? ttwu_do_activate.isra.137+0x58/0x180
[ 3103.017821] ? wake_up_q+0x49/0x90
[ 3103.018157] ? futex_wake+0x137/0x160
[ 3103.018521] ? __sys_sendmsg+0x51/0x90
[ 3103.018882] __sys_sendmsg+0x51/0x90
[ 3103.019230] ? exit_to_user_mode_prepare+0x56/0x130
[ 3103.019670] do_syscall_64+0x3c/0x80
[ 3103.020017] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 3103.020469] RIP: 0033:0x7f4254811ef4
[ 3103.020816] Code: 89 f3 48 83 ec 10 48 89 7c 24 08 48 89 14 24 e8 42 eb ff ff 48 8b 14 24 41 89 c0 48 89 de 48 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 c7 48 89 04 24 e8 78 eb ff ff 48 8b
[ 3103.022290] RSP: 002b:00007f424cdd9480 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 3103.022970] RAX: ffffffffffffffda RBX: 00007f424cdd9510 RCX: 00007f4254811ef4
[ 3103.023564] RDX: 0000000000000000 RSI: 00007f424cdd9510 RDI: 0000000000000012
[ 3103.024158] RBP: 00007f424cdda238 R08: 0000000000000000 R09: 00007f41d801a4b0
[ 3103.024748] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001
[ 3103.025341] R13: 00007f424cdd9510 R14: 00007f424cdda240 R15: 00007f424cdd99a0
[ 3103.025931] </TASK>
[ 3103.026182] ---[ end trace 0000000000000000 ]---
[ 3103.027033] ------------[ cut here ]------------

Fixes: 9be6c21fdcf8 ("net/mlx5e: Handle offloads flows per peer")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c20767fd 05-Nov-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Fix inconsistent hairpin RQT sizes

The processing of traffic in hairpin queues occurs in HW/FW and does not
involve the cpus, hence the upper bound on max num channels does not
apply to them. Using this bound for the hairpin RQT max_table_size is
wrong. It could be too small, and cause the error below [1]. As the
RQT size provided on init does not get modified later, use the same
value for both actual and max table sizes.

[1]
mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1200): CREATE_RQT(0x916) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x538faf), err(-22)

Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3fbf6120 07-Jan-2024 Jakub Kicinski <kuba@kernel.org>

Revert "mlx5 updates 2023-12-20"

Revert "net/mlx5: Implement management PF Ethernet profile"
This reverts commit 22c4640698a1d47606b5a4264a584e8046641784.
Revert "net/mlx5: Enable SD feature"
This reverts commit c88c49ac9c18fb7c3fa431126de1d8f8f555e912.
Revert "net/mlx5e: Block TLS device offload on combined SD netdev"
This reverts commit 83a59ce0057b7753d7fbece194b89622c663b2a6.
Revert "net/mlx5e: Support per-mdev queue counter"
This reverts commit d72baceb92539a178d2610b0e9ceb75706a75b55.
Revert "net/mlx5e: Support cross-vhca RSS"
This reverts commit c73a3ab8fa6e93a783bd563938d7cf00d62d5d34.
Revert "net/mlx5e: Let channels be SD-aware"
This reverts commit e4f9686bdee7b4dd89e0ed63cd03606e4bda4ced.
Revert "net/mlx5e: Create EN core HW resources for all secondary devices"
This reverts commit c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1.
Revert "net/mlx5e: Create single netdev per SD group"
This reverts commit e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad.
Revert "net/mlx5: SD, Add informative prints in kernel log"
This reverts commit c82d360325112ccc512fc11a3b68cdcdf04a1478.
Revert "net/mlx5: SD, Implement steering for primary and secondaries"
This reverts commit 605fcce33b2d1beb0139b6e5913fa0b2062116b2.
Revert "net/mlx5: SD, Implement devcom communication and primary election"
This reverts commit a45af9a96740873db9a4b5bb493ce2ad81ccb4d5.
Revert "net/mlx5: SD, Implement basic query and instantiation"
This reverts commit 63b9ce944c0e26c44c42cdd5095c2e9851c1a8ff.
Revert "net/mlx5: SD, Introduce SD lib"
This reverts commit 4a04a31f49320d078b8078e1da4b0e2faca5dfa3.
Revert "net/mlx5: Fix query of sd_group field"
This reverts commit e04984a37398b3f4f5a79c993b94c6b1224184cc.
Revert "net/mlx5e: Use the correct lag ports number when creating TISes"
This reverts commit a7e7b40c4bc115dbf2a2bb453d7bbb2e0ea99703.

There are some unanswered questions on the list, and we don't
have any docs. Given the lack of replies so far and the fact
that v6.8 merge window has started - let's revert this and
revisit for v6.9.

Link: https://lore.kernel.org/all/20231221005721.186607-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d72baceb 08-Aug-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Support per-mdev queue counter

Each queue counter object counts some events (in hardware) for the RQs
that are attached to it, like events of packet drops due to no receive
WQE (rx_out_of_buffer).

Each RQ can be attached to a queue counter only within the same vhca. To
still cover all RQs with these counters, we create multiple instances,
one per vhca.

The result that's shown to the user is now the sum of all instances.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c73a3ab8 05-Aug-2023 Tariq Toukan <tariqt@nvidia.com>

net/mlx5e: Support cross-vhca RSS

Implement driver support for the HW feature that allows RX steering of
one device to target other device's RQs.

In SD multi-mdev netdev mode, we set the secondaries into silent mode,
disconnecting them from the network. This feature is then used to steer
traffic from the primary to the secondaries.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 88e928b2 24-Oct-2023 Gal Pressman <gal@nvidia.com>

net/mlx5e: Access array with enum values instead of magic numbers

Access the headers array using pedit_cmd enum values, and don't assume
anything about their values.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 312eb3fd 06-Sep-2023 Amir Tzin <amirtz@nvidia.com>

net/mlx5e: Some cleanup in mlx5e_tc_stats_matchall()

Function mlx5e_tc_stats_matchall() is only called from one file:
drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c
Move it there and make it static as exposing it is unnecessary. Also
variable cur_stats is redundant. Remove it and avoid superfluous copy.

This patch does not change functionality.

Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d792e5f7 13-Dec-2023 Dan Carpenter <dan.carpenter@linaro.org>

net/mlx5e: Fix error codes in alloc_branch_attr()

Set the error code if set_branch_dest_ft() fails.

Fixes: ccbe33003b10 ("net/mlx5e: TC, Don't offload post action rule if not supported")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 86d59226 13-Dec-2023 Dan Carpenter <dan.carpenter@linaro.org>

net/mlx5e: Fix error code in mlx5e_tc_action_miss_mapping_get()

Preserve the error code if esw_add_restore_rule() fails. Don't return
success.

Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ccbe3300 11-Sep-2023 Chris Mi <cmi@nvidia.com>

net/mlx5e: TC, Don't offload post action rule if not supported

If post action is not supported, eg. ignore_flow_level is not
supported, don't offload post action rule. Otherwise, will hit
panic [1].

Fix it by checking if post action table is valid or not.

[1]
[445537.863880] BUG: unable to handle page fault for address: ffffffffffffffb1
[445537.864617] #PF: supervisor read access in kernel mode
[445537.865244] #PF: error_code(0x0000) - not-present page
[445537.865860] PGD 70683a067 P4D 70683a067 PUD 70683c067 PMD 0
[445537.866497] Oops: 0000 [#1] PREEMPT SMP NOPTI
[445537.867077] CPU: 19 PID: 248742 Comm: tc Kdump: loaded Tainted: G O 6.5.0+ #1
[445537.867888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[445537.868834] RIP: 0010:mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core]
[445537.869635] Code: c0 0d 00 00 e8 20 96 c6 d3 48 85 c0 0f 84 e5 00 00 00 c7 83 b0 01 00 00 00 00 00 00 49 89 c5 31 c0 31 d2 66 89 83 b4 01 00 00 <49> 8b 44 24 10 83 23 df 83 8b d8 01 00 00 04 48 89 83 c0 01 00 00
[445537.871318] RSP: 0018:ffffb98741cef428 EFLAGS: 00010246
[445537.871962] RAX: 0000000000000000 RBX: ffff8df341167000 RCX: 0000000000000001
[445537.872704] RDX: 0000000000000000 RSI: ffffffff954844e1 RDI: ffffffff9546e9cb
[445537.873430] RBP: ffffb98741cef448 R08: 0000000000000020 R09: 0000000000000246
[445537.874160] R10: 0000000000000000 R11: ffffffff943f73ff R12: ffffffffffffffa1
[445537.874893] R13: ffff8df36d336c20 R14: ffffffffffffffa1 R15: ffff8df341167000
[445537.875628] FS: 00007fcd6564f800(0000) GS:ffff8dfa9ea00000(0000) knlGS:0000000000000000
[445537.876425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[445537.877090] CR2: ffffffffffffffb1 CR3: 00000003b5884001 CR4: 0000000000770ee0
[445537.877832] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[445537.878564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[445537.879300] PKRU: 55555554
[445537.879797] Call Trace:
[445537.880263] <TASK>
[445537.880713] ? show_regs+0x6e/0x80
[445537.881232] ? __die+0x29/0x70
[445537.881731] ? page_fault_oops+0x85/0x160
[445537.882276] ? search_exception_tables+0x65/0x70
[445537.882852] ? kernelmode_fixup_or_oops+0xa2/0x120
[445537.883432] ? __bad_area_nosemaphore+0x18b/0x250
[445537.884019] ? bad_area_nosemaphore+0x16/0x20
[445537.884566] ? do_kern_addr_fault+0x8b/0xa0
[445537.885105] ? exc_page_fault+0xf5/0x1c0
[445537.885623] ? asm_exc_page_fault+0x2b/0x30
[445537.886149] ? __kmem_cache_alloc_node+0x1df/0x2a0
[445537.886717] ? mlx5e_tc_post_act_add+0x51/0x130 [mlx5_core]
[445537.887431] ? mlx5e_tc_post_act_add+0x30/0x130 [mlx5_core]
[445537.888172] alloc_flow_post_acts+0xfb/0x1c0 [mlx5_core]
[445537.888849] parse_tc_actions+0x582/0x5c0 [mlx5_core]
[445537.889505] parse_tc_fdb_actions+0xd7/0x1f0 [mlx5_core]
[445537.890175] __mlx5e_add_fdb_flow+0x1ab/0x2b0 [mlx5_core]
[445537.890843] mlx5e_add_fdb_flow+0x56/0x120 [mlx5_core]
[445537.891491] ? debug_smp_processor_id+0x1b/0x30
[445537.892037] mlx5e_tc_add_flow+0x79/0x90 [mlx5_core]
[445537.892676] mlx5e_configure_flower+0x305/0x450 [mlx5_core]
[445537.893341] mlx5e_rep_setup_tc_cls_flower+0x3d/0x80 [mlx5_core]
[445537.894037] mlx5e_rep_setup_tc_cb+0x5c/0xa0 [mlx5_core]
[445537.894693] tc_setup_cb_add+0xdc/0x220
[445537.895177] fl_hw_replace_filter+0x15f/0x220 [cls_flower]
[445537.895767] fl_change+0xe87/0x1190 [cls_flower]
[445537.896302] tc_new_tfilter+0x484/0xa50

Fixes: f0da4daa3413 ("net/mlx5e: Refactor ct to use post action infrastructure")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Automatic Verification <verifier@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shachar Kagan <skagan@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>


# 0c101a23 14-Nov-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Fix pedit endianness

Referenced commit addressed endianness issue in mlx5 pedit implementation
in ad hoc manner instead of systematically treating integer values
according to their types which left pedit fields of sizes not equal to 4
and where the bytes being modified are not least significant ones broken on
big endian machines since wrong bits will be consumed during parsing which
leads to following example error when applying pedit to source and
destination MAC addresses:

[Wed Oct 18 12:52:42 2023] mlx5_core 0001:00:00.1 p1v3_r: attempt to offload an unsupported field (cmd 0)
[Wed Oct 18 12:52:42 2023] mask: 00000000330c5b68: 00 00 00 00 ff ff 00 00 00 00 ff ff 00 00 00 00 ................
[Wed Oct 18 12:52:42 2023] mask: 0000000017d22fd9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[Wed Oct 18 12:52:42 2023] mask: 000000008186d717: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[Wed Oct 18 12:52:42 2023] mask: 0000000029eb6149: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[Wed Oct 18 12:52:42 2023] mask: 000000007ed103e4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[Wed Oct 18 12:52:42 2023] mask: 00000000db8101a6: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[Wed Oct 18 12:52:42 2023] mask: 00000000ec3c08a9: 00 00 00 00 00 00 00 00 00 00 00 00 ............

Treat masks and values of pedit and filter match as network byte order,
refactor pointers to them to void pointers instead of confusing u32
pointers and only cast to pointer-to-integer when reading a value from
them. Treat pedit mlx5_fields->field_mask as host byte order according to
its type u32, change the constants in fields array accordingly.

Fixes: 82198d8bcdef ("net/mlx5e: Fix endianness when calculating pedit mask first bit")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20231114215846.5902-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 74a8dada 12-Oct-2023 Adham Faris <afaris@nvidia.com>

net/mlx5e: Preparations for supporting larger number of channels

Data center server CPUs number keeps getting larger with time.
Currently, our driver limits the number of channels to 128.

Maximum channels number is enforced and bounded by hardcoded
defines (en.h/MLX5E_MAX_NUM_CHANNELS) even though the device and machine
(CPUs num) can allow more.

Refactor current implementation in order to handle further channels.

The maximum supported channels number will be increased in the followup
patch.

Introduce RQT size calculation/allocation scheme below:
1) Preserve current RQT size of 256 for channels number up to 128 (the
old limit).
2) For greater channels number, RQT size is calculated by multiplying
the channels number by 2 and rounding up the result to the nearest
power of 2. If the calculated RQT size exceeds the maximum supported
size by the NIC, fallback to this maximum RQT size
(1 << log_max_rqt_size).

Since RQT size is no more static, allocate and free the indirection
table SW shadow dynamically.

Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 80f12414 04-Sep-2023 Amir Tzin <amirtz@nvidia.com>

net/mlx5e: Fix VF representors reporting zero counters to "ip -s" command

Although vf_vport entry of struct mlx5e_stats is never updated, its
values are mistakenly copied to the caller structure in the VF
representor .ndo_get_stat_64 callback mlx5e_rep_get_stats(). Remove
redundant entry and use the updated one, rep_stats, instead.

Fixes: 64b68e369649 ("net/mlx5: Refactor and expand rep vport stat group")
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b7558a77 05-Sep-2023 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: Clear mirred devices array if the rule is split

In the cited commit, the mirred devices are recorded and checked while
parsing the actions. In order to avoid system crash, the duplicate
action in a single rule is not allowed.

But the rule is actually break down into several FTEs in different
tables, for either mirroring, or the specified types of actions which
use post action infrastructure.

It will reject certain action list by mistake, for example:
actions:enp8s0f0_1,set(ipv4(ttl=63)),enp8s0f0_0,enp8s0f0_1.
Here the rule is split to two FTEs because of pedit action.

To fix this issue, when parsing the rule actions, reset if_count to
clear the mirred devices array if the rule is split to multiple
FTEs, and then the duplicate checking is restarted.

Fixes: 554fe75c1b3f ("net/mlx5e: Avoid duplicating rule destinations")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c8e350e6 31-Jul-2023 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev

For IPsec packet offload mode, the order of TC offload and IPsec
offload on the same netdevice is not aligned with the order in the
non-offload software. For example, for RX, the software performs TC
first and then IPsec transformation, but the implementation for
offload does that in the opposite way.

To resolve the difference for now, either IPsec offload or TC offload,
not both, is allowed for a specific interface.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/8e2e5e3b0984d785066e8663aaf97b3ba1bb873f.1690802064.git.leon@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2b3082c6 28-Jul-2023 Ratheesh Kannoth <rkannoth@marvell.com>

net: flow_dissector: Use 64bits for used_keys

As 32bits of dissector->used_keys are exhausted,
increase the size to 64bits.

This is base change for ESP/AH flow dissector patch.
Please find patch and discussions at
https://lore.kernel.org/netdev/ZMDNjD46BvZ5zp5I@corigine.com/T/#t

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # for mlxsw
Tested-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1161d22d 22-May-2023 Roi Dayan <roid@nvidia.com>

net/mlx5e: E-Switch, Register devcom device with switch id key

Register devcom devices with switch id instead of guid.
Devcom interface is used to sync between ports in the eswitch,
e.g. Adding miss rules between the ports.
New eswitch devices could have the same guid but a different
switch id so its more correct to group according to switch id
which is the identifier if the ports are on the same eswitch.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 88d162b4 03-May-2023 Roi Dayan <roid@nvidia.com>

net/mlx5: Devcom, Infrastructure changes

Update devcom infrastructure to be more generic, without
depending on max supported ports definition or a device guid,
and also more encapsulated so callers don't need to pass
the register devcom component id per event call.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ac5da544 14-Apr-2023 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: TC, Fix internal port memory leak

The flow rule can be splited, and the extra post_act rules are added
to post_act table. It's possible to trigger memleak when the rule
forwards packets from internal port and over tunnel, in the case that,
for example, CT 'new' state offload is allowed. As int_port object is
assigned to the flow attribute of post_act rule, and its refcnt is
incremented by mlx5e_tc_int_port_get(), but mlx5e_tc_int_port_put() is
not called, the refcnt is never decremented, then int_port is never
freed.

The kmemleak reports the following error:
unreferenced object 0xffff888128204b80 (size 64):
comm "handler20", pid 50121, jiffies 4296973009 (age 642.932s)
hex dump (first 32 bytes):
01 00 00 00 19 00 00 00 03 f0 00 00 04 00 00 00 ................
98 77 67 41 81 88 ff ff 98 77 67 41 81 88 ff ff .wgA.....wgA....
backtrace:
[<00000000e992680d>] kmalloc_trace+0x27/0x120
[<000000009e945a98>] mlx5e_tc_int_port_get+0x3f3/0xe20 [mlx5_core]
[<0000000035a537f0>] mlx5e_tc_add_fdb_flow+0x473/0xcf0 [mlx5_core]
[<0000000070c2cec6>] __mlx5e_add_fdb_flow+0x7cf/0xe90 [mlx5_core]
[<000000005cc84048>] mlx5e_configure_flower+0xd40/0x4c40 [mlx5_core]
[<000000004f8a2031>] mlx5e_rep_indr_offload.isra.0+0x10e/0x1c0 [mlx5_core]
[<000000007df797dc>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
[<0000000016c15cc3>] tc_setup_cb_add+0x1cf/0x410
[<00000000a63305b4>] fl_hw_replace_filter+0x38f/0x670 [cls_flower]
[<000000008bc9e77c>] fl_change+0x1fd5/0x4430 [cls_flower]
[<00000000e7f766e4>] tc_new_tfilter+0x867/0x2010
[<00000000e101c0ef>] rtnetlink_rcv_msg+0x6fc/0x9f0
[<00000000e1111d44>] netlink_rcv_skb+0x12c/0x360
[<0000000082dd6c8b>] netlink_unicast+0x438/0x710
[<00000000fc568f70>] netlink_sendmsg+0x794/0xc50
[<0000000016e92590>] sock_sendmsg+0xc5/0x190

So fix this by moving int_port cleanup code to the flow attribute
free helper, which is used by all the attribute free cases.

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 93a33193 29-Jun-2023 Chris Mi <cmi@nvidia.com>

net/mlx5e: Don't hold encap tbl lock if there is no encap action

The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:

PID: 1063722 TASK: ffffa062ca5d0000 CPU: 13 COMMAND: "handler8"
#0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
#1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
#2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
#3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
#4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
#5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
#6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
#7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
#8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
#9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
#10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
#11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
#12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
#13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
#14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
#15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
#16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0

[1031218.028143] wait_for_completion+0x24/0x30
[1031218.028589] mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256] mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885] process_one_work+0x24e/0x510

Actually no need to hold encap tbl lock if there is no encap action.
Fix it by checking if encap action exists or not before holding
encap tbl lock.

Fixes: 37c3b9fa7ccf ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 65e64640 08-Jun-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Check for NOT_READY flag state after locking

Currently the check for NOT_READY flag is performed before obtaining the
necessary lock. This opens a possibility for race condition when the flow
is concurrently removed from unready_flows list by the workqueue task,
which causes a double-removal from the list and a crash[0]. Fix the issue
by moving the flag check inside the section protected by
uplink_priv->unready_flows_lock mutex.

[0]:
[44376.389654] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP
[44376.391665] CPU: 7 PID: 59123 Comm: tc Not tainted 6.4.0-rc4+ #1
[44376.392984] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[44376.395342] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
[44376.396857] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06
[44376.399167] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246
[44376.399680] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00
[44376.400337] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0
[44376.401001] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001
[44376.401663] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000
[44376.402342] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000
[44376.402999] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000
[44376.403787] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44376.404343] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0
[44376.405004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44376.405665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[44376.406339] Call Trace:
[44376.406651] <TASK>
[44376.406939] ? die_addr+0x33/0x90
[44376.407311] ? exc_general_protection+0x192/0x390
[44376.407795] ? asm_exc_general_protection+0x22/0x30
[44376.408292] ? mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
[44376.408876] __mlx5e_tc_del_fdb_peer_flow+0xbc/0xe0 [mlx5_core]
[44376.409482] mlx5e_tc_del_flow+0x42/0x210 [mlx5_core]
[44376.410055] mlx5e_flow_put+0x25/0x50 [mlx5_core]
[44376.410529] mlx5e_delete_flower+0x24b/0x350 [mlx5_core]
[44376.411043] tc_setup_cb_reoffload+0x22/0x80
[44376.411462] fl_reoffload+0x261/0x2f0 [cls_flower]
[44376.411907] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core]
[44376.412481] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core]
[44376.413044] tcf_block_playback_offloads+0x76/0x170
[44376.413497] tcf_block_unbind+0x7b/0xd0
[44376.413881] tcf_block_setup+0x17d/0x1c0
[44376.414269] tcf_block_offload_cmd.isra.0+0xf1/0x130
[44376.414725] tcf_block_offload_unbind+0x43/0x70
[44376.415153] __tcf_block_put+0x82/0x150
[44376.415532] ingress_destroy+0x22/0x30 [sch_ingress]
[44376.415986] qdisc_destroy+0x3b/0xd0
[44376.416343] qdisc_graft+0x4d0/0x620
[44376.416706] tc_get_qdisc+0x1c9/0x3b0
[44376.417074] rtnetlink_rcv_msg+0x29c/0x390
[44376.419978] ? rep_movs_alternative+0x3a/0xa0
[44376.420399] ? rtnl_calcit.isra.0+0x120/0x120
[44376.420813] netlink_rcv_skb+0x54/0x100
[44376.421192] netlink_unicast+0x1f6/0x2c0
[44376.421573] netlink_sendmsg+0x232/0x4a0
[44376.421980] sock_sendmsg+0x38/0x60
[44376.422328] ____sys_sendmsg+0x1d0/0x1e0
[44376.422709] ? copy_msghdr_from_user+0x6d/0xa0
[44376.423127] ___sys_sendmsg+0x80/0xc0
[44376.423495] ? ___sys_recvmsg+0x8b/0xc0
[44376.423869] __sys_sendmsg+0x51/0x90
[44376.424226] do_syscall_64+0x3d/0x90
[44376.424587] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[44376.425046] RIP: 0033:0x7f045134f887
[44376.425403] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[44376.426914] RSP: 002b:00007ffd63a82b98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[44376.427592] RAX: ffffffffffffffda RBX: 000000006481955f RCX: 00007f045134f887
[44376.428195] RDX: 0000000000000000 RSI: 00007ffd63a82c00 RDI: 0000000000000003
[44376.428796] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[44376.429404] R10: 00007f0451208708 R11: 0000000000000246 R12: 0000000000000001
[44376.430039] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
[44376.430644] </TASK>
[44376.430907] Modules linked in: mlx5_ib mlx5_core act_mirred act_tunnel_key cls_flower vxlan dummy sch_ingress openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_g
ss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core]
[44376.433936] ---[ end trace 0000000000000000 ]---
[44376.434373] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
[44376.434951] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06
[44376.436452] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246
[44376.436924] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00
[44376.437530] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0
[44376.438179] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001
[44376.438786] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000
[44376.439393] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000
[44376.439998] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000
[44376.440714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44376.441225] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0
[44376.441843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44376.442471] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: ad86755b18d5 ("net/mlx5e: Protect unready flows with dedicated lock")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fb7be476 06-Apr-2023 Chris Mi <cmi@nvidia.com>

net/mlx5e: TC, Cleanup ct resources for nic flow

The cited commit removes special handling of CT action. But it
removes too much. Pre ct/ct_nat tables and some other resources
are not destroyed due to the cited commit.

Fix it by adding it back.

Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 90ca127c 02-Jun-2023 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Devcom, introduce devcom_for_each_peer_entry

Introduce generic APIs which will retrieve all peers.
This API replace mlx5_devcom_get/release_peer_data which retrieve
only a single peer.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e67f928a 07-Feb-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Devcom, Rename paired to ready

In downstream patch devcom will provide support for more than two
devices. The term 'paired' will be renamed as 'ready' to convey a
more accurate meaning.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9be6c21f 06-Feb-2023 Shay Drory <shayd@nvidia.com>

net/mlx5e: Handle offloads flows per peer

Currently, E-switch offloads table have a list of all flows that
create a peer_flow over the peer eswitch.
In order to support more than one peer, extend E-switch offloads
table peer_flow to hold an array of lists, where each peer have
dedicate index via mlx5_get_dev_index(). Thereafter, extend original
flow to hold an array of peers as well.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0af3613d 30-Mar-2022 Mark Bloch <mbloch@nvidia.com>

net/mlx5e: en_tc, re-factor query route port

query for peer esw outside of if scope.
This is preparation for query route port over multiple peers.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b1661efa 29-Mar-2022 Shay Drory <shayd@nvidia.com>

net/mlx5e: tc, Refactor peer add/del flow

Move peer_eswitch outside mlx5e_tc_add_fdb_peer_flow() so downstream
patch can call mlx5e_tc_add_fdb_peer_flow() with multiple peers.
Move peer_eswitch in the remove flow as well in order to keep symmetry.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 953bb24d 29-Mar-2022 Mark Bloch <mbloch@nvidia.com>

net/mlx5e: en_tc, Extend peer flows to a list

Currently, mlx5e_flow is holding a pointer to a peer_flow, in case one
was created. e.g. There is an assumption that mlx5e_flow can have only
one peer.
In order to support more than one peer, refactor mlx5e_flow to hold a
list of peer flows.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f4356947 29-May-2023 Ido Schimmel <idosch@nvidia.com>

flow_offload: Reject matching on layer 2 miss

Adjust drivers that support the 'FLOW_DISSECTOR_KEY_META' key to reject
filters that try to match on the newly added layer 2 miss field. Add an
extack message to clearly communicate the failure reason to user space.

The following users were not patched:

1. mtk_flow_offload_replace(): Only checks that the key is present, but
does not do anything with it.
2. mlx5_tc_ct_set_tuple_match(): Used as part of netfilter offload,
which does not make use of the new field, unlike tc.
3. get_netdev_from_rule() in nfp: Likewise.

Example:

# tc filter add dev swp1 egress pref 1 proto all flower skip_sw l2_miss true action drop
Error: mlxsw_spectrum: Can't match on "l2_miss".
We have an error talking to the kernel

Acked-by: Elad Nachman <enachman@marvell.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5d862ec6 22-May-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Fix post parse infra to only parse every action once

Caller of mlx5e_tc_act_post_parse() needs it to parse only the subset of
actions starting after previous split and ending at the current action.
However, that range is not provided as arguments and
mlx5e_tc_act_post_parse() uses generic flow_action_for_each() that iterates
over all flow actions. Not only this is redundant, it also causes a bug
when mlx5e_tc_act->post_parse() callback is not idempotent since it will be
called for every split. For example, ct action tc_act_post_parse_ct()
callback obtains a reference to mlx5_ct_ft instance and calling it several
times during parsing stage will cause reference counter imbalance.

Fix the issue by providing a proper action range of the current split
subset to mlx5e_tc_act_post_parse() and only calling
mlx5e_tc_act->post_parse() for actions inside the subset range.

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e2ab5aa1 01-Mar-2023 Chris Mi <cmi@nvidia.com>

net/mlx5e: Extract remaining tunnel encap code to dedicated file

Move set_encap_dests() and clean_encap_dests() to the tunnel encap
dedicated file. And rename them to mlx5e_tc_tun_encap_dests_set()
and mlx5e_tc_tun_encap_dests_unset().

No functional change in this patch. It is needed in the next patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# dfa1e46d 26-Apr-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Fix using eswitch mapping in nic mode

Cited patch is using the eswitch object mapping pool while
in nic mode where it isn't initialized. This results in the
trace below [0].

Fix that by using either nic or eswitch object mapping pool
depending if eswitch is enabled or not.

[0]:
[ 826.446057] ==================================================================
[ 826.446729] BUG: KASAN: slab-use-after-free in mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[ 826.447515] Read of size 8 at addr ffff888194485830 by task tc/6233

[ 826.448243] CPU: 16 PID: 6233 Comm: tc Tainted: G W 6.3.0-rc6+ #1
[ 826.448890] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 826.449785] Call Trace:
[ 826.450052] <TASK>
[ 826.450302] dump_stack_lvl+0x33/0x50
[ 826.450650] print_report+0xc2/0x610
[ 826.450998] ? __virt_addr_valid+0xb1/0x130
[ 826.451385] ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[ 826.451935] kasan_report+0xae/0xe0
[ 826.452276] ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[ 826.452829] mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
[ 826.453368] ? __kmalloc_node+0x5a/0x120
[ 826.453733] esw_add_restore_rule+0x20f/0x270 [mlx5_core]
[ 826.454288] ? mlx5_eswitch_add_send_to_vport_meta_rule+0x260/0x260 [mlx5_core]
[ 826.455011] ? mutex_unlock+0x80/0xd0
[ 826.455361] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 826.455862] ? mapping_add+0x2cb/0x440 [mlx5_core]
[ 826.456425] mlx5e_tc_action_miss_mapping_get+0x139/0x180 [mlx5_core]
[ 826.457058] ? mlx5e_tc_update_skb_nic+0xb0/0xb0 [mlx5_core]
[ 826.457636] ? __kasan_kmalloc+0x77/0x90
[ 826.458000] ? __kmalloc+0x57/0x120
[ 826.458336] mlx5_tc_ct_flow_offload+0x325/0xe40 [mlx5_core]
[ 826.458916] ? ct_kernel_enter.constprop.0+0x48/0xa0
[ 826.459360] ? mlx5_tc_ct_parse_action+0xf0/0xf0 [mlx5_core]
[ 826.459933] ? mlx5e_mod_hdr_attach+0x491/0x520 [mlx5_core]
[ 826.460507] ? mlx5e_mod_hdr_get+0x12/0x20 [mlx5_core]
[ 826.461046] ? mlx5e_tc_attach_mod_hdr+0x154/0x170 [mlx5_core]
[ 826.461635] mlx5e_configure_flower+0x969/0x2110 [mlx5_core]
[ 826.462217] ? _raw_spin_lock_bh+0x85/0xe0
[ 826.462597] ? __mlx5e_add_fdb_flow+0x750/0x750 [mlx5_core]
[ 826.463163] ? kasan_save_stack+0x2e/0x40
[ 826.463534] ? down_read+0x115/0x1b0
[ 826.463878] ? down_write_killable+0x110/0x110
[ 826.464288] ? tc_setup_action.part.0+0x9f/0x3b0
[ 826.464701] ? mlx5e_is_uplink_rep+0x4c/0x90 [mlx5_core]
[ 826.465253] ? mlx5e_tc_reoffload_flows_work+0x130/0x130 [mlx5_core]
[ 826.465878] tc_setup_cb_add+0x112/0x250
[ 826.466247] fl_hw_replace_filter+0x230/0x310 [cls_flower]
[ 826.466724] ? fl_hw_destroy_filter+0x1a0/0x1a0 [cls_flower]
[ 826.467212] fl_change+0x14e1/0x2030 [cls_flower]
[ 826.467636] ? sock_def_readable+0x89/0x120
[ 826.468019] ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
[ 826.468509] ? kasan_unpoison+0x23/0x50
[ 826.468873] ? get_random_u16+0x180/0x180
[ 826.469244] ? __radix_tree_lookup+0x2b/0x130
[ 826.469640] ? fl_get+0x7b/0x140 [cls_flower]
[ 826.470042] ? fl_mask_put+0x200/0x200 [cls_flower]
[ 826.470478] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 826.470973] ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
[ 826.471427] tc_new_tfilter+0x644/0x1050
[ 826.471795] ? tc_get_tfilter+0x860/0x860
[ 826.472170] ? __thaw_task+0x130/0x130
[ 826.472525] ? arch_stack_walk+0x98/0xf0
[ 826.472892] ? cap_capable+0x9f/0xd0
[ 826.473235] ? security_capable+0x47/0x60
[ 826.473608] rtnetlink_rcv_msg+0x1d5/0x550
[ 826.473985] ? rtnl_calcit.isra.0+0x1f0/0x1f0
[ 826.474383] ? __stack_depot_save+0x35/0x4c0
[ 826.474779] ? kasan_save_stack+0x2e/0x40
[ 826.475149] ? kasan_save_stack+0x1e/0x40
[ 826.475518] ? __kasan_record_aux_stack+0x9f/0xb0
[ 826.475939] ? task_work_add+0x77/0x1c0
[ 826.476305] netlink_rcv_skb+0xe0/0x210
[ 826.476661] ? rtnl_calcit.isra.0+0x1f0/0x1f0
[ 826.477057] ? netlink_ack+0x7c0/0x7c0
[ 826.477412] ? rhashtable_jhash2+0xef/0x150
[ 826.477796] ? _copy_from_iter+0x105/0x770
[ 826.484386] netlink_unicast+0x346/0x490
[ 826.484755] ? netlink_attachskb+0x400/0x400
[ 826.485145] ? kernel_text_address+0xc2/0xd0
[ 826.485535] netlink_sendmsg+0x3b0/0x6c0
[ 826.485902] ? kernel_text_address+0xc2/0xd0
[ 826.486296] ? netlink_unicast+0x490/0x490
[ 826.486671] ? iovec_from_user.part.0+0x7a/0x1a0
[ 826.487083] ? netlink_unicast+0x490/0x490
[ 826.487461] sock_sendmsg+0x73/0xc0
[ 826.487803] ____sys_sendmsg+0x364/0x380
[ 826.488186] ? import_iovec+0x7/0x10
[ 826.488531] ? kernel_sendmsg+0x30/0x30
[ 826.488893] ? __copy_msghdr+0x180/0x180
[ 826.489258] ? kasan_save_stack+0x2e/0x40
[ 826.489629] ? kasan_save_stack+0x1e/0x40
[ 826.490002] ? __kasan_record_aux_stack+0x9f/0xb0
[ 826.490424] ? __call_rcu_common.constprop.0+0x46/0x580
[ 826.490876] ___sys_sendmsg+0xdf/0x140
[ 826.491231] ? copy_msghdr_from_user+0x110/0x110
[ 826.491649] ? fget_raw+0x120/0x120
[ 826.491988] ? ___sys_recvmsg+0xd9/0x130
[ 826.492355] ? folio_batch_add_and_move+0x80/0xa0
[ 826.492776] ? _raw_spin_lock+0x7a/0xd0
[ 826.493137] ? _raw_spin_lock+0x7a/0xd0
[ 826.493500] ? _raw_read_lock_irq+0x30/0x30
[ 826.493880] ? kasan_set_track+0x21/0x30
[ 826.494249] ? kasan_save_free_info+0x2a/0x40
[ 826.494650] ? do_sys_openat2+0xff/0x270
[ 826.495016] ? __fget_light+0x1b5/0x200
[ 826.495377] ? __virt_addr_valid+0xb1/0x130
[ 826.495763] __sys_sendmsg+0xb2/0x130
[ 826.496118] ? __sys_sendmsg_sock+0x20/0x20
[ 826.496501] ? __x64_sys_rseq+0x2e0/0x2e0
[ 826.496874] ? do_user_addr_fault+0x276/0x820
[ 826.497273] ? fpregs_assert_state_consistent+0x52/0x60
[ 826.497727] ? exit_to_user_mode_prepare+0x30/0x120
[ 826.498158] do_syscall_64+0x3d/0x90
[ 826.498502] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 826.498949] RIP: 0033:0x7f9b67f4f887
[ 826.499294] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[ 826.500742] RSP: 002b:00007fff5d1a5498 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 826.501395] RAX: ffffffffffffffda RBX: 0000000064413ce6 RCX: 00007f9b67f4f887
[ 826.501975] RDX: 0000000000000000 RSI: 00007fff5d1a5500 RDI: 0000000000000003
[ 826.502556] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[ 826.503135] R10: 00007f9b67e08708 R11: 0000000000000246 R12: 0000000000000001
[ 826.503714] R13: 0000000000000001 R14: 00007fff5d1a9800 R15: 0000000000485400
[ 826.504304] </TASK>

[ 826.504753] Allocated by task 3764:
[ 826.505090] kasan_save_stack+0x1e/0x40
[ 826.505453] kasan_set_track+0x21/0x30
[ 826.505810] __kasan_kmalloc+0x77/0x90
[ 826.506164] __mlx5_create_flow_table+0x16d/0xbb0 [mlx5_core]
[ 826.506742] esw_offloads_enable+0x60d/0xfb0 [mlx5_core]
[ 826.507292] mlx5_eswitch_enable_locked+0x4d3/0x680 [mlx5_core]
[ 826.507885] mlx5_devlink_eswitch_mode_set+0x2a3/0x580 [mlx5_core]
[ 826.508513] devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[ 826.508969] genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[ 826.509427] genl_rcv_msg+0x28d/0x3e0
[ 826.509772] netlink_rcv_skb+0xe0/0x210
[ 826.510133] genl_rcv+0x24/0x40
[ 826.510448] netlink_unicast+0x346/0x490
[ 826.510810] netlink_sendmsg+0x3b0/0x6c0
[ 826.511179] sock_sendmsg+0x73/0xc0
[ 826.511519] __sys_sendto+0x18d/0x220
[ 826.511867] __x64_sys_sendto+0x72/0x80
[ 826.512232] do_syscall_64+0x3d/0x90
[ 826.512576] entry_SYSCALL_64_after_hwframe+0x46/0xb0

[ 826.513220] Freed by task 5674:
[ 826.513535] kasan_save_stack+0x1e/0x40
[ 826.513893] kasan_set_track+0x21/0x30
[ 826.514245] kasan_save_free_info+0x2a/0x40
[ 826.514629] ____kasan_slab_free+0x11a/0x1b0
[ 826.515021] __kmem_cache_free+0x14d/0x280
[ 826.515399] tree_put_node+0x109/0x1c0 [mlx5_core]
[ 826.515907] mlx5_destroy_flow_table+0x119/0x630 [mlx5_core]
[ 826.516481] esw_offloads_steering_cleanup+0xe7/0x150 [mlx5_core]
[ 826.517084] esw_offloads_disable+0xe0/0x160 [mlx5_core]
[ 826.517632] mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
[ 826.518225] mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
[ 826.518834] devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[ 826.519286] genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[ 826.519748] genl_rcv_msg+0x28d/0x3e0
[ 826.520101] netlink_rcv_skb+0xe0/0x210
[ 826.520458] genl_rcv+0x24/0x40
[ 826.520771] netlink_unicast+0x346/0x490
[ 826.521137] netlink_sendmsg+0x3b0/0x6c0
[ 826.521505] sock_sendmsg+0x73/0xc0
[ 826.521842] __sys_sendto+0x18d/0x220
[ 826.522191] __x64_sys_sendto+0x72/0x80
[ 826.522554] do_syscall_64+0x3d/0x90
[ 826.522894] entry_SYSCALL_64_after_hwframe+0x46/0xb0

[ 826.523540] Last potentially related work creation:
[ 826.523969] kasan_save_stack+0x1e/0x40
[ 826.524331] __kasan_record_aux_stack+0x9f/0xb0
[ 826.524739] insert_work+0x30/0x130
[ 826.525078] __queue_work+0x34b/0x690
[ 826.525426] queue_work_on+0x48/0x50
[ 826.525766] __rhashtable_remove_fast_one+0x4af/0x4d0 [mlx5_core]
[ 826.526365] del_sw_flow_group+0x1b5/0x270 [mlx5_core]
[ 826.526898] tree_put_node+0x109/0x1c0 [mlx5_core]
[ 826.527407] esw_offloads_steering_cleanup+0xd3/0x150 [mlx5_core]
[ 826.528009] esw_offloads_disable+0xe0/0x160 [mlx5_core]
[ 826.528616] mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
[ 826.529218] mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
[ 826.529823] devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
[ 826.530276] genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
[ 826.530733] genl_rcv_msg+0x28d/0x3e0
[ 826.531079] netlink_rcv_skb+0xe0/0x210
[ 826.531439] genl_rcv+0x24/0x40
[ 826.531755] netlink_unicast+0x346/0x490
[ 826.532123] netlink_sendmsg+0x3b0/0x6c0
[ 826.532487] sock_sendmsg+0x73/0xc0
[ 826.532825] __sys_sendto+0x18d/0x220
[ 826.533175] __x64_sys_sendto+0x72/0x80
[ 826.533533] do_syscall_64+0x3d/0x90
[ 826.533877] entry_SYSCALL_64_after_hwframe+0x46/0xb0

[ 826.534521] The buggy address belongs to the object at ffff888194485800
which belongs to the cache kmalloc-512 of size 512
[ 826.535506] The buggy address is located 48 bytes inside of
freed 512-byte region [ffff888194485800, ffff888194485a00)

[ 826.536666] The buggy address belongs to the physical page:
[ 826.537138] page:00000000d75841dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x194480
[ 826.537915] head:00000000d75841dd order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 826.538595] flags: 0x200000000010200(slab|head|node=0|zone=2)
[ 826.539089] raw: 0200000000010200 ffff888100042c80 ffffea0004523800 dead000000000002
[ 826.539755] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[ 826.540417] page dumped because: kasan: bad access detected

[ 826.541095] Memory state around the buggy address:
[ 826.541519] ffff888194485700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 826.542149] ffff888194485780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 826.542773] >ffff888194485800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 826.543400] ^
[ 826.543822] ffff888194485880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 826.544452] ffff888194485900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 826.545079] ==================================================================

Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 691c041b 31-Mar-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Fix deadlock in tc route query code

Cited commit causes ABBA deadlock[0] when peer flows are created while
holding the devcom rw semaphore. Due to peer flows offload implementation
the lock is taken much higher up the call chain and there is no obvious way
to easily fix the deadlock. Instead, since tc route query code needs the
peer eswitch structure only to perform a lookup in xarray and doesn't
perform any sleeping operations with it, refactor the code for lockless
execution in following ways:

- RCUify the devcom 'data' pointer. When resetting the pointer
synchronously wait for RCU grace period before returning. This is fine
since devcom is currently only used for synchronization of
pairing/unpairing of eswitches which is rare and already expensive as-is.

- Wrap all usages of 'paired' boolean in {READ|WRITE}_ONCE(). The flag has
already been used in some unlocked contexts without proper
annotations (e.g. users of mlx5_devcom_is_paired() function), but it wasn't
an issue since all relevant code paths checked it again after obtaining the
devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as
"best effort" check to return NULL when devcom is being unpaired. Note that
while RCU read lock doesn't prevent the unpaired flag from being changed
concurrently it still guarantees that reader can continue to use 'data'.

- Refactor mlx5e_tc_query_route_vport() function to use new
mlx5_devcom_get_peer_data_rcu() API which fixes the deadlock.

[0]:

[ 164.599612] ======================================================
[ 164.600142] WARNING: possible circular locking dependency detected
[ 164.600667] 6.3.0-rc3+ #1 Not tainted
[ 164.601021] ------------------------------------------------------
[ 164.601557] handler1/3456 is trying to acquire lock:
[ 164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.603078]
but task is already holding lock:
[ 164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.604459]
which lock already depends on the new lock.

[ 164.605190]
the existing dependency chain (in reverse order) is:
[ 164.605848]
-> #1 (&comp->sem){++++}-{3:3}:
[ 164.606380] down_read+0x39/0x50
[ 164.606772] mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.607336] mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core]
[ 164.607914] mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core]
[ 164.608495] mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core]
[ 164.609063] mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core]
[ 164.609627] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.610175] mlx5e_configure_flower+0x952/0x1a20 [mlx5_core]
[ 164.610741] tc_setup_cb_add+0xd4/0x200
[ 164.611146] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.611661] fl_change+0xc95/0x18a0 [cls_flower]
[ 164.612116] tc_new_tfilter+0x3fc/0xd20
[ 164.612516] rtnetlink_rcv_msg+0x418/0x5b0
[ 164.612936] netlink_rcv_skb+0x54/0x100
[ 164.613339] netlink_unicast+0x190/0x250
[ 164.613746] netlink_sendmsg+0x245/0x4a0
[ 164.614150] sock_sendmsg+0x38/0x60
[ 164.614522] ____sys_sendmsg+0x1d0/0x1e0
[ 164.614934] ___sys_sendmsg+0x80/0xc0
[ 164.615320] __sys_sendmsg+0x51/0x90
[ 164.615701] do_syscall_64+0x3d/0x90
[ 164.616083] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.616568]
-> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}:
[ 164.617210] __lock_acquire+0x159e/0x26e0
[ 164.617638] lock_acquire+0xc2/0x2a0
[ 164.618018] __mutex_lock+0x92/0xcd0
[ 164.618401] mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.618943] post_process_attr+0x153/0x2d0 [mlx5_core]
[ 164.619471] mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[ 164.620021] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.620564] mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[ 164.621125] tc_setup_cb_add+0xd4/0x200
[ 164.621531] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.622047] fl_change+0xc95/0x18a0 [cls_flower]
[ 164.622500] tc_new_tfilter+0x3fc/0xd20
[ 164.622906] rtnetlink_rcv_msg+0x418/0x5b0
[ 164.623324] netlink_rcv_skb+0x54/0x100
[ 164.623727] netlink_unicast+0x190/0x250
[ 164.624138] netlink_sendmsg+0x245/0x4a0
[ 164.624544] sock_sendmsg+0x38/0x60
[ 164.624919] ____sys_sendmsg+0x1d0/0x1e0
[ 164.625340] ___sys_sendmsg+0x80/0xc0
[ 164.625731] __sys_sendmsg+0x51/0x90
[ 164.626117] do_syscall_64+0x3d/0x90
[ 164.626502] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.626995]
other info that might help us debug this:

[ 164.627725] Possible unsafe locking scenario:

[ 164.628268] CPU0 CPU1
[ 164.628683] ---- ----
[ 164.629098] lock(&comp->sem);
[ 164.629421] lock(&esw->offloads.encap_tbl_lock);
[ 164.630066] lock(&comp->sem);
[ 164.630555] lock(&esw->offloads.encap_tbl_lock);
[ 164.630993]
*** DEADLOCK ***

[ 164.631575] 3 locks held by handler1/3456:
[ 164.631962] #0: ffff888124b75130 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[ 164.632703] #1: ffff888116e512b8 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
[ 164.633552] #2: ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.634435]
stack backtrace:
[ 164.634883] CPU: 17 PID: 3456 Comm: handler1 Not tainted 6.3.0-rc3+ #1
[ 164.635431] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 164.636340] Call Trace:
[ 164.636616] <TASK>
[ 164.636863] dump_stack_lvl+0x47/0x70
[ 164.637217] check_noncircular+0xfe/0x110
[ 164.637601] __lock_acquire+0x159e/0x26e0
[ 164.637977] ? mlx5_cmd_set_fte+0x5b0/0x830 [mlx5_core]
[ 164.638472] lock_acquire+0xc2/0x2a0
[ 164.638828] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.639339] ? lock_is_held_type+0x98/0x110
[ 164.639728] __mutex_lock+0x92/0xcd0
[ 164.640074] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.640576] ? __lock_acquire+0x382/0x26e0
[ 164.640958] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.641468] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.641965] mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.642454] ? lock_release+0xbf/0x240
[ 164.642819] post_process_attr+0x153/0x2d0 [mlx5_core]
[ 164.643318] mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[ 164.643835] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.644340] mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[ 164.644862] ? lock_acquire+0xc2/0x2a0
[ 164.645219] tc_setup_cb_add+0xd4/0x200
[ 164.645588] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.646067] fl_change+0xc95/0x18a0 [cls_flower]
[ 164.646488] tc_new_tfilter+0x3fc/0xd20
[ 164.646861] ? tc_del_tfilter+0x810/0x810
[ 164.647236] rtnetlink_rcv_msg+0x418/0x5b0
[ 164.647621] ? rtnl_setlink+0x160/0x160
[ 164.647982] netlink_rcv_skb+0x54/0x100
[ 164.648348] netlink_unicast+0x190/0x250
[ 164.648722] netlink_sendmsg+0x245/0x4a0
[ 164.649090] sock_sendmsg+0x38/0x60
[ 164.649434] ____sys_sendmsg+0x1d0/0x1e0
[ 164.649804] ? copy_msghdr_from_user+0x6d/0xa0
[ 164.650213] ___sys_sendmsg+0x80/0xc0
[ 164.650563] ? lock_acquire+0xc2/0x2a0
[ 164.650926] ? lock_acquire+0xc2/0x2a0
[ 164.651286] ? __fget_files+0x5/0x190
[ 164.651644] ? find_held_lock+0x2b/0x80
[ 164.652006] ? __fget_files+0xb9/0x190
[ 164.652365] ? lock_release+0xbf/0x240
[ 164.652723] ? __fget_files+0xd3/0x190
[ 164.653079] __sys_sendmsg+0x51/0x90
[ 164.653435] do_syscall_64+0x3d/0x90
[ 164.653784] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.654229] RIP: 0033:0x7f378054f8bd
[ 164.654577] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a c3 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 be c3 f4 ff 48
[ 164.656041] RSP: 002b:00007f377fa114b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 164.656701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f378054f8bd
[ 164.657297] RDX: 0000000000000000 RSI: 00007f377fa11540 RDI: 0000000000000014
[ 164.657885] RBP: 00007f377fa12278 R08: 0000000000000000 R09: 000000000000015c
[ 164.658472] R10: 00007f377fa123d0 R11: 0000000000000293 R12: 0000560962d99bd0
[ 164.665317] R13: 0000000000000000 R14: 0000560962d99bd0 R15: 00007f377fa11540

Fixes: f9d196bd632b ("net/mlx5e: Use correct eswitch for stack devices with lag")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2be5bd42 20-Mar-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Handle pairing of E-switch via uplink un/load APIs

In case user switch a device from switchdev mode to legacy mode, mlx5
first unpair the E-switch and afterwards unload the uplink vport.
From the other hand, in case user remove or reload a device, mlx5
first unload the uplink vport and afterwards unpair the E-switch.

The latter is causing a bug[1], hence, handle pairing of E-switch as
part of uplink un/load APIs.

[1]
In case VF_LAG is used, every tc fdb flow is duplicated to the peer
esw. However, the original esw keeps a pointer to this duplicated
flow, not the peer esw.
e.g.: if user create tc fdb flow over esw0, the flow is duplicated
over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
flow.
During module unload while a peer tc fdb flow is still offloaded, in
case the first device to be removed is the peer device (esw1 in the
example above), the peer net-dev is destroyed, and so the mlx5e_priv
is memset to 0.
Afterwards, the peer device is trying to unpair himself from the
original device (esw0 in the example above). Unpair API invoke the
original device to clear peer flow from its eswitch (esw0), but the
peer flow, which is stored over the original eswitch (esw0), is
trying to use the peer mlx5e_priv, which is memset to 0 and result in
bellow kernel-oops.

[ 157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
[ 157.964662 ] #PF: supervisor read access in kernel mode
[ 157.965123 ] #PF: error_code(0x0000) - not-present page
[ 157.965582 ] PGD 0 P4D 0
[ 157.965866 ] Oops: 0000 [#1] SMP
[ 157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
[ 157.976164 ] Call Trace:
[ 157.976437 ] <TASK>
[ 157.976690 ] __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
[ 157.977230 ] mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
[ 157.977767 ] mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
[ 157.984653 ] mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
[ 157.985212 ] mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
[ 157.985714 ] esw_offloads_disable+0x5a/0x110 [mlx5_core]
[ 157.986209 ] mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
[ 157.986757 ] mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
[ 157.987248 ] mlx5_unload+0x2a/0xb0 [mlx5_core]
[ 157.987678 ] mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
[ 157.988127 ] remove_one+0x64/0xe0 [mlx5_core]
[ 157.988549 ] pci_device_remove+0x31/0xa0
[ 157.988933 ] device_release_driver_internal+0x18f/0x1f0
[ 157.989402 ] driver_detach+0x3f/0x80
[ 157.989754 ] bus_remove_driver+0x70/0xf0
[ 157.990129 ] pci_unregister_driver+0x34/0x90
[ 157.990537 ] mlx5_cleanup+0xc/0x1c [mlx5_core]
[ 157.990972 ] __x64_sys_delete_module+0x15a/0x250
[ 157.991398 ] ? exit_to_user_mode_prepare+0xea/0x110
[ 157.991840 ] do_syscall_64+0x3d/0x90
[ 157.992198 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# dc614025 24-Jan-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Remove mirror and ct limitation

Mirror action before a ct nat action was not supported when only
chain was restored on misses. As to work around that limitation,
ct action was reordered to be first (so if hw misses on ct
action, packet wasn't modified). This reordering wasn't possible
if there was mirror action before the ct nat action, as we had to
mirror the packet before the nat operation.

Now that the misses continue from the relevant tc ct action
in software and ct action is no longer reordered, this case
is supported.

Remove this limitation.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5d7cb06e 25-Jan-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Remove tuple rewrite and ct limitation

Tuple rewrite and ct action was not supported when only chain was
restored on misses. To work around that limitation, ct action was
reordered to be first (so if hw misses on ct action, packet wasn't
modified). This reordering wasn't possible for tuple rewrite
actions before ct action since the ct action result was
dependent on the tuple info.

Now that the misses continue from the relevant tc ct action
in software and ct action is no longer reordered, this case
is supported.

Remove this limitation.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 08fe94ec 25-Jan-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Remove special handling of CT action

CT action has special treating as a per-flow action since
it was assumed to be singular and reordered to be first on
the action list.

This isn't the case anymore, and can be converted to just a
FWD to pre_ct + MODIFY_HEAD, and handled per post_act rule.

Remove special handling of CT action, and offload it while
post parsing each ct attribute.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 67efaf45 04-Dec-2022 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Remove CT action reordering

CT action reordering was done as a workaround when CT misses
used to restore the relevant filter's tc chain and continuing sw processing
from that chain. As such, there was a need to reorder CT action to be before
any packet modifying actions (e.g mac rewrite).

Currently (after patch "net/mlx5e: TC, Set CT miss to the specific ct
action instance"), CT misses continues from the relevant ct action in
software, and so reordering isn't needed anymore.

Remove the reordering.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 13aca17b 12-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: CT: Use per action stats

CT action can miss in a middle of an action list, use
per action stats to correctly report stats for missed
packets.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a830ec48 25-Jan-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Move main flow attribute cleanup to helper func

Actions that can be setup per flow attribute (so per split rule)
are cleaned up from mlx5_free_flow_attr(), mlx5e_tc_del_fdb_flow(),
and free_flow_post_acts().

Remove the duplication by re-using the helper function for
the main flow attribute and split rules attributes.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7195d9a0 30-Nov-2022 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Remove unused vf_tun variable

vf_tun is being assigned but never being used so remove it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0840c9f7 13-Mar-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: Set default can_offload action

Many parsers of tc actions just return true on their can_offload()
implementation, without checking the input flow/action.
Set the default can_offload action to true (allow), and avoid
having many can_offload implementations that do just that.

This patch doesn't change any functionality.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8e80e564 14-Mar-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5: fs_chains: Refactor to detach chains from tc usage

To support more generic chains that will be used on other
namespaces and without tc, refactor to remove the dependency
on tc terms.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Link: https://lore.kernel.org/r/bb8570d532d569285b5bff981578507bd15350cb.1678714336.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 8a0594c0 13-Mar-2023 Gal Pressman <gal@nvidia.com>

net/mlx5e: Add more information to hairpin table dump

Print the number of hairpin queues and size as part of the hairpin table
dump.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-13-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1bffcea4 13-Mar-2023 Gal Pressman <gal@nvidia.com>

net/mlx5e: Add devlink hairpin queues parameters

We refer to a TC NIC rule that involves forwarding as "hairpin".
Hairpin queues are mlx5 hardware specific implementation for hardware
forwarding of such packets.

Per the discussion in [1], move the hairpin queues control (number and
size) from debugfs to devlink.

Expose two devlink params:
- hairpin_num_queues: control the number of hairpin queues
- hairpin_queue_size: control the size (in packets) of the hairpin queues

[1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 028522e2 13-Mar-2023 Gal Pressman <gal@nvidia.com>

net/mlx5: Move needed PTYS functions to core layer

Downstream patches require devlink params to access the PTYS register,
move the needed functions from mlx5e to the core layer.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230314054234.267365-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6e9d51b1 01-Mar-2023 Roy Novich <royno@nvidia.com>

net/mlx5e: Initialize link speed to zero

mlx5e_port_max_linkspeed does not guarantee value assignment for speed.
Avoid cases where link_speed might be used uninitialized. In case
mlx5e_port_max_linkspeed fails, a default link speed of 50000 will be
used for the calculations.

Fixes: 3f6d08d196b2 ("net/mlx5e: Add RSS support for hairpin")
Signed-off-by: Roy Novich <royno@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b23bf10c 21-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, fix cloned flow attribute

Currently the cloned flow attr resets the original tc action cookies
count.
Fix that by resetting the cloned flow attribute.

Fixes: cca7eac13856 ("net/mlx5e: TC, store tc action cookies per attr")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1166add4 21-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, fix missing error code

Missing error code when mlx5e_tc_act_stats_create fails

Fixes: d13674b1d14c ("net/mlx5e: TC, map tc action cookie to a hw counter")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c9668f0b 12-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: Fix cleanup null-ptr deref on encap lock

During module is unloaded while a peer tc flow is still offloaded,
first the peer uplink rep profile is changed to a nic profile, and so
neigh encap lock is destroyed. Next during unload, the VF reps netdevs
are unregistered which causes the original non-peer tc flow to be deleted,
which deletes the peer flow. The peer flow deletion detaches the encap
entry and try to take the already destroyed encap lock, causing the
below trace.

Fix this by clearing peer flows during tc eswitch cleanup
(mlx5e_tc_esw_cleanup()).

Relevant trace:
[ 4316.837128] BUG: kernel NULL pointer dereference, address: 00000000000001d8
[ 4316.842239] RIP: 0010:__mutex_lock+0xb5/0xc40
[ 4316.851897] Call Trace:
[ 4316.852481] <TASK>
[ 4316.857214] mlx5e_rep_neigh_entry_release+0x93/0x790 [mlx5_core]
[ 4316.858258] mlx5e_rep_encap_entry_detach+0xa7/0xf0 [mlx5_core]
[ 4316.859134] mlx5e_encap_dealloc+0xa3/0xf0 [mlx5_core]
[ 4316.859867] clean_encap_dests.part.0+0x5c/0xe0 [mlx5_core]
[ 4316.860605] mlx5e_tc_del_fdb_flow+0x32a/0x810 [mlx5_core]
[ 4316.862609] __mlx5e_tc_del_fdb_peer_flow+0x1a2/0x250 [mlx5_core]
[ 4316.863394] mlx5e_tc_del_flow+0x(/0x630 [mlx5_core]
[ 4316.864090] mlx5e_flow_put+0x5f/0x100 [mlx5_core]
[ 4316.864771] mlx5e_delete_flower+0x4de/0xa40 [mlx5_core]
[ 4316.865486] tc_setup_cb_reoffload+0x20/0x80
[ 4316.865905] fl_reoffload+0x47c/0x510 [cls_flower]
[ 4316.869181] tcf_block_playback_offloads+0x91/0x1d0
[ 4316.869649] tcf_block_unbind+0xe7/0x1b0
[ 4316.870049] tcf_block_offload_cmd.isra.0+0x1ee/0x270
[ 4316.879266] tcf_block_offload_unbind+0x61/0xa0
[ 4316.879711] __tcf_block_put+0xa4/0x310

Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 28d3815a 08-Feb-2023 Maor Dickman <maord@nvidia.com>

net/mlx5: E-switch, Fix missing set of split_count when forward to ovs internal port

Rules with mirror actions are split to two FTEs when the actions after the mirror
action contains pedit, vlan push/pop or ct. Forward to ovs internal port adds
implicit header rewrite (pedit) but missing trigger to do split.

Fix by setting split_count when forwarding to ovs internal port which
will trigger split in mirror rules.

Fixes: 27484f7170ed ("net/mlx5e: Offload tc rules that redirect to ovs internal port")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a6b811cb 22-Feb-2023 Gal Pressman <gal@nvidia.com>

net/mlx5e: Remove hairpin write debugfs files

Per the discussion in [1], hairpin parameters will be exposed using
devlink, remove the debugfs files.

[1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/

Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/all/20230222230202.523667-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 67027828 17-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Set CT miss to the specific ct action instance

Currently, CT misses restore the missed chain on the tc skb extension so
tc will continue from the relevant chain. Instead, restore the CT action's
miss cookie on the extension, which will instruct tc to continue from the
this specific CT action instance on the relevant filter's action list.

Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
this miss mapping instead of the current chain miss object (CHAIN_MISS)
for CT action misses.

To restore this new miss mapping value, add a RX restore rule for each
such mapping value.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Sholmo <ozsh@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 235ff07d 17-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG

This reg usage is always a mapped object, not necessarily
containing chain info.

Rename to properly convey what it stores.
This patch doesn't change any functionality.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 93a1ab2c 17-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5: Refactor tc miss handling to a single function

Move tc miss handling code to en_tc.c, and remove
duplicate code.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 03a283cd 17-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/mlx5: Kconfig: Make tc offload depend on tc skb extension

Tc skb extension is a basic requirement for using tc
offload to support correct restoration on action miss.

Depend on it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# db4b4902 17-Feb-2023 Paul Blakey <paulb@nvidia.com>

net/sched: Rename user cookie and act cookie

struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.

Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b97653d8 29-Jan-2023 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Remove redundant parse_attr argument

The parse_attr argument is not being used in
actions_match_supported_fdb(). remove it.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8ce81fc0 01-Dec-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Add peer flow in mpesw mode

While at it rename mlx5_lag_mpesw_is_activated() to mlx5_lag_is_mpesw() to
be consistent with checking if other lag modes are activated.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a32327a3 28-Nov-2022 Roi Dayan <roid@nvidia.com>

net/mlx5: Lag, Control MultiPort E-Switch single FDB mode

MultiPort E-Switch builds on newer hardware's capabilities and introduces
a mode where a single E-Switch is used and all the vports and physical
ports on the NIC are connected to it.

The new mode will allow in the future a decrease in the memory used by the
driver and advanced features that aren't possible today.

This represents a big change in the current E-Switch implantation in mlx5.
Currently, by default, each E-Switch manager manages its E-Switch.
Steering rules in each E-Switch can only forward traffic to the native
physical port associated with that E-Switch. While there are ways to target
non-native physical ports, for example using a bond or via special TC
rules. None of the ways allows a user to configure the driver
to operate by default in such a mode nor can the driver decide
to move to this mode by default as it's user configuration-driven right now.

While MultiPort E-Switch single FDB mode is the preferred mode, older
generations of ConnectX hardware couldn't support this mode so it was never
implemented. Now that there is capable hardware present, start the
transition to having this mode by default.

Introduce a devlink parameter to control MultiPort E-Switch single FDB mode.
This will allow users to select this mode on their system right now
and in the future will allow the driver to move to this mode by default.

Example:
$ devlink dev param set pci/0000:00:0b.0 name esw_multiport value 1 \
cmode runtime

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2b68d659 12-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, support per action stats

Extend the action stats callback implementation to update stats for actions
that are associated with hw counters.
Note that the callback may be called from tc action utility or from tc
flower. Both apis expect the driver to return the stats difference from
the last update. As such, query the raw counter value and maintain
the diff from the last api call in the tc layer, instead of the fs_core
layer.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# d13674b1 12-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, map tc action cookie to a hw counter

Currently a hardware counter is associated with a flow cookie.
This does not apply to flows using branching action which are required to
return per action stats.

A single counter may apply to multiple actions.
Scan the flow actions in reverse (from the last to the first action) while
caching the last counter.
Associate all the flow attribute tc action cookies with the current
cached counter.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# cca7eac1 12-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, store tc action cookies per attr

The tc parse action phase translates the tc actions to mlx5 flow
attributes data structure that is used during the flow offload phase.
Currently, the flow offload stage instantiates hw counters while
associating them to flow cookie. However, flows with branching
actions are required to associate a hardware counter with its action
cookies.

Store the parsed tc action cookies on the flow attribute.
Use the list of cookies in the next patch to associate a tc action cookie
with its allocated hw counter.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# e9d1061d 12-Feb-2023 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, add hw counter to branching actions

Currently a hw count action is appended to the last action of the action
list. However, a branching action may terminate the action list before
reaching the last action.

Append a count action to a branching action.
In the next patches, filters with branching actions will read this counter
when reporting stats per action.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# afae6254 19-Jan-2023 Gal Pressman <gal@nvidia.com>

net/mlx5e: Remove incorrect debugfs_create_dir NULL check in hairpin

Remove the NULL check on debugfs_create_dir() return value as the
function returns an ERR pointer on failure, not NULL.
The check is not replaced with a IS_ERR_OR_NULL() as
debugfs_create_file(), and debugfs functions in general don't need error
checking.

Fixes: 0e414518d6d8 ("net/mlx5e: Add hairpin debugfs files")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 633ad4b2 21-Sep-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant code for handling vlan actions

Remove unused code which was used only with deprecated HW
which didn't support vlan actions.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 521933cd 20-Dec-2022 Maor Dickman <maord@nvidia.com>

net/mlx5e: Support Geneve and GRE with VF tunnel offload

Today VF tunnel offload (tunnel endpoint is on VF) is implemented
by indirect table which use rules that match on VXLAN VNI to
recirculated to root table, this limit the support for only
VXLAN tunnels.

This patch change indirect table to use one single match all rule
to recirculated to root table which is added when any tunnel decap
rule is added with tunnel endpoint is VF. This allow support of
Geneve and GRE with this configuration.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ef78b8d5 21-Sep-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Use common function allocating flow mod hdr or encap mod hdr

Use mlx5e_tc_attach_mod_hdr() when allocating encap mod hdr and
remove mlx5e_tc_add_flow_mod_hdr() which is not being used now.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c43182e6 20-Sep-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Add tc prefix to attach/detach hdr functions

Currently there are confusing names for attach/detach functions.

mlx5e_attach_mod_hdr() vs mlx5e_mod_hdr_attach()
mlx5e_detach_mod_hdr() vs mlx5e_mod_hdr_detach()

Add tc prefix to the functions that are in en_tc.c to separate
from the functions in mod_hdr.c which has the mod_hdr prefix.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 82b56480 20-Sep-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Pass flow attr to attach/detach mod hdr functions

In preparation to remove duplicate functions handling mod hdr allocation
and the fact that modify hdr should be per flow attr and not flow
pass flow attr to the attach and detach mod hdr funcs.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0e414518 16-Nov-2022 Gal Pressman <gal@nvidia.com>

net/mlx5e: Add hairpin debugfs files

We refer to a TC NIC rule that involves forwarding as "hairpin".
Hairpin queues are mlx5 hardware specific implementation for hardware
forwarding of such packets.

For debug purposes, introduce debugfs files which:
* Expose the number of active hairpins
* Dump the hairpin table
* Allow control over the number and size of the hairpin queues instead
of the hard-coded values.

This allows us to get visibility of the feature in order to improve it
for next generation hardware.

Add debugfs files:
fs/tc/hairpin_num_active
fs/tc/hairpin_num_queues
fs/tc/hairpin_queue_size
fs/tc/hairpin_table_dump

Note that the new values will only take effect on the next queues
creation, it does not affect existing queues.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1a803472 23-Nov-2022 Gal Pressman <gal@nvidia.com>

net/mlx5e: Add hairpin params structure

In preparation for downstream work to expose hairpin queues parameters,
introduce a hairpin parameters struct as part of the tc structure.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5aa56105 15-Dec-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Avoid false lock dependency warning on tc_ht even more

The cited commit changed class of tc_ht internal mutex in order to avoid
false lock dependency with fs_core node and flow_table hash table
structures. However, hash table implementation internally also includes a
workqueue task with its own lockdep map which causes similar bogus lockdep
splat[0]. Fix it by also adding dedicated class for hash table workqueue
work structure of tc_ht.

[0]:

[ 1139.672465] ======================================================
[ 1139.673552] WARNING: possible circular locking dependency detected
[ 1139.674635] 6.1.0_for_upstream_debug_2022_12_12_17_02 #1 Not tainted
[ 1139.675734] ------------------------------------------------------
[ 1139.676801] modprobe/5998 is trying to acquire lock:
[ 1139.677726] ffff88811e7b93b8 (&node->lock){++++}-{3:3}, at: down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.679662]
but task is already holding lock:
[ 1139.680703] ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0
[ 1139.682223]
which lock already depends on the new lock.

[ 1139.683640]
the existing dependency chain (in reverse order) is:
[ 1139.684887]
-> #2 (&tc_ht_lock_key){+.+.}-{3:3}:
[ 1139.685975] __mutex_lock+0x12c/0x14b0
[ 1139.686659] rht_deferred_worker+0x35/0x1540
[ 1139.687405] process_one_work+0x7c2/0x1310
[ 1139.688134] worker_thread+0x59d/0xec0
[ 1139.688820] kthread+0x28f/0x330
[ 1139.689444] ret_from_fork+0x1f/0x30
[ 1139.690106]
-> #1 ((work_completion)(&ht->run_work)){+.+.}-{0:0}:
[ 1139.691250] __flush_work+0xe8/0x900
[ 1139.691915] __cancel_work_timer+0x2ca/0x3f0
[ 1139.692655] rhashtable_free_and_destroy+0x22/0x6f0
[ 1139.693472] del_sw_flow_table+0x22/0xb0 [mlx5_core]
[ 1139.694592] tree_put_node+0x24c/0x450 [mlx5_core]
[ 1139.695686] tree_remove_node+0x6e/0x100 [mlx5_core]
[ 1139.696803] mlx5_destroy_flow_table+0x187/0x690 [mlx5_core]
[ 1139.698017] mlx5e_tc_nic_cleanup+0x2f8/0x400 [mlx5_core]
[ 1139.699217] mlx5e_cleanup_nic_rx+0x2b/0x210 [mlx5_core]
[ 1139.700397] mlx5e_detach_netdev+0x19d/0x2b0 [mlx5_core]
[ 1139.701571] mlx5e_suspend+0xdb/0x140 [mlx5_core]
[ 1139.702665] mlx5e_remove+0x89/0x190 [mlx5_core]
[ 1139.703756] auxiliary_bus_remove+0x52/0x70
[ 1139.704492] device_release_driver_internal+0x3c1/0x600
[ 1139.705360] bus_remove_device+0x2a5/0x560
[ 1139.706080] device_del+0x492/0xb80
[ 1139.706724] mlx5_rescan_drivers_locked+0x194/0x6a0 [mlx5_core]
[ 1139.707961] mlx5_unregister_device+0x7a/0xa0 [mlx5_core]
[ 1139.709138] mlx5_uninit_one+0x5f/0x160 [mlx5_core]
[ 1139.710252] remove_one+0xd1/0x160 [mlx5_core]
[ 1139.711297] pci_device_remove+0x96/0x1c0
[ 1139.722721] device_release_driver_internal+0x3c1/0x600
[ 1139.723590] unbind_store+0x1b1/0x200
[ 1139.724259] kernfs_fop_write_iter+0x348/0x520
[ 1139.725019] vfs_write+0x7b2/0xbf0
[ 1139.725658] ksys_write+0xf3/0x1d0
[ 1139.726292] do_syscall_64+0x3d/0x90
[ 1139.726942] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.727769]
-> #0 (&node->lock){++++}-{3:3}:
[ 1139.728698] __lock_acquire+0x2cf5/0x62f0
[ 1139.729415] lock_acquire+0x1c1/0x540
[ 1139.730076] down_write+0x8e/0x1f0
[ 1139.730709] down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.731841] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core]
[ 1139.732982] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core]
[ 1139.734207] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core]
[ 1139.735491] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core]
[ 1139.736716] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core]
[ 1139.738007] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core]
[ 1139.739213] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core]
[ 1139.740377] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core]
[ 1139.741534] rhashtable_free_and_destroy+0x3be/0x6f0
[ 1139.742351] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core]
[ 1139.743512] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core]
[ 1139.744683] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
[ 1139.745860] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core]
[ 1139.747098] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core]
[ 1139.748372] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core]
[ 1139.749590] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core]
[ 1139.750813] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core]
[ 1139.752147] mlx5e_rep_remove+0x62/0x80 [mlx5_core]
[ 1139.753293] auxiliary_bus_remove+0x52/0x70
[ 1139.754028] device_release_driver_internal+0x3c1/0x600
[ 1139.754885] driver_detach+0xc1/0x180
[ 1139.755553] bus_remove_driver+0xef/0x2e0
[ 1139.756260] auxiliary_driver_unregister+0x16/0x50
[ 1139.757059] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core]
[ 1139.758207] mlx5e_cleanup+0x12/0x30 [mlx5_core]
[ 1139.759295] mlx5_cleanup+0xc/0x49 [mlx5_core]
[ 1139.760384] __x64_sys_delete_module+0x2b5/0x450
[ 1139.761166] do_syscall_64+0x3d/0x90
[ 1139.761827] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.762663]
other info that might help us debug this:

[ 1139.763925] Chain exists of:
&node->lock --> (work_completion)(&ht->run_work) --> &tc_ht_lock_key

[ 1139.765743] Possible unsafe locking scenario:

[ 1139.766688] CPU0 CPU1
[ 1139.767399] ---- ----
[ 1139.768111] lock(&tc_ht_lock_key);
[ 1139.768704] lock((work_completion)(&ht->run_work));
[ 1139.769869] lock(&tc_ht_lock_key);
[ 1139.770770] lock(&node->lock);
[ 1139.771326]
*** DEADLOCK ***

[ 1139.772345] 2 locks held by modprobe/5998:
[ 1139.772994] #0: ffff88813c1ff0e8 (&dev->mutex){....}-{3:3}, at: device_release_driver_internal+0x8d/0x600
[ 1139.774399] #1: ffff88813c1f96a0 (&tc_ht_lock_key){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x38/0x6f0
[ 1139.775822]
stack backtrace:
[ 1139.776579] CPU: 3 PID: 5998 Comm: modprobe Not tainted 6.1.0_for_upstream_debug_2022_12_12_17_02 #1
[ 1139.777935] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1139.779529] Call Trace:
[ 1139.779992] <TASK>
[ 1139.780409] dump_stack_lvl+0x57/0x7d
[ 1139.781015] check_noncircular+0x278/0x300
[ 1139.781687] ? print_circular_bug+0x460/0x460
[ 1139.782381] ? rcu_read_lock_sched_held+0x3f/0x70
[ 1139.783121] ? lock_release+0x487/0x7c0
[ 1139.783759] ? orc_find.part.0+0x1f1/0x330
[ 1139.784423] ? mark_lock.part.0+0xef/0x2fc0
[ 1139.785091] __lock_acquire+0x2cf5/0x62f0
[ 1139.785754] ? register_lock_class+0x18e0/0x18e0
[ 1139.786483] lock_acquire+0x1c1/0x540
[ 1139.787093] ? down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.788195] ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
[ 1139.788978] ? register_lock_class+0x18e0/0x18e0
[ 1139.789715] down_write+0x8e/0x1f0
[ 1139.790292] ? down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.791380] ? down_write_killable+0x220/0x220
[ 1139.792080] ? find_held_lock+0x2d/0x110
[ 1139.792713] down_write_ref_node+0x7c/0xe0 [mlx5_core]
[ 1139.793795] mlx5_del_flow_rules+0x6f/0x610 [mlx5_core]
[ 1139.794879] __mlx5_eswitch_del_rule+0xdd/0x560 [mlx5_core]
[ 1139.796032] ? __esw_offloads_unload_rep+0xd0/0xd0 [mlx5_core]
[ 1139.797227] ? xa_load+0x11a/0x200
[ 1139.797800] ? __xa_clear_mark+0xf0/0xf0
[ 1139.798438] mlx5_eswitch_del_offloaded_rule+0x14/0x20 [mlx5_core]
[ 1139.799660] mlx5e_tc_rule_unoffload+0x104/0x2b0 [mlx5_core]
[ 1139.800821] mlx5e_tc_unoffload_fdb_rules+0x10c/0x1f0 [mlx5_core]
[ 1139.802049] ? mlx5_eswitch_get_uplink_priv+0x25/0x80 [mlx5_core]
[ 1139.803260] mlx5e_tc_del_fdb_flow+0xc3c/0xfa0 [mlx5_core]
[ 1139.804398] ? __cancel_work_timer+0x1c2/0x3f0
[ 1139.805099] ? mlx5e_tc_unoffload_from_slow_path+0x460/0x460 [mlx5_core]
[ 1139.806387] mlx5e_tc_del_flow+0x146/0xa20 [mlx5_core]
[ 1139.807481] _mlx5e_tc_del_flow+0x38/0x60 [mlx5_core]
[ 1139.808564] rhashtable_free_and_destroy+0x3be/0x6f0
[ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core]
[ 1139.809336] ? mlx5e_tc_del_flow+0xa20/0xa20 [mlx5_core]
[ 1139.810455] mlx5e_tc_ht_cleanup+0x1b/0x30 [mlx5_core]
[ 1139.811552] mlx5e_cleanup_rep_tx+0x4a/0xe0 [mlx5_core]
[ 1139.812655] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
[ 1139.813768] mlx5e_netdev_change_profile+0xd9/0x1c0 [mlx5_core]
[ 1139.814952] mlx5e_netdev_attach_nic_profile+0x1b/0x30 [mlx5_core]
[ 1139.816166] mlx5e_vport_rep_unload+0x16a/0x1b0 [mlx5_core]
[ 1139.817336] __esw_offloads_unload_rep+0xb1/0xd0 [mlx5_core]
[ 1139.818507] mlx5_eswitch_unregister_vport_reps+0x409/0x5f0 [mlx5_core]
[ 1139.819788] ? mlx5_eswitch_uplink_get_proto_dev+0x30/0x30 [mlx5_core]
[ 1139.821051] ? kernfs_find_ns+0x137/0x310
[ 1139.821705] mlx5e_rep_remove+0x62/0x80 [mlx5_core]
[ 1139.822778] auxiliary_bus_remove+0x52/0x70
[ 1139.823449] device_release_driver_internal+0x3c1/0x600
[ 1139.824240] driver_detach+0xc1/0x180
[ 1139.824842] bus_remove_driver+0xef/0x2e0
[ 1139.825504] auxiliary_driver_unregister+0x16/0x50
[ 1139.826245] mlx5e_rep_cleanup+0x19/0x30 [mlx5_core]
[ 1139.827322] mlx5e_cleanup+0x12/0x30 [mlx5_core]
[ 1139.828345] mlx5_cleanup+0xc/0x49 [mlx5_core]
[ 1139.829382] __x64_sys_delete_module+0x2b5/0x450
[ 1139.830119] ? module_flags+0x300/0x300
[ 1139.830750] ? task_work_func_match+0x50/0x50
[ 1139.831440] ? task_work_cancel+0x20/0x20
[ 1139.832088] ? lockdep_hardirqs_on_prepare+0x273/0x3f0
[ 1139.832873] ? syscall_enter_from_user_mode+0x1d/0x50
[ 1139.833661] ? trace_hardirqs_on+0x2d/0x100
[ 1139.834328] do_syscall_64+0x3d/0x90
[ 1139.834922] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1139.835700] RIP: 0033:0x7f153e71288b
[ 1139.836302] Code: 73 01 c3 48 8b 0d 9d 75 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 75 0e 00 f7 d8 64 89 01 48
[ 1139.838866] RSP: 002b:00007ffe0a3ed938 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 1139.840020] RAX: ffffffffffffffda RBX: 0000564c2cbf8220 RCX: 00007f153e71288b
[ 1139.841043] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000564c2cbf8288
[ 1139.842072] RBP: 0000564c2cbf8220 R08: 0000000000000000 R09: 0000000000000000
[ 1139.843094] R10: 00007f153e7a3ac0 R11: 0000000000000206 R12: 0000564c2cbf8288
[ 1139.844118] R13: 0000000000000000 R14: 0000564c2cbf7ae8 R15: 00007ffe0a3efcb8

Fixes: 9ba33339c043 ("net/mlx5e: Avoid false lock depenency warning on tc_ht")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5e72f3f1 29-Aug-2022 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: TC, Keep mod hdr actions after mod hdr alloc

When offloading TC NIC rule which has mod_hdr action, the
mod_hdr actions list is freed upon mod_hdr allocation.

In the new format of handling multi table actions and CT in
particular, the mod_hdr actions list is still relevant when
setting the pre and post rules and therefore, freeing the list
may cause adding rules which don't set the FTE_ID.

Therefore, the mod_hdr actions list needs to be kept for the
pre/post flows as well and should be left for these handler to
be freed.

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6fda078d 02-Nov-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, add support for meter mtu offload

Initialize the meter object with the TC police mtu parameter.
Use the hardware range destination to compare the pkt len to the mtu setting.
Assign the range destination hit/miss ft to the police conform/exceed
attributes.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d5671325 23-Aug-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: meter, add mtu post meter tables

TC police action may configure the maximum packet size to be handled by
the policer, in addition to byte/packet rate.
MTU check is realized in hardware using the range destination, specifying
a hit ft, if packet len is in the range, or miss ft otherwise.

Instantiate mtu green/red flow tables with a single match-all rule.
Add the green/red actions to the hit/miss table accordingly.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3603f266 03-Dec-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, allow meter jump control action

Separate the matchall police action validation from flower validation.
Isolate the action validation logic in the police action parser.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-12-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0d8c38d4 03-Dec-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, init post meter rules with branching attributes

Instantiate the post meter actions with the platform initialized branching
action attributes.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-11-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3fcb94e3 03-Dec-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, rename post_meter actions

Currently post meter supports only the pipe/drop conform-exceed policy.
This assumption is reflected in several variable names.
Rename the following variables as a pre-step for using the generalized
branching action platform.

Rename fwd_green_rule/drop_red_rule to green_rule/red_rule respectively.
Repurpose red_counter/green_counter to act_counter/drop_counter to allow
police conform-exceed configurations that do not drop.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c84fa1ab 03-Dec-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, initialize branching action with target attr

Identify the jump target action when iterating the action list.
Initialize the jump target attr with the jumping attribute during the
parsing phase. Initialize the jumping attr post action with the target
during the offload phase.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f86488cb 03-Dec-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, initialize branch flow attributes

Initialize flow attribute for drop, accept, pipe and jump branching actions.

Instantiate a flow attribute instance according to the specified branch
control action. Store the branching attributes on the branching action
flow attribute during the parsing phase. Then, during the offload phase,
allocate the relevant mod header objects to the branching actions.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 64426382 03-Dec-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, validate action list per attribute

Currently the entire flow action list is validate for offload limitations.
For example, flow with both forward and drop actions are declared invalid
due to hardware restrictions.
However, a multi-table hardware model changes the limitations from a flow
scope to a single flow attribute scope.

Apply offload limitations to flow attributes instead of the entire flow.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8facc02f 03-Dec-2022 Oz Shlomo <ozsh@nvidia.com>

net/mlx5e: TC, reuse flow attribute post parser processing

After the tc action parsing phase the flow attribute is initialized with
relevant eswitch offload objects such as tunnel, vlan, header modify and
counter attributes. The post processing is done both for fdb and post-action
attributes.

Reuse the flow attribute post parsing logic by both fdb and post-action
offloads.

Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221203221337.29267-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 94581080 04-Sep-2022 Gal Pressman <gal@nvidia.com>

net/mlx5e: Use clamp operation instead of open coding it

Replace the min/max operations with a single clamp.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f3774220 16-Nov-2022 Chris Mi <cmi@nvidia.com>

net/mlx5e: Offload rule only when all encaps are valid

The cited commit adds a for loop to support multiple encapsulations.
But it only checks if the last encap is valid.

Fix it by setting slow path flag when one of the encap is invalid.

Fixes: f493f15534ec ("net/mlx5e: Move flow attr reformat action bit to per dest flags")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7f1a6d4b 03-Nov-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions

esw_attr is only allocated if namespace is fdb.

BUG: KASAN: slab-out-of-bounds in parse_tc_actions+0xdc6/0x10e0 [mlx5_core]
Write of size 4 at addr ffff88815f185b04 by task tc/2135

CPU: 5 PID: 2135 Comm: tc Not tainted 6.1.0-rc2+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
print_report+0x170/0x471
? parse_tc_actions+0xdc6/0x10e0 [mlx5_core]
kasan_report+0xbc/0xf0
? parse_tc_actions+0xdc6/0x10e0 [mlx5_core]
parse_tc_actions+0xdc6/0x10e0 [mlx5_core]

Fixes: 94d651739e17 ("net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9e064308 03-Nov-2022 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: TC, Fix wrong rejection of packet-per-second policing

In the bellow commit, we added support for PPS policing without
removing the check which block offload of such cases.
Fix it by removing this check.

Fixes: a8d52b024d6d ("net/mlx5e: TC, Support offloading police action")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 94d65173 26-Oct-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed

On multi table split the driver creates a new attr instance with
data being copied from prev attr instance zeroing action flags.
Also need to reset dests properties to avoid incorrect dests per attr.

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-10-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f382a241 26-Oct-2022 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: TC, Reject forwarding from internal port to internal port

Reject TC rules that forward from internal port to internal port
as it is not supported.

This include rules that are explicitly have internal port as
the filter device as well as rules that apply on tunnel interfaces
as the route device for the tunnel interface can be an internal
port.

Fixes: 27484f7170ed ("net/mlx5e: Offload tc rules that redirect to ovs internal port")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8dc47c05 26-Oct-2022 Paul Blakey <paulb@nvidia.com>

net/mlx5e: Update restore chain id for slow path packets

Currently encap slow path rules just forward to software without
setting the chain id miss register, so driver doesn't restore
the chain, and packets hitting this rule will restart from tc chain
0 instead of continuing to the chain the encap rule was on.

Fix this by setting the chain id miss register to the chain id mapping.

Fixes: 8f1e0b97cc70 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-6-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 794131c4 01-Oct-2022 Jianbo Liu <jianbol@nvidia.com>

net/mlx5: E-Switch, Return EBUSY if can't get mode lock

It is to avoid tc retrying during device mode change.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 72e0bcd1 13-Jul-2022 Roi Dayan <roid@nvidia.com>

net/mlx5: TC, Add support for SF tunnel offload

VF tunnel flow already exists and SF tunnel is the
same flow. Support offloading of tunneling over SF device
by allow to attach an encap route over SF and set to use
indirect flow table on SF.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f52f2fae 10-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Introduce flow steering API

Move mlx5e_flow_steering struct to fs_en.c to make it private.
Introduce flow_steering API and let other files go through it.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# af8bbf73 09-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer

Make mlx5e_flow_steering member of mlx5e_priv a pointer.
Add dynamic allocation respectively.

Allocate fs for all profiles when initializing profile,
symmetrically deallocate at profile cleanup.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 23bde065 31-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Make mlx5e_tc_table private

Move mlx5e_tc_table struct to en_tc.c thus make it private.
Introduce allocation and deallocation functions as part of the tc API
to allow this switch smoothly.

Convert mlx5e_nic_chain() macro to a function of en_tc.c.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 65f586c2 10-Jan-2022 Lama Kayal <lkayal@nvidia.com>

net/mlx5e: Convert mlx5e_tc_table member of mlx5e_flow_steering to pointer

Make fs.tc be a pointer and allocate it dynamically.
Add mlx5e_priv pointer to mlx5e_tc_table, and thus get a work-around to
accessing priv via tc when handling tc events inside mlx5e_tc_netdev_event.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f8e9d413 24-Jul-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Separate get/update/replace meter functions

mlx5e_tc_meter_get() to get an existing meter.
mlx5e_tc_meter_update() to update an existing meter without refcount.
mlx5e_tc_meter_replace() to get/create a meter and update if needed.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b50ce435 19-Jul-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add red and green counters for metering

Add red and green counters per meter instance.
TC police action is implemented as a meter instance.
The meter counters represent the police action
notexceed/exceed counters.
TC rules using the same meter instance will use
the same counters.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e5b1db27 22-Feb-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Allocate post meter ft per rule

To support a TC police action notexceed counter and supporting
actions other than drop/pipe there is a need to create separate ft
and rules per rule and not to use a common one created on eswitch init.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a8d52b02 22-Jun-2021 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: TC, Support offloading police action

Add parsing support by implementing struct mlx5e_tc_act for police
action.

TC rule with police actions is broken down into several rules in
different tables. One rule with the original match in the original
flow table, which set fte_id, do metering, and jump to the post_meter
table. If there are more police actions, more rules are created for
each of them. Besides, a last rule is created in the end.

In post_meter table, there are two pre-defined rules, one is to drop
packet if its packet color is RED, the other is to jump back to
post_act table. As fte_id is updated before jumping, the rule for next
meter is matched to do another round of metering (if there are
multiple meters in the flow rule). Otherwise, last fte_id is matched
and do the original actions.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 06fe52a4 18-Jun-2021 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: Add post meter table for flow metering

Flow meter object monitors the packets rate for the flows it is
attached to, and color packets with GREEN or RED. The post meter table
is used to check the color. Packet is dropped if it's RED, or
forwarded to post_act table if GREEN.

Packet color will be set to 8 LSB of the register C5, so they are
reserved for metering, which are previously used for matching fte id.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 17c5da03 31-Oct-2021 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: Add generic macros to use metadata register mapping

There are many definitions to get bits and mask for different types of
metadata register mapping, add generic macros to unify them.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 74e6b2a8 08-Jun-2021 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: Prepare for flow meter offload if hardware supports it

If flow meter aso object is supported, set the allocated range, and
initialize aso wqe.

The allocated range is indicated by log_meter_aso_granularity in HW
capabilities, and currently is 6.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d6c13d74 08-Jun-2022 Eli Cohen <elic@nvidia.com>

net/mlx5: TC, allow offload from uplink to other PF's VF

Redirecting traffic from uplink to a VF is a legal operation of
mulitport eswitch mode. Remove the limitation.

Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 4d1e07d8 04-Jul-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Fix matchall police parameters validation

Referenced commit prepared the code for upcoming extension that allows mlx5
to offload police action attached to flower classifier. However, with
regard to existing matchall classifier offload validation should be
reversed as FLOW_ACTION_CONTINUE is the only supported notexceed police
action type. Fix the problem by allowing FLOW_ACTION_CONTINUE for police
action and extend scan_tc_matchall_fdb_actions() to only allow such actions
with matchall classifier.

Fixes: d97b4b105ce7 ("flow_offload: reject offload for all drivers with invalid police parameters")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 66cb64e2 02-May-2022 Maor Dickman <maord@nvidia.com>

net/mlx5e: TC NIC mode, fix tc chains miss table

The cited commit changed promisc table to be created on demand with the
highest priority in the NIC table replacing the vlan table, this caused
tc NIC tables miss flow to skip the prmoisc table because it use vlan
table as miss table.

OVS offload in NIC mode use promisc by default so any unicast packet
which will be handled by tc NIC tables miss flow will skip the promisc
rule and will be dropped.

Fix this by adding new empty table in new tc level with low priority and
point the nic tc chain miss to it, the new table is managed so it will
point to vlan table if promisc is disabled and to promisc table if enabled.

Fixes: 1c46d7409f30 ("net/mlx5e: Optimize promiscuous mode")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 94db3317 30-Jan-2022 Eli Cohen <elic@nvidia.com>

net/mlx5: Support multiport eswitch mode

Multiport eswitch mode is a LAG mode that allows to add rules that
forward traffic to a specific physical port without being affected by LAG
affinity configuration.

This mode of operation is mutual exclusive with the other LAG modes used
by multipath and bonding.

To make the transition between the modes, we maintain a counter on the
number of rules specifying one of the uplink representors as the target
of mirred egress redirect action.

An example of such rule would be:

$ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \
00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0

If the reference count just grows to one and LAG is not in use, we
create the LAG in multiport eswitch mode. Other mode changes are not
allowed while in this mode. When the reference count reaches zero, we
destroy the LAG and let other modes be used if needed.

logic also changed such that if forwarding to some uplink destination
cannot be guaranteed, we fail the operation so the rule will eventually
be in software and not in hardware.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ada09af9 28-Mar-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Don't match double-vlan packets if cvlan is not set

Currently, match VLAN rule also matches packets that have multiple VLAN
headers. This behavior is similar to buggy flower classifier behavior that
has recently been fixed. Fix the issue by matching on
outer_second_cvlan_tag with value 0 which will cause the HW to verify the
packet doesn't contain second vlan header.

Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 371c2b34 24-Feb-2022 Dan Carpenter <dan.carpenter@oracle.com>

net/mlx5e: TC, Fix use after free in mlx5e_clone_flow_attr_for_post_act()

This returns freed memory leading to a use after free. It's supposed to
return NULL.

Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d97b4b10 24-Feb-2022 Jianbo Liu <jianbol@nvidia.com>

flow_offload: reject offload for all drivers with invalid police parameters

As more police parameters are passed to flow_offload, driver can check
them to make sure hardware handles packets in the way indicated by tc.
The conform-exceed control should be drop/pipe or drop/ok. Besides,
for drop/ok, the police should be the last action. As hardware can't
configure peakrate/avrate/overhead, offload should not be supported if
any of them is configured.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2a829fe2 21-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Clean redundant counter flag from tc action parsers

When tc actions being parsed only the last flow attr created needs the
counter flag and the previous flags being reset.
Clean the flag from the tc action parsers.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8300f225 08-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Create new flow attr for multi table actions

Some TC actions use post actions for their implementation.
For example CT and sample actions.

Create a new flow attr after each multi table action and
create a post action rule for it.

First flow attr being offloaded normally and linked to the next
attr (post action rule) with setting an id on reg_c.
Post action rules match the id on reg_c and continue to the next one.

The flow counter is allocated on the last rule.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0610f8dc 12-Sep-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Pass actions param to actions_match_supported()

Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.

Currently action_match_supported() reads the actions flag from the
flow's attribute instance. Modify the function to receive the action
flags as a parameter which is set by the calling function and
pass the aggregated actions to actions_match_supported().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d1a3138f 24-Jan-2022 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC, Move flow hashtable to be per rep

To allow shared tc block offload between two or more reps of the
same eswitch, move the tc flow hashtable to be per rep, instead
of per eswitch.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a572c0a7 19-Dec-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: CT, Remove redundant flow args from tc ct calls

The flow arg is not being used so remove it.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 73a3f1bc 19-Dec-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Store mapped tunnel id on flow attr

In preparation for multiple attr instances the tunnel_id should
be attr specific and not flow specific.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 84ba8062 15-Dec-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Test CT and SAMPLE on flow attr

Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.
Prepare for multiple attr instances by testing for CT or SAMPLE flag on attr
flags instead of flow flag.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e5d4e1da 19-Dec-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Refactor eswitch attr flags to just attr flags

The flags are flow attrs and not esw specific attr flags.
Refactor to remove the esw prefix and move from eswitch.h
to en_tc.h where struct mlx5_flow_attr exists.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# efe6f961 15-Dec-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: CT, Don't set flow flag CT for ct clear flow

ct clear action is a normal flow with a modify header for registers to
0. there is no need for any special handling in tc_ct.c.
Parsing of ct clear action still allocates mod acts to set 0 on the
registers and the driver continue to add a normal rule with modify hdr
context.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# eeed226e 05-Dec-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Hold sample_attr on stack instead of pointer

In later commit we are going to instantiate multiple attr instances
for flow instead of single attr.
Parsing TC sample allocates a new memory but there is no symmetric
cleanup in the infrastructure.
To avoid asymmetric alloc/free use sample_attr as part of the flow attr
and not allocated and held as a pointer.
This will avoid a cleanup leak when sample action is not on the first
attr.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ff993167 25-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr

In later commit we are going to instantiate multiple attr instances
for flow instead of single attr.
Make sure mlx5e_tc_add_flow_mod_hdr() use the correct attr and not flow->attr.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8be9686d 24-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Pass attr to tc_act can_offload()

In later commit we are going to instantiate multiple attr instances
for flow instead of single attr.
Make sure the parsing using correct attr and not flow->attr.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 918ed7bf 11-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Split pedit offloads verify from alloc_tc_pedit_action()

Split pedit verify part into a new subfunction for better
maintainability.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 09bf9792 10-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Move pedit_headers_action to parse_attr

Move pedit_headers_action from flow parse_state to flow parse_attr.
In a follow up commit we are going to have multiple attr per flow
and pedit_headers_action are unique per attr.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# df67ad62 10-Oct-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Move counter creation call to alloc_flow_attr_counter()

Move shared code to alloc_flow_attr_counter() for reuse by the next patches.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c118ebc9 10-Oct-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Pass attr arg for attaching/detaching encaps

In later commit that we will have multiple attr instances per flow
we would like to pass a specific attr instance to set encaps.

Currently the mlx5_flow object contains a single mlx5_attr instance.
However, multi table actions (e.g. CT) instantiate multiple attr instances.

Currently mlx5e_attach/detach_encap() reads the first attr instance
from the flow instance. Modify the functions to receive the attr
instance as a parameter which is set by the calling function.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 39542e23 23-Sep-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Move code chunk setting encap dests into its own function

Split setting encap dests code chunk out of mlx5e_tc_add_fdb_flow()
to make the function smaller for maintainability and reuse.
For symmetry do the same for mlx5e_tc_del_fdb_flow().
While at it refactor cleanup to first check for encap flag like
done when setting encap dests.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3d65492a 17-Jan-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Reject rules with forward and drop actions

Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.

Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 23216d38 04-Jan-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Reject rules with drop and modify hdr action

This kind of action is not supported by firmware and generates a
syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5b209d1a 01-Feb-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: Avoid implicit modify hdr for decap drop rule

Currently the driver adds implicit modify hdr action for
decap rules on tunnel devices if the port is an ovs port.
This is also done if the action is drop and makes the modify
hdr redundant and also the FW doesn't support it and will generate
a syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fix it by adding the implicit modify hdr only for fwd actions.

Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Fixes: 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5623ef8a 17-Jan-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Reject rules with forward and drop actions

Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.

Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a2446bc7 04-Jan-2022 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Reject rules with drop and modify hdr action

This kind of action is not supported by firmware and generates a
syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b6dfff21 16-Jun-2021 Paul Blakey <paulb@nvidia.com>

net/mlx5e: Fix matching on modified inner ip_ecn bits

Tunnel device follows RFC 6040, and during decapsulation inner
ip_ecn might change depending on inner and outer ip_ecn as follows:

+---------+----------------------------------------+
|Arriving | Arriving Outer Header |
| Inner +---------+---------+---------+----------+
| Header | Not-ECT | ECT(0) | ECT(1) | CE |
+---------+---------+---------+---------+----------+
| Not-ECT | Not-ECT | Not-ECT | Not-ECT | <drop> |
| ECT(0) | ECT(0) | ECT(0) | ECT(1) | CE* |
| ECT(1) | ECT(1) | ECT(1) | ECT(1)* | CE* |
| CE | CE | CE | CE | CE |
+---------+---------+---------+---------+----------+

Cells marked above are changed from original inner packet ip_ecn value.

Tc then matches on the modified inner ip_ecn, but hw offload which
matches the inner ip_ecn value before decap, will fail.

Fix that by mapping all the cases of outer and inner ip_ecn matching,
and only supporting cases where we know inner wouldn't be changed by
decap, or in the outer ip_ecn=CE case, inner ip_ecn didn't matter.

Fixes: bcef735c59f2 ("net/mlx5e: Offload TC matching on tos/ttl for ip tunnels")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 35bb5242 15-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Move goto action checks into tc_action goto post parse op

Move goto action checks from parse nic/fdb funcs into the tc action
infra goto post parse op.
While moving this part also use NL_SET_ERR_MSG_MOD() instead of
NL_SET_ERR_MSG().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c2208035 23-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Move vlan action chunk into tc action vlan post parse op

Move vlan prio tag rewrite handling into tc action infra vlan post parse op.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# dd5ab6d1 11-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add post_parse() op to tc action infrastructure

The post_parse() op should be called after the parse op was called
for all actions. It could be an action state is dependent on other
actions. In the new op an action can fail the parse if the state
is not valid anymore.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6bcba1bd 12-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Move sample attr allocation to tc_action sample parse op

There is no reason to wait with the kmalloc to after parsing all
other actions. There could still be a failure later and before
offloading the rule. So alloc the mem when parsing.
The memory is being released on mlx5e_flow_put() which is called
also on error flow.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8333d53e 25-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC action parsing loop

Introduce a common function to implement the generic parsing loop.
The same function can be used for parsing NIC and FDB (Switchdev mode) flows.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 922d69ed 16-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add redirect ingress to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3929ff58 27-Oct-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add sample and ptype to tc_action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 758bc134 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add ct to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ab3f3d5e 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add mirred/redirect to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 163b766f 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add mpls push/pop to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8ee72638 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add vlan push/pop/mangle to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e36db1ee 22-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add pedit to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9ca1bb2c 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add csum to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c65686d7 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add tunnel encap/decap to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 67d62ee7 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add goto to tc action infra

Add parsing support by implementing struct mlx5e_tc_act
for this action.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fad54790 12-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Add tc action infrastructure

Add an infrastructure to help parsing tc actions in a generic way.

Supporting an action parser means implementing struct mlx5e_tc_act
for that action.

The infrastructure will give the possibility to be generic when parsing tc
actions, i.e. parse_tc_nic_actions() and parse_tc_fdb_actions().
To parse tc actions a user needs to allocate a parse_state instance
and pass it when iterating over the tc actions parsers.
If a parser doesn't exists then a user can treat it as unsupported.

To add an action parser a user needs to implement two callbacks.
The can_offload() callback to quickly check if an action can be offloaded.
The parse_action() callback to do actual parsing and prepare for offload.

Add implementation for drop, trap, mark and accept action parsers with this
commit to act as examples and implement usage of the new infrastructure for
those actions.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d4bb0531 09-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Set flow attr ip_version earlier

Setting flow attr ip_version is not related to parsing tc flow actions.
It needs to be set after parsing flower matches which changes the spec.
So move it outside parse_tc_fdb_actions() and set it in
__mlx5e_add_fdb_flow().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# df990477 28-Oct-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Move common flow_action checks into function

Remove duplicate checks on flow_action by using common function.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 70a140ea 03-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant actions arg from vlan push/pop funcs

Passing actions is redundant and can be retrieved from flow attr.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3cc78411f 03-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant actions arg from validate_goto_chain()

Passing actions is redundant and can be retrieved from flow.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9745dbe0 26-Oct-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Remove redundant action stack var

Remove the action stack var from parse tc fdb actions
and prase tc nic actions, use the flow attr action var directly.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 31108d14 06-Nov-2021 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'

All the error handling paths of 'mlx5e_tc_add_fdb_flow()' end to 'err_out'
where 'flow_flag_set(flow, FAILED);' is called.

All but the new error handling paths added by the commits given in the
Fixes tag below.

Fix these error handling paths and branch to 'err_out'.

Fixes: 166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port")
Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 85c5f7c9 21-Sep-2021 Dmytro Linkin <dlinkin@nvidia.com>

net/mlx5: E-switch, Create QoS on demand

Don't create eswitch QoS (root TSAR) on switch mode change. Create it on
first child TSAR object creation - vport or rate group. Keep track
root TSAR references and release root TSAR with last object deletion.
No need to check for QoS is enabled when installing tc matchall filter.
Remove related helper function due to no users of it.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fc3a879a 10-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Move comment about mod header flag to correct place

Move the comment to the correct place where the driver actually
removes the flag and not in the check that maybe pedit actions exists.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 88d97486 01-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Move kfree() calls after destroying all resources

When deleting fdb/nic flow rules first release all resources
and then call the kfree() calls instead of sparse them around
the function.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 972fe492 01-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Destroy nic flow counter if exists

Counter is only added if counter flag exists.
So check the counter fag exists for deleting the counter.
This is the same as in add/del fdb flow.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2c0e5cf5 05-Jul-2021 Paul Blakey <paulb@nvidia.com>

net/mlx5e: Refactor mod header management API

For all mod hdr related functions to reside in a single self contained
component (mod_hdr.c), refactor alloc() and add get_id() so that user
won't rely on internal implementation, and move both to mod_hdr
component.

Rename the prefix to mlx5e_mod_hdr_* as other mod hdr functions.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 077cdda7 22-Dec-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: TC, Fix memory leak with rules with internal port

Fix a memory leak with decap rule with internal port as destination
device. The driver allocates a modify hdr action but doesn't set
the flow attr modify hdr action which results in skipping releasing
the modify hdr action when releasing the flow.

backtrace:
[<000000005f8c651c>] krealloc+0x83/0xd0
[<000000009f59b143>] alloc_mod_hdr_actions+0x156/0x310 [mlx5_core]
[<000000002257f342>] mlx5e_tc_match_to_reg_set_and_get_id+0x12a/0x360 [mlx5_core]
[<00000000b44ea75a>] mlx5e_tc_add_fdb_flow+0x962/0x1470 [mlx5_core]
[<0000000003e384a0>] __mlx5e_add_fdb_flow+0x54c/0xb90 [mlx5_core]
[<00000000ed8b22b6>] mlx5e_configure_flower+0xe45/0x4af0 [mlx5_core]
[<00000000024f4ab5>] mlx5e_rep_indr_offload.isra.0+0xfe/0x1b0 [mlx5_core]
[<000000006c3bb494>] mlx5e_rep_indr_setup_tc_cb+0x90/0x130 [mlx5_core]
[<00000000d3dac2ea>] tc_setup_cb_add+0x1d2/0x420

Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 4390c6ed 06-Nov-2021 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'

All the error handling paths of 'mlx5e_tc_add_fdb_flow()' end to 'err_out'
where 'flow_flag_set(flow, FAILED);' is called.

All but the new error handling paths added by the commits given in the
Fixes tag below.

Fix these error handling paths and branch to 'err_out'.

Fixes: 166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port")
Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
(cherry picked from commit 31108d142f3632970f6f3e0224bd1c6781c9f87d)


# 2820110d 01-Dec-2021 Chris Mi <cmi@nvidia.com>

net/mlx5e: Delete forward rule for ct or sample action

When there is ct or sample action, the ct or sample rule will be deleted
and return. But if there is an extra mirror action, the forward rule can't
be deleted because of the return.

Fix it by removing the return.

Fixes: 69e2916ebce4 ("net/mlx5: CT: Add support for mirroring")
Fixes: f94d6389f6a8 ("net/mlx5e: TC, Add support to offload sample action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 806401c2 08-Nov-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: CT, Fix multiple allocations and memleak of mod acts

CT clear action offload adds additional mod hdr actions to the
flow's original mod actions in order to clear the registers which
hold ct_state.
When such flow also includes encap action, a neigh update event
can cause the driver to unoffload the flow and then reoffload it.

Each time this happens, the ct clear handling adds that same set
of mod hdr actions to reset ct_state until the max of mod hdr
actions is reached.

Also the driver never releases the allocated mod hdr actions and
causing a memleak.

Fix above two issues by moving CT clear mod acts allocation
into the parsing actions phase and only use it when offloading the rule.
The release of mod acts will be done in the normal flow_put().

backtrace:
[<000000007316e2f3>] krealloc+0x83/0xd0
[<00000000ef157de1>] mlx5e_mod_hdr_alloc+0x147/0x300 [mlx5_core]
[<00000000970ce4ae>] mlx5e_tc_match_to_reg_set_and_get_id+0xd7/0x240 [mlx5_core]
[<0000000067c5fa17>] mlx5e_tc_match_to_reg_set+0xa/0x20 [mlx5_core]
[<00000000d032eb98>] mlx5_tc_ct_entry_set_registers.isra.0+0x36/0xc0 [mlx5_core]
[<00000000fd23b869>] mlx5_tc_ct_flow_offload+0x272/0x1f10 [mlx5_core]
[<000000004fc24acc>] mlx5e_tc_offload_fdb_rules.part.0+0x150/0x620 [mlx5_core]
[<00000000dc741c17>] mlx5e_tc_encap_flows_add+0x489/0x690 [mlx5_core]
[<00000000e92e49d7>] mlx5e_rep_update_flows+0x6e4/0x9b0 [mlx5_core]
[<00000000f60f5602>] mlx5e_rep_neigh_update+0x39a/0x5d0 [mlx5_core]

Fixes: 1ef3018f5af3 ("net/mlx5e: CT: Support clear action")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 362980ea 21-Oct-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Wait for concurrent flow deletion during neigh/fib events

Function mlx5e_take_tmp_flow() skips flows with zero reference count. This
can cause syndrome 0x179e84 when the called from neigh or route update code
and the skipped flow is not removed from the hardware by the time
underlying encap/decap resource is deleted. Add new completion
'del_hw_done' that is completed when flow is unoffloaded. This is safe to
do because flow with reference count zero needs to be detached from
encap/decap entry before its memory is deallocated, which requires taking
the encap_tbl_lock mutex that is held by the event handlers code.

Fixes: 8914add2c9e5 ("net/mlx5e: Handle FIB events to update tunnel endpoint device")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b16eb3c8 08-Jan-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5: Support internal port as decap route device

When performing route device lookup for decap action, support
the case of ovs internal port as the lookup result.

In such case, an internal port struct is mapped and attached
to the flow attributes so that the source port matching of the
rule will match on the internal port's metadata value.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 166f431e 29-Apr-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: Add indirect tc offload of ovs internal port

Register callbacks for tc blocks of ovs internal port devices.

This allows an indirect offloading rules that apply on
such devices as the filter device.

In case a rule is added to a tc block of an internal port,
the mlx5 driver will implicitly add a matching on the internal
port's unique vport metadata value to the rule's matching list.
Therefore, only packets that previously hit a rule that redirects
to an internal port and got the vport metadata overwritten to the
internal port's unique metadata, can match on such indirect rule.

Offloading of both ingress and egress tc blocks of internal ports
is supported as opposed to other devices where only ingress block
offloading is supported.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 100ad4e2 08-Jan-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: Offload internal port as encap route device

When pefroming encap action, a route lookup is performed
to find the routing device the packet should be forwarded
to after the encapsulation. This is the device that has the
local tunnel ip address.

This change adds support to offload an encap rule where the
route device ends up being an ovs internal port.
In such case, the driver will add a HW rule that will encapsulate
the packet with the tunnel header and will overwrite the vport
metadata in reg_c0 to the internal port metadata value.
Finally, the packet will be forwarded to the root table to be
processed again with the indication that it came from an internal
port.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 27484f71 08-Jan-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: Offload tc rules that redirect to ovs internal port

Allow offloading rules that redirect to ovs internal port
ingress and egress.

To support redirect to ingress device, offloading of REDIRECT_INGRESS
action is added.

When a tc rule redirects to ovs internal port, the hw rule will
overwrite the input vport value in reg_c0 with a new vport metadata
value that is mapped for this internal port using the internal
port mapping api that is introduce in previous patches.
After that the hw rule will redirect the packet to the root table
to continue processing with the new vport metadata value.

The new vport metadata value indicates that this packet is now
arriving through an internal port and therefore should be processed
using rules that apply on the same internal port as the filter device.
Therefore, following rules that apply on this internal port will have
to match on the same vport metadata value as part of their matching
keys to make sure the packet belongs to the internal port.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# dbac71f2 11-Jan-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: Accept action skbedit in the tc actions list

Setting the skb packet type field to host is usually
done when performing forwarding to ingress device.

This is required since the receive handling that is used
by the redirect to ingress action checks whether the packet
doesn't belong to this host and drops the packet in such case.

In order to be able to offload action redirect ingress, tc offload
code needs to accept the skbedit ptype action as well.

There's no special handling in HW for such action since it will
be followed by a redirect action and therefore, this code
only allows us to accept such action in the actions list but
not performing anything specific in HW for it.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 4f4edcc2 29-Apr-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5: E-Switch, Add ovs internal port mapping to metadata support

Adding infrastructure to map ovs internal port device to vport
match metadata to support offload of rules with internal port as
the filter device or as the destination device.

The infrastructure allows adding and removing internal port device
to an eswitch database and getting a unique vport metadata value to
be placed and match on in reg_c0 when offloading rules that are coming
from or going to an internal port.

The new int port metadata can be written to the source port register
in HW to indicate that current source port of the packet is the
internal port and not one of the actual HW vports (uplink or VF).
Using this method, it is possible to offload TC rules with an OVS
internal port as their destination port (overwriting the src vport
register) or as the filter port (matching on the value of the src
vport register and making sure it matches to the internal port's
value).

There is also a need to handle a miss case where the packet's
src port value was changed in HW to an internal port but a following
rule which matches on this new src port value wasn't found in HW.

In such case, the packet will be forwarded to the driver with
metadata which allows driver to restore the info of the internal
port's netdevice. Once this info is restored, the uplink driver
can forward the packet to the relevant netdevice in SW.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 28e7606f 26-Oct-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: Refactor rx handler of represetor device

Move the ownership of skb forwarding to network stack to the
tc update_skb handler as different cases will require different
handling of the skb.

While the tc handler will take care of the various cases and
properly handle the handover of the skb to the network stack
and freeing the skb, the main rx handler will be kept clean
from branches and usage of flags.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 016c8946 22-Oct-2021 Jakub Kicinski <kuba@kernel.org>

mlx5: fix build after merge

Silent merge conflict between these two:

3d677735d3b7 ("net/mlx5: Lag, move lag files into directory")
14fe2471c628 ("net/mlx5: Lag, change multipath and bonding to be mutually exclusive")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0885ae1a 21-Sep-2021 Abhiram R N <abhiramrn@gmail.com>

net/mlx5e: Add extack msgs related to TC for better debug

As multiple places EOPNOTSUPP and EINVAL is returned from driver
it becomes difficult to understand the reason only with error code.
With the netlink extack message exact reason will be known and will
aid in debugging.

Signed-off-by: Abhiram R N <abhiramrn@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6ba2e2b3 07-Sep-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Support accept action

Support TC generic 'accept' action in mlx5 by introducing
MLX5_ESW_ATTR_FLAG_ACCEPT attribute flag. Flag has similar semantics to
existing MLX5_ESW_ATTR_FLAG_SLOW_PATH flag, however, dedicated flag is
required because existing 'slow path' flag can be flipped by tunneling
subsystem when neighbor changes state.

Introduce new helper function mlx5_esw_attr_flags_skip() to check whether
attribute flags for 'slow path' or 'accept' action are set and use it in
eswitch code instead of direct bit manipulation.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3222efd4 01-Sep-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Reserve a value from TC tunnel options mapping

Reserve one more value from TC tunnel options range to be used by bridge
offload in following patches.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d4f401d9 15-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Move parse fdb check into actions_match_supported_fdb()

The parse fdb/nic actions funcs parse the actions and then call
actions_match_supported() for final check.
Move related check in parse_tc_fdb_actions() into
actions_match_supported_fdb() for more organized code.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9c1d3511 14-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Split actions_match_supported() into a sub function

There will probably be more checks, some for nic flows, some for fdb
flows and some are shared checks. Split it for fdb and nic to avoid
the function getting too big.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d9581e2f 12-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Move mod hdr allocation to a single place

Move mod hdr allocation chunk from parse_tc_fdb_actions() and
parse_tc_nic_actions() to a shared function.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fca572f2 04-Aug-2021 Dima Chumak <dchumak@nvidia.com>

net/mlx5e: Enable TC offload for egress MACVLAN

Support offloading of TC rules that mirror/redirect egress traffic to a
MACVLAN device, which is attached to mlx5 representor net device.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c50775d0 29-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Use NL_SET_ERR_MSG_MOD() for errors parsing tunnel attributes

This to be consistent and adds the module name to the error message.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f3e02e47 23-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Use tc sample stubs instead of ifdefs in source file

Instead of having sparse ifdefs in source files use a single
ifdef in the tc sample header file and use stubs.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1cc35b70 17-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant priv arg from parse_pedit_to_reformat()

The priv argument is not being used. remove it.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6b50cf45 12-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Check action fwd/drop flag exists also for nic flows

The driver should add offloaded rules with either a fwd or drop action.
The check existed in parsing fdb flows but not when parsing nic flows.
Move the test into actions_match_supported() which is called for
checking nic flows and fdb flows.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7f8770c7 11-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Set action fwd flag when parsing tc action goto

Do it when parsing like in other actions instead of when
checking if goto is supported in current scenario.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 475fb86a 12-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove incorrect addition of action fwd flag

A user is expected to explicit request a fwd or drop action.
It is not correct to implicit add a fwd action for the user,
when modify header action flag exists.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1836d780 11-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Use correct return type

modify_header_match_supported() should return type bool but
it returns the value returned by is_action_keys_supported()
which is type int.

is_action_keys_supported() always returns either -EOPNOTSUPP
or 0 and it shouldn't change as the purpose of the function
is checking for support. so just make the function return
a bool type.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 14fe2471 07-Oct-2021 Maor Dickman <maord@nvidia.com>

net/mlx5: Lag, change multipath and bonding to be mutually exclusive

Both multipath and bonding events are changing the HW LAG state
independently.
Handling one of the features events while the other is already
enabled can cause unwanted behavior, for example handling
bonding event while multipath enabled will disable the lag and
cause multipath to stop working.

Fix it by ignoring bonding event while in multipath and ignoring FIB
events while in bonding mode.

Fixes: 544fe7c2e654 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f9d196bd 24-Jun-2021 Dmytro Linkin <dlinkin@nvidia.com>

net/mlx5e: Use correct eswitch for stack devices with lag

If link aggregation is used within stack devices driver rejects encap
rules if PF of the VF tunnel device is down. This happens because route
resolved for other PF and its eswitch instance is used to determine
correct vport.
To fix that use devcom feature to retrieve other eswitch instance if
failed to find vport for the 1st eswitch and LAG is active.

Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2d116e3e 28-May-2021 Dmytro Linkin <dlinkin@nvidia.com>

net/mlx5: E-switch, Move QoS related code to dedicated file

Move eswitch QoS related code into dedicated file. Provide eswitch API
to access this code meaning it is isolated and restricted to be used
only by eswitch.c. Exception is legacy NDO vf set rate, which moved to
esw/legacy.c.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>


# 2741f223 21-Jun-2021 Chris Mi <cmi@nvidia.com>

net/mlx5e: TC, Support sample offload action for tunneled traffic

Currently the sample offload actions send the encapsulated packet
to software. This commit decapsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.

If decapsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ee950e5d 30-Apr-2021 Chris Mi <cmi@nvidia.com>

net/mlx5e: TC, Restore tunnel info for sample offload

Currently the sample offload actions send the encapsulated packet
to software. sFlow expects tunneled packets to be decapsulated while
having the tunnel properties on the skb metadata fields.

Reuse the functions used by connection tracking to map the outer
header properties to a unique id. The next patch will use that id
to restore the tunnel information of decapsulated packets onto the
skb.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f0da4daa 16-Aug-2021 Chris Mi <cmi@nvidia.com>

net/mlx5e: Refactor ct to use post action infrastructure

Move post action table management to common library providing
add/del/get API. Refactor the ct action offload to use the common
API.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# bcd6740c 18-Aug-2021 Chris Mi <cmi@nvidia.com>

net/mlx5e: Move sample attribute to flow attribute

Currently it is in eswitch attribute. Move it to flow attribute to
reflect the change in previous patch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0027d70c 18-Aug-2021 Chris Mi <cmi@nvidia.com>

net/mlx5e: Move esw/sample to en/tc/sample

Module sample belongs to en/tc instead of esw. Move it and rename
accordingly.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5024fa95 16-Aug-2021 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Remove mlx5e dependency from E-Switch sample

mlx5/esw/sample.c doesn't really need mlx5e_priv object, we can remove
this redundant dependency by passing the eswitch object directly to
the sample object constructor.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>


# 61b6a6c3 09-Aug-2021 Cai Huoqing <caihuoqing@baidu.com>

net/mlx5e: Make use of netdev_warn()

to replace printk(KERN_WARNING ...) with netdev_warn() kindly

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 39c538d6 29-Jul-2021 Cai Huoqing <caihuoqing@baidu.com>

net/mlx5: Fix typo in comments

Fix typo:
*vectores ==> vectors
*realeased ==> released
*erros ==> errors
*namepsace ==> namespace
*trafic ==> traffic
*proccessed ==> processed
*retore ==> restore
*Currenlty ==> Currently
*crated ==> created
*chane ==> change
*cannnot ==> cannot
*usuallly ==> usually
*failes ==> fails
*importent ==> important
*reenabled ==> re-enabled
*alocation ==> allocation
*recived ==> received
*tanslation ==> translation

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2198b932 03-Aug-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Use shared mappings for restoring from metadata

FTEs are added with mapped metadata which is saved per eswitch.
When uplink reps are bonded and we are in a single FDB mode,
we could fail to find metadata which was stored on one eswitch mapping
but not the other or with a different id.
To resolve this issue use shared mapping between eswitch ports.
We do not have any conflict using a single mapping, for a type,
between the ports.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 25f150f4 26-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Return -EOPNOTSUPP if more relevant when parsing tc actions

Instead of returning -EINVAL.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 97a8d29a 26-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant assignment of counter to null

counter is being initialized before being used.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c6cfe113 26-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant parse_attr arg

Passing parse_attr is redundant in parse_tc_nic_actions() and
mlx5e_tc_add_nic_flow() as we can get it from flow.
This is the same as with parse_tc_fdb_actions() and mlx5e_tc_add_fdb_flow().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 950b4df9 25-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant cap check for flow counter

The cap is very old and today will always exists.
The cap is not being checked anywhere else. Remove the check from
drop action when parsing tc rules in nic mode.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 70f8019e 21-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant filter_dev arg from parse_tc_fdb_actions()

filter_dev is saved in parse_attr. and being used in other cases from
there. use it also for the leftover case.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 696ceeb2 22-Jul-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Remove redundant tc act includes

Since the code changed to use the flow action infra
there is no usage of tcf values from those includes.
Remove those.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f4b45940 18-Jul-2021 Maor Gottlieb <maorg@nvidia.com>

net/mlx5: Embed mlx5_ttc_table

mlx5_ttc_table struct shouldn't be exposed to the users so
this patch make it internal to ttc.

In addition add a getter function to get the TTC flow table for users
that need to add a rule which points on it.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>


# bc29764e 02-Jul-2021 Maor Gottlieb <maorg@nvidia.com>

net/mlx5e: Decouple TTC logic from mlx5e

Remove dependency in the mlx5e driver from the TTC implementation
by changing the TTC related functions to receive mlx5 generic arguments.
It allows to decouple TTC logic from mlx5e and reused by other parts of
mlx5 driver.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d443c6f6 02-Jul-2021 Maor Gottlieb <maorg@nvidia.com>

net/mlx5e: Rename traffic type enums

Rename traffic type enums as part of the preparation for moving
the traffic type logic to a separate file.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 43ec0f41 09-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Hide all implementation details of mlx5e_rx_res

This commit moves all implementation details of struct mlx5e_rx_res
under en/rx_res.c. All access to RX resources is now done using methods.
Encapsulating RX resources into an object allows for better
manageability, because all the implementation details are now in a
single place, and external code can use only a limited set of API
methods to init/teardown the whole thing, reconfigure RSS and LRO
parameters, connect TIRs to flow steering and activate/deactivate TIRs.

mlx5e_rx_res is self-contained and doesn't depend on struct mlx5e_priv
or include en.h.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 43befe99 07-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Use a new initializer to build uniform indir table

Replace mlx5e_build_default_indir_rqt with a new initializer of struct
mlx5e_rss_params_indir that works directly with the struct, rather than
its internals.

The new initializer is called mlx5e_rss_params_indir_init_uniform, which
also reflects the purpose (uniform spreading) better.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 65d6b6e5 06-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Move management of indir traffic types to rx_res

This commit moves the responsibility of keeping the RSS configuration
for different traffic types to en/rx_res.{c,h}, hiding the
implementation details behind the new getters, and abandons all usage of
struct mlx5e_tirc_config, which is no longer useful and superseded by
struct mlx5e_rss_params_traffic_type.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a6696735 06-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Convert TIR to a dedicated object

Code related to TIR is now encapsulated into a dedicated object and put
into new files en/tir.{c,h}. All usages are converted.

The Builder pattern is used to initialize a TIR. It allows to create a
multitude of different configurations, turning on and off some specific
features in different combinations, without having long parameter lists,
initializers per usage and repeating code in initializers.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6fe5ff2c 06-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Create struct mlx5e_rss_params_hash

This commit introduces a new struct to store RSS hash parameters: hash
function and hash key. The existing usages are changed to use the new
struct.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 093d4bc1 06-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Use mlx5e_rqt_get_rqtn to access RQT hardware id

In order to abstract from implementation details of mlx5e_rqt, use the
mlx5e_rqt_get_rqtn getter instead of accessing the field directly.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3f22d6c7 05-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Move RX resources to a separate struct

This commit moves RQTs and TIRs to a separate struct that is allocated
dynamically in profiles that support these RX resources (all profiles,
except IPoIB PKey). It also allows to remove rqt_enabled flags, as RQTs
are always enabled in profiles that support RX resources.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 06e9f13a 02-Apr-2021 Maxim Mikityanskiy <maximmi@nvidia.com>

net/mlx5e: Convert RQT to a dedicated object

Code related to RQT is now encapsulated into a dedicated object and put
into new files en/rqt.{c,h}. All usages are converted.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b1c2f631 26-Apr-2021 Dima Chumak <dchumak@nvidia.com>

net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()

The result of __dev_get_by_index() is not checked for NULL and then gets
dereferenced immediately.

Also, __dev_get_by_index() must be called while holding either RTNL lock
or @dev_base_lock, which isn't satisfied by mlx5e_hairpin_get_mdev() or
its callers. This makes the underlying hlist_for_each_entry() loop not
safe, and can have adverse effects in itself.

Fix by using dev_get_by_index() and handling nullptr return value when
ifindex device is not found. Update mlx5e_hairpin_get_mdev() callers to
check for possible PTR_ERR() result.

Fixes: 77ab67b7f0f9 ("net/mlx5e: Basic setup of hairpin object")
Addresses-Coverity: ("Dereference null return value")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6cdc686a 02-Dec-2020 Ariel Levkovich <lariel@nvidia.com>

net/mlx5: Increase hairpin buffer size

The max packet size a hairpin queue is able to handle
is determined by the total hairpin buffer size divided
by 4.

Currently the buffer size is set to 32KB which makes
the max packet size to be 8KB and doesn't support
jumbo frames of size 9KB.

This change increases the buffer size to 64KB to increase
the max frame size and support 9KB frames.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ed2fe7ba 10-Mar-2021 Paul Blakey <paulb@nvidia.com>

net/mlx5e: TC: Use bit counts for register mapping

To prepare for next patch where we will use a non-byte
aligned mapping, change all byte counts in register
mapping to bits.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a3e5fd93 26-May-2021 Dima Chumak <dchumak@nvidia.com>

net/mlx5e: Fix page reclaim for dead peer hairpin

When adding a hairpin flow, a firmware-side send queue is created for
the peer net device, which claims some host memory pages for its
internal ring buffer. If the peer net device is removed/unbound before
the hairpin flow is deleted, then the send queue is not destroyed which
leads to a stack trace on pci device remove:

[ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
[ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
[ 748.002171] ------------[ cut here ]------------
[ 748.001177] FW pages counter is 4 after reclaiming all pages
[ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core] [ +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
[ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
[ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
[ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
[ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
[ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
[ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
[ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
[ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
[ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
[ 748.001429] FS: 00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
[ 748.001695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
[ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 748.001654] Call Trace:
[ 748.000576] ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
[ 748.001416] ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
[ 748.001354] ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
[ 748.001203] mlx5_function_teardown+0x30/0x60 [mlx5_core]
[ 748.001275] mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
[ 748.001200] remove_one+0x5f/0xc0 [mlx5_core]
[ 748.001075] pci_device_remove+0x9f/0x1d0
[ 748.000833] device_release_driver_internal+0x1e0/0x490
[ 748.001207] unbind_store+0x19f/0x200
[ 748.000942] ? sysfs_file_ops+0x170/0x170
[ 748.001000] kernfs_fop_write_iter+0x2bc/0x450
[ 748.000970] new_sync_write+0x373/0x610
[ 748.001124] ? new_sync_read+0x600/0x600
[ 748.001057] ? lock_acquire+0x4d6/0x700
[ 748.000908] ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 748.001126] ? fd_install+0x1c9/0x4d0
[ 748.000951] vfs_write+0x4d0/0x800
[ 748.000804] ksys_write+0xf9/0x1d0
[ 748.000868] ? __x64_sys_read+0xb0/0xb0
[ 748.000811] ? filp_open+0x50/0x50
[ 748.000919] ? syscall_enter_from_user_mode+0x1d/0x50
[ 748.001223] do_syscall_64+0x3f/0x80
[ 748.000892] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 748.001026] RIP: 0033:0x7f58bcfb22f7
[ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
[ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
[ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
[ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
[ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
[ 748.001564] irq event stamp: 0
[ 748.000787] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
[ 748.001854] softirqs last enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
[ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 748.001492] ---[ end trace a6fabd773d1c51ae ]---

Fix by destroying the send queue of a hairpin peer net device that is
being removed/unbound, which returns the allocated ring buffer pages to
the host.

Fixes: 4d8fcf216c90 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# afe93f71 13-Apr-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Check for needed capability for cvlan matching

If not supported show an error and return instead of trying to offload
to the hardware and fail.

Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9453d45e 25-May-2021 Vlad Buslov <vladbu@nvidia.com>

net: zero-initialize tc skb extension on allocation

Function skb_ext_add() doesn't initialize created skb extension with any
value and leaves it up to the user. However, since extension of type
TC_SKB_EXT originally contained only single value tc_skb_ext->chain its
users used to just assign the chain value without setting whole extension
memory to zero first. This assumption changed when TC_SKB_EXT extension was
extended with additional fields but not all users were updated to
initialize the new fields which leads to use of uninitialized memory
afterwards. UBSAN log:

[ 778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28
[ 778.301495] load of value 107 is not a valid value for type '_Bool'
[ 778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2
[ 778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 778.307901] Call Trace:
[ 778.308680] <IRQ>
[ 778.309358] dump_stack+0xbb/0x107
[ 778.310307] ubsan_epilogue+0x5/0x40
[ 778.311167] __ubsan_handle_load_invalid_value.cold+0x43/0x48
[ 778.312454] ? memset+0x20/0x40
[ 778.313230] ovs_flow_key_extract.cold+0xf/0x14 [openvswitch]
[ 778.314532] ovs_vport_receive+0x19e/0x2e0 [openvswitch]
[ 778.315749] ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch]
[ 778.317188] ? create_prof_cpu_mask+0x20/0x20
[ 778.318220] ? arch_stack_walk+0x82/0xf0
[ 778.319153] ? secondary_startup_64_no_verify+0xb0/0xbb
[ 778.320399] ? stack_trace_save+0x91/0xc0
[ 778.321362] ? stack_trace_consume_entry+0x160/0x160
[ 778.322517] ? lock_release+0x52e/0x760
[ 778.323444] netdev_frame_hook+0x323/0x610 [openvswitch]
[ 778.324668] ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch]
[ 778.325950] __netif_receive_skb_core+0x771/0x2db0
[ 778.327067] ? lock_downgrade+0x6e0/0x6f0
[ 778.328021] ? lock_acquire+0x565/0x720
[ 778.328940] ? generic_xdp_tx+0x4f0/0x4f0
[ 778.329902] ? inet_gro_receive+0x2a7/0x10a0
[ 778.330914] ? lock_downgrade+0x6f0/0x6f0
[ 778.331867] ? udp4_gro_receive+0x4c4/0x13e0
[ 778.332876] ? lock_release+0x52e/0x760
[ 778.333808] ? dev_gro_receive+0xcc8/0x2380
[ 778.334810] ? lock_downgrade+0x6f0/0x6f0
[ 778.335769] __netif_receive_skb_list_core+0x295/0x820
[ 778.336955] ? process_backlog+0x780/0x780
[ 778.337941] ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core]
[ 778.339613] ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0
[ 778.341033] ? kvm_clock_get_cycles+0x14/0x20
[ 778.342072] netif_receive_skb_list_internal+0x5f5/0xcb0
[ 778.343288] ? __kasan_kmalloc+0x7a/0x90
[ 778.344234] ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core]
[ 778.345676] ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core]
[ 778.347140] ? __netif_receive_skb_list_core+0x820/0x820
[ 778.348351] ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core]
[ 778.349688] ? napi_gro_flush+0x26c/0x3c0
[ 778.350641] napi_complete_done+0x188/0x6b0
[ 778.351627] mlx5e_napi_poll+0x373/0x1b80 [mlx5_core]
[ 778.352853] __napi_poll+0x9f/0x510
[ 778.353704] ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core]
[ 778.355158] net_rx_action+0x34c/0xa40
[ 778.356060] ? napi_threaded_poll+0x3d0/0x3d0
[ 778.357083] ? sched_clock_cpu+0x18/0x190
[ 778.358041] ? __common_interrupt+0x8e/0x1a0
[ 778.359045] __do_softirq+0x1ce/0x984
[ 778.359938] __irq_exit_rcu+0x137/0x1d0
[ 778.360865] irq_exit_rcu+0xa/0x20
[ 778.361708] common_interrupt+0x80/0xa0
[ 778.362640] </IRQ>
[ 778.363212] asm_common_interrupt+0x1e/0x40
[ 778.364204] RIP: 0010:native_safe_halt+0xe/0x10
[ 778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00
[ 778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246
[ 778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468
[ 778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e
[ 778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb
[ 778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000
[ 778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000
[ 778.378491] ? rcu_eqs_enter.constprop.0+0xb8/0xe0
[ 778.379606] ? default_idle_call+0x5e/0xe0
[ 778.380578] default_idle+0xa/0x10
[ 778.381406] default_idle_call+0x96/0xe0
[ 778.382350] do_idle+0x3d4/0x550
[ 778.383153] ? arch_cpu_idle_exit+0x40/0x40
[ 778.384143] cpu_startup_entry+0x19/0x20
[ 778.385078] start_kernel+0x3c7/0x3e5
[ 778.385978] secondary_startup_64_no_verify+0xb0/0xbb

Fix the issue by providing new function tc_skb_ext_alloc() that allocates
tc skb extension and initializes its memory to 0 before returning it to the
caller. Change all existing users to use new API instead of calling
skb_ext_add() directly.

Fixes: 038ebb1a713d ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d1a3d08 20-Apr-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Reject mirroring on source port change encap rules

Rules with MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE dest flag are
translated to destination FT in eswitch. Currently it is not possible to
mirror such rules because firmware doesn't support mixing FT and Vport
destinations in single rule when one of them adds encapsulation. Since the
only use case for MLX5_ESW_DEST_CHAIN_WITH_SRC_PORT_CHANGE destination is
support for tunnel endpoints on VF and trying to offload such rule with
mirror action causes either crash in fs_core or firmware error with
syndrome 0xff6a1d, reject all such rules in mlx5 TC layer.

Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fe7738eb 26-Apr-2021 Dima Chumak <dchumak@nvidia.com>

net/mlx5e: Fix nullptr in mlx5e_tc_add_fdb_flow()

The result of __dev_get_by_index() is not checked for NULL, which then
passed to mlx5e_attach_encap() and gets dereferenced.

Also, in case of a successful lookup, the net_device reference count is
not incremented, which may result in net_device pointer becoming invalid
at any time during mlx5e_attach_encap() execution.

Fix by using dev_get_by_index(), which does proper reference counting on
the net_device pointer. Also, handle nullptr return value when mirred
device is not found.

It's safe to call dev_put() on the mirred net_device pointer, right
after mlx5e_attach_encap() call, because it's not being saved/copied
down the call chain.

Fixes: 3c37745ec614 ("net/mlx5e: Properly deal with encap flows add/del under neigh update")
Addresses-Coverity: ("Dereference null return value")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# dca59f4a 26-Apr-2021 Dima Chumak <dchumak@nvidia.com>

net/mlx5e: Fix nullptr in add_vlan_push_action()

The result of dev_get_by_index_rcu() is not checked for NULL and then
gets dereferenced immediately.

Also, the RCU lock must be held by the caller of dev_get_by_index_rcu(),
which isn't satisfied by the call stack.

Fix by handling nullptr return value when iflink device is not found.
Add RCU locking around dev_get_by_index_rcu() to avoid possible adverse
effects while iterating over the net_device's hlist.

It is safe not to increment reference count of the net_device pointer in
case of a successful lookup, because it's already handled by VLAN code
during VLAN device registration (see register_vlan_dev and
netdev_upper_dev_link).

Fixes: 278748a95aa3 ("net/mlx5e: Offload TC e-switch rules with egress VLAN device")
Addresses-Coverity: ("Dereference null return value")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f94d6389 21-Sep-2020 Chris Mi <cmi@nvidia.com>

net/mlx5e: TC, Add support to offload sample action

The following diagram illustrates the hardware model for tc sample action:

+---------------------+
+ original flow table +
+---------------------+
+ original match +
+---------------------+
|
v
+------------------------------------------------+
+ Flow Sampler Object +
+------------------------------------------------+
+ sample ratio +
+------------------------------------------------+
+ sample table id | default table id +
+------------------------------------------------+
| |
v v
+-----------------------------+ +----------------------------------------+
+ sample table + + default table per <vport, chain, prio> +
+-----------------------------+ +----------------------------------------+
+ forward to management vport + + original match +
+-----------------------------+ +----------------------------------------+
+ other actions +
+----------------------------------------+

The sample action is translated to a goto flow table object
destination which samples packets according to the provided
sample ratio. Sampled packets are duplicated. One copy is
processed by a termination table, named the sample table,
which sends the packet to the eswitch manager port (that will
be processed by software).

The second copy is processed by the default table which executes
the subsequent actions. The default table is created per <vport,
chain, prio> tuple as rules with different prios and chains may
overlap.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2a9ab10a 18-Sep-2020 Chris Mi <cmi@nvidia.com>

net/mlx5e: TC, Add sampler termination table API

Sampled packets are sent to software using termination tables. There
is only one rule in that table that is to forward sampled packets to
the e-switch management vport.

Create a sampler termination table and rule for each eswitch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 41c2fd94 30-Aug-2020 Chris Mi <cmi@nvidia.com>

net/mlx5e: TC, Parse sample action

Parse TC sample action and save sample parameters in flow attribute
data structure.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c9355682 30-Aug-2020 Chris Mi <cmi@nvidia.com>

net/mlx5: Instantiate separate mapping objects for FDB and NIC tables

Currently, the u32 chain id is mapped to u16 value which is stored on
the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The
mapping is internally maintained by the chains object. However, with
the introduction of reg_c0 objects the fdb may store more than just
the chain id on reg_c0. This is not relevant for NIC tables.

Separate the chains mapping instantiation for FDB and NIC tables.
Remove the mapping from the chains object. For FDB tables, create
the mapping per eswitch. For NIC tables, create the mapping per tc
table. Pass the corresponding mapping pointer when creating the
chains object.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a91d98a0 10-Sep-2020 Chris Mi <cmi@nvidia.com>

net/mlx5: Map register values to restore objects

Currently reg_c0 lower 16 bits and reg_b are used to store the chain
id that missed in FDB and NIC tables accordingly. However, the
registers' values may index a restore object, rather than a single u32
value. Different object types can be used to restore mutually exclusive
contexts such as chain id and sample group id.

Use the mapping object to associate an index with a restore object
as a prestep for supporting additional restore types.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6783f0a2 26-Feb-2021 Vu Pham <vuhuong@nvidia.com>

net/mlx5e: Dynamic alloc vlan table for netdev when needed

Dynamic allocate vlan table in mlx5e_priv for EN netdev
when needed. Don't allocate it for representor netdev.

Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# bb569657 11-Mar-2021 Ariel Levkovich <lariel@nvidia.com>

net/mlx5e: Reject tc rules which redirect from a VF to itself

Since there are self loopback prevention mechanisms at the
VF level, offloading such rules which redirect from a VF
to itself in the eswitch will break the datapath since the
packets will be dropped once they go back to the vport they
came from.

Therefore, offloading such rules will be rejected and left to
be handled by SW.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6def6e47 15-Mar-2021 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: alloc the correct size for indirection_rqt

The cited patch allocated the wrong size for the indirection_rqt table,
fix that.

Fixes: 2119bda642c4 ("net/mlx5e: allocate 'indirection_rqt' buffer dynamically")
CC: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7dc84de9 16-Sep-2020 Roi Dayan <roid@nvidia.com>

net/mlx5: E-Switch, Protect changing mode while adding rules

We re-use the native NIC port net device instance for the Uplink
representor, a driver currently cannot unbind TC setup callback
actively, hence protect changing E-Switch mode while adding rules.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2ff349c5 16-Sep-2020 Roi Dayan <roid@nvidia.com>

net/mlx5e: Verify dev is present in some ndos

We will re-use the native NIC port net device instance for the Uplink
representor. While changing profiles private resources are not
available but some ndos are not checking if the netdev is present.
So for those ndos check the netdev is present in the driver before
accessing the private resources.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ec9457a6 16-Sep-2020 Roi Dayan <roid@nvidia.com>

net/mlx5e: Distinguish nic and esw offload in tc setup block cb

We will re-use the native NIC port net device instance for the Uplink
representor, hence same ndos will be used.
Now we need to distinguish in the TC callback if the mode is legacy or
switchdev and set the proper flag.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6a56e199 12-Mar-2021 Baowen Zheng <baowen.zheng@corigine.com>

flow_offload: reject configuration of packet-per-second policing in offload drivers

A follow-up patch will allow users to configures packet-per-second policing
in the software datapath. In preparation for this, teach all drivers that
support offload of the policer action to reject such configuration as
currently none of them support it.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3222a2d 24-Jan-2021 Maor Dickman <maord@nvidia.com>

net/mlx5e: Allow to match on ICMP parameters

Support matching on ICMPv4/6 type and code parameters using misc3
section of match parameters.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 69e2916e 21-Sep-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5: CT: Add support for mirroring

Add support for mirroring before the CT action by spliting the pre ct rule.
Mirror outputs are done first on the tc chain,prio table rule (the fwd
rule), which will then forward to a per port fwd table.
On this fwd table, we insert the original pre ct rule that forwards to
ct/ct nat table.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2119bda6 08-Mar-2021 Arnd Bergmann <arnd@arndb.de>

net/mlx5e: allocate 'indirection_rqt' buffer dynamically

Increasing the size of the indirection_rqt array from 128 to 256 bytes
pushed the stack usage of the mlx5e_hairpin_fill_rqt_rqns() function
over the warning limit when building with clang and CONFIG_KASAN:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:970:1: error: stack frame size of 1180 bytes in function 'mlx5e_tc_add_nic_flow' [-Werror,-Wframe-larger-than=]

Using dynamic allocation here is safe because the caller does the
same, and it reduces the stack usage of the function to just a few
bytes.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 51ada5a5 16-Feb-2021 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: mlx5_tc_ct_init does not fail

mlx5_tc_ct_init() either returns a valid pointer or a NULL, either way
the caller can continue, remove IS_ERR check from callers as it has no
effect.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e3e0f9b2 08-Apr-2021 wenxu <wenxu@ucloud.cn>

net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta

In the nft_offload there is the mate flow_dissector with no
ingress_ifindex but with ingress_iftype that only be used
in the software. So if the mask of ingress_ifindex in meta is
0, this meta check should be bypass.

Fixes: 6d65bc64e232 ("net/mlx5e: Add mlx5e_flower_parse_meta support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 96b5b458 04-Mar-2021 Dima Chumak <dchumak@nvidia.com>

net/mlx5e: Offload tuple rewrite for non-CT flows

Setting connection tracking OVS flows and then setting non-CT flows that
use tuple rewrite action (e.g. mod_tp_dst), causes the latter flows not
being offloaded.

Fix by using a stricter condition in modify_header_match_supported() to
check tuple rewrite support only for flows with CT action. The check is
factored out into standalone modify_tuple_supported() function to aid
readability.

Fixes: 7e36feeb0467 ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7d6c86e3 10-Mar-2021 Alaa Hleihel <alaa@nvidia.com>

net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP

Currently, we support hardware offload only for MPLS over UDP.
However, rules matching on MPLS parameters are now wrongly offloaded
for regular MPLS, without actually taking the parameters into
consideration when doing the offload.
Fix it by rejecting such unsupported rules.

Fixes: 72046a91d134 ("net/mlx5e: Allow to match on mpls parameters")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8b90d897 16-Feb-2021 Parav Pandit <parav@nvidia.com>

net/mlx5e: E-switch, Fix rate calculation division

do_div() returns reminder, while cited patch wanted to use
quotient.
Fix it by using quotient.

Fixes: 0e22bfb7c046 ("net/mlx5e: E-switch, Fix rate calculation for overflow")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0e22bfb7 12-Jan-2021 Parav Pandit <parav@nvidia.com>

net/mlx5e: E-switch, Fix rate calculation for overflow

rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.

Fix it by performing 64-bit calculation.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2b6c3c1e 10-Feb-2021 Wei Yongjun <weiyongjun1@huawei.com>

net/mlx5e: Fix error return code in mlx5e_tc_esw_init()

Fix to return negative error code from the mlx5e_tc_tun_init() error
handling case instead of 0, as done elsewhere in this function.

This commit also using 0 instead of 'ret' when success since it is
always equal to 0.

Fixes: 8914add2c9e5 ("net/mlx5e: Handle FIB events to update tunnel endpoint device")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8914add2 25-Jan-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Handle FIB events to update tunnel endpoint device

Process FIB route update events to dynamically update the stack device
rules when tunnel routing changes. Use rtnl lock to prevent FIB event
handler from running concurrently with neigh update and neigh stats
workqueue tasks. Use encap_tbl_lock mutex to synchronize with TC rule
update path that doesn't use rtnl lock.

FIB event workflow for encap flows:

- Unoffload all flows attached to route encaps from slow or fast path
depending on encap destination endpoint neigh state.

- Update encap IP header according to new route dev.

- Update flows mod_hdr action that is responsible for overwriting reg_c0
source port bits to source port of new underlying VF of new route dev. This
step requires changing flow create/delete code to save flow parse attribute
mod_hdr_acts structure for whole flow lifetime instead of deallocating it
after flow creation. Refactor mod_hdr code to allow saving id of individual
mod_hdr actions and updating them with dedicated helper.

- Offload all flows to either slow or fast path depending on encap
destination endpoint neigh state.

FIB event workflow for decap flows:

- Unoffload all route flows from hardware. When last route flow is deleted
all indirect table rules for the route dev will also be deleted.

- Update flow attr decap_vport and destination MAC according to underlying
VF of new rote dev.

- Offload all route flows back to hardware creating new indirect table
rules according to updated flow attribute data.

Extract some neigh update code to helper functions to be used by both neigh
update and route update infrastructure.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 021905f8 25-Jan-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Rename some encap-specific API to generic names

Some of the encap-specific functions and fields will also be used by route
update infrastructure in following patches. Rename them to generic names.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c7b9038d 25-Jan-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: TC preparation refactoring for routing update event

Following patch in series implement routing update event which requires
ability to modify rule match_to_reg modify header actions dynamically
during rule lifetime. In order to accommodate such behavior, refactor and
extend TC infrastructure in following ways:

- Modify mod_hdr infrastructure to preserve its parse attribute for whole
rule lifetime, instead of deallocating it after rule creation.

- Extend match_to_reg infrastructure with new function
mlx5e_tc_match_to_reg_set_and_get_id() that returns mod_hdr action id that
can be used afterwards to update the action, and
mlx5e_tc_match_to_reg_mod_hdr_change() that can modify existing actions by
its id.

- Extend tun API with new functions mlx5e_tc_tun_update_header_ipv{4|6}()
that are used to updated existing encap entry tunnel header.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 777bb800 21-Sep-2020 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Create route entry infrastructure

Implement dedicated route entry infrastructure to be used in following
patch by route update event. Both encap (indirectly through their
corresponding encap entries) and decap (directly) flows are attached to
routing entry. Since route update also requires updating encap (route
device MAC address is a source MAC address of tunnel encapsulation), same
encap_tbl_lock mutex is used for synchronization.

The new infrastructure looks similar to existing infrastructures for shared
encap, mod_hdr and hairpin entries:

- Per-eswitch hash table is used for quick entry lookup.

- Flows are attached to per-entry linked list and hold reference to entry
during their lifetime.

- Atomic reference counting and rcu mechanisms are used as synchronization
primitives for concurrent access.

The infrastructure also enables connection tracking on stacked devices
topology by attaching CT chain 0 flow on tunneling dev to decap route
entry.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0d9f9647 24-Jan-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Extract tc tunnel encap/decap code to dedicated file

Following patches in series extend the extracted code with routing
infrastructure. To improve code modularity created a dedicated
tc_tun_encap.c source file and move encap/decap related code to the new
file. Export code that is used by both regular TC code and encap/decap code
into tc_priv.h (new header intended to be used only by TC module). Rename
some exported functions by adding "mlx5e_" prefix to their names.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8e404fef 31-Aug-2020 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Match recirculated packet miss in slow table using reg_c1

Previous patch in series that implements stack devices RX path implements
indirect table rules that match on tunnel VNI. After such rule is created
all tunnel traffic is recirculated to root table. However, recirculated
packet might not match on any rules installed in the table (for example,
when IP traffic follows ARP traffic). In that case packets appear on
representor of tunnel endpoint VF instead being redirected to the VF
itself.

Extend slow table with additional flow group that matches on reg_c0 (source
port value set by indirect tables implemented by previous patch in series)
and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install
one rule per VF vport to match on recirculated miss packets and redirect
them to appropriate VF vport. Modify indirect tables code to also rewrite
reg_c1 with special 0xFFF mark.

Implementation reuses reg_c1 tunnel id bits. This is safe to do because
recirculated packets are always matched before decapsulation.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 48d216e5 31-Aug-2020 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Refactor reg_c1 usage

Following patch in series uses reg_c1 in eswitch code. To use reg_c1
helpers in both TC and eswitch code, refactor existing helpers according to
similar use case of reg_c0 and move the functionality into eswitch.h.
Calculate reg mappings length from new defines to ensure that they are
always in sync and only need to be changed in single place.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a508728a 25-Jan-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: VF tunnel RX traffic offloading

When tunnel endpoint is on VF the encapsulated RX traffic is exposed on the
representor of the VF without any further processing of rules installed on
the VF. Detect such case by checking if the device returned by route lookup
in decap rule handling code is a mlx5 VF and handle it with new redirection
tables API.

Example TC rules for VF tunnel traffic:

1. Rule that encapsulates the tunneled flow and redirects packets from
source VF rep to tunnel device:

$ tc -s filter show dev enp8s0f0_1 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
dst_mac 0a:40:bd:30:89:99
src_mac ca:2e:a7:3f:f5:0f
eth_type ipv4
ip_tos 0/0x3
ip_flags nofrag
in_hw in_hw_count 1
action order 1: tunnel_key set
src_ip 7.7.7.5
dst_ip 7.7.7.1
key_id 98
dst_port 4789
nocsum
ttl 64 pipe
index 1 ref 1 bind 1 installed 411 sec used 411 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
no_percpu
used_hw_stats delayed

action order 2: mirred (Egress Redirect to device vxlan_sys_4789) stolen
index 1 ref 1 bind 1 installed 411 sec used 0 sec
Action statistics:
Sent 5615833 bytes 4028 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 5615833 bytes 4028 pkt
backlog 0b 0p requeues 0
cookie bb406d45d343bf7ade9690ae80c7cba4
no_percpu
used_hw_stats delayed

2. Rule that redirects from tunnel device to UL rep:

$ tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
dst_mac ca:2e:a7:3f:f5:0f
src_mac 0a:40:bd:30:89:99
eth_type ipv4
enc_dst_ip 7.7.7.5
enc_src_ip 7.7.7.1
enc_key_id 98
enc_dst_port 4789
enc_tos 0
ip_flags nofrag
in_hw in_hw_count 1
action order 1: tunnel_key unset pipe
index 2 ref 1 bind 1 installed 434 sec used 434 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats delayed

action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
index 4 ref 1 bind 1 installed 434 sec used 0 sec
Action statistics:
Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 129936 bytes 1082 pkt
backlog 0b 0p requeues 0
cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
no_percpu
used_hw_stats delayed

Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 10742efc 21-Jan-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: VF tunnel TX traffic offloading

When tunnel endpoint is on VF, driver still assumes that endpoint is on
uplink and incorrectly configures encap rule offload according to that
assumption. As a result, traffic is sent directly to the uplink and rules
installed on representor of tunnel endpoint VF are ignored.

Implement following changes to allow offloading tx traffic with tunnel
endpoint on VF:

- For tunneling flows perform route lookup on route and out devices pair.
If out device is uplink and route device is VF of same physical port, then
modify packet reg_c_0 metadata register (source port) with the value of VF
vport. Use eswitch vhca_id->vport mapping introduced in one of previous
patches in the series to obtain vport from route netdevice.

- Recirculate encapsulated packets to VF vport in order to apply any flow
rules installed on VF representor that match on encapsulated traffic.

Only enable support for this functionality when all following conditions
are true:

- Hardware advertises capability to preserve reg_c_0 value on packet
recirculation.

- Vport metadata matching is enabled.

- Termination tables are to be used by the flow.

Example TC rules for VF tunnel traffic:

1. Rule that redirects packets from UL to VF rep that has the tunnel
endpoint IP address:

$ tc -s filter show dev enp8s0f0 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
dst_mac 16:c9:a0:2d:69:2c
src_mac 0c:42:a1:58:ab:e4
eth_type ipv4
ip_flags nofrag
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen
index 3 ref 1 bind 1 installed 377 sec used 0 sec
Action statistics:
Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 114096 bytes 952 pkt
backlog 0b 0p requeues 0
cookie 878fa48d8c423fc08c3b6ca599b50a97
no_percpu
used_hw_stats delayed

2. Rule that decapsulates the tunneled flow and redirects to destination VF
representor:

$ tc -s filter show dev vxlan_sys_4789 ingress
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
dst_mac ca:2e:a7:3f:f5:0f
src_mac 0a:40:bd:30:89:99
eth_type ipv4
enc_dst_ip 7.7.7.5
enc_src_ip 7.7.7.1
enc_key_id 98
enc_dst_port 4789
enc_tos 0
ip_flags nofrag
in_hw in_hw_count 1
action order 1: tunnel_key unset pipe
index 2 ref 1 bind 1 installed 434 sec used 434 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats delayed

action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen
index 4 ref 1 bind 1 installed 434 sec used 0 sec
Action statistics:
Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 0 bytes 0 pkt
Sent hardware 129936 bytes 1082 pkt
backlog 0b 0p requeues 0
cookie ac17cf398c4c69e4a5b2f7aabd1b88ff
no_percpu
used_hw_stats delayed

Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9ba33339 25-Jan-2021 Roi Dayan <roid@nvidia.com>

net/mlx5e: Avoid false lock depenency warning on tc_ht

To avoid false lock dependency warning set the tc_ht lock
class different than the lock class of the ht being used when deleting
last flow from a group and then deleting a group, we get into del_sw_flow_group()
which call rhashtable_destroy on fg->ftes_hash which will take ht->mutex but
it's different than the ht->mutex here.

======================================================
WARNING: possible circular locking dependency detected
5.11.0-rc4_net_next_mlx5_949fdcc #1 Not tainted
------------------------------------------------------
modprobe/12950 is trying to acquire lock:
ffff88816510f910 (&node->lock){++++}-{3:3}, at: mlx5_del_flow_rules+0x2a/0x210 [mlx5_core]

but task is already holding lock:
ffff88815834e3e8 (&ht->mutex){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x37/0x340

which lock already depends on the new lock.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9a99c8f1 12-Jan-2021 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported

Miss path handling of tc multi chain filters (i.e. filters that are
defined on chain > 0) requires the hardware to communicate to the
driver the last chain that was processed. This is possible only when
the hardware is capable of performing the combination of modify header
and forward to table actions. Currently, if the hardware is missing
this capability then the driver only offloads rules that are defined
on tc chain 0 prio 1. However, this restriction can be relaxed because
packets that miss from chain 0 are processed through all the
priorities by tc software.

Allow the offload of all the supported priorities for chain 0 even
when the hardware is not capable to perform modify header and goto
table actions.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 89e39467 21-Jan-2021 Paul Blakey <paulb@nvidia.com>

net/mlx5e: Fix CT rule + encap slow path offload and deletion

Currently, if a neighbour isn't valid when offloading tunnel encap rules,
we offload the original match and replace the original action with
"goto slow path" action. For this we use a temporary flow attribute based
on the original flow attribute and then change the action. Flow flags,
which among those is the CT flag, are still shared for the slow path rule
offload, so we end up parsing this flow as a CT + goto slow path rule.

Besides being unnecessary, CT action offload saves extra information in
the passed flow attribute, such as created ct_flow and mod_hdr, which
is lost onces the temporary flow attribute is freed.

When a neigh is updated and is valid, we offload the original CT rule
with original CT action, which again creates a ct_flow and mod_hdr
and saves it in the flow's original attribute. Then we delete the slow
path rule with a temporary flow attribute based on original updated
flow attribute, and we free the relevant ct_flow and mod_hdr.

Then when tc deletes this flow, we try to free the ct_flow and mod_hdr
on the flow's attribute again.

To fix the issue, skip all furture proccesing (CT/Sample/Split rules)
in offload/unoffload of slow path rules.

Call trace:
[ 758.850525] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000218
[ 758.952987] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 758.964170] Modules linked in: act_csum(E) act_pedit(E) act_tunnel_key(E) act_ct(E) nf_flow_table(E) xt_nat(E) ip6table_filter(E) ip6table_nat(E) xt_comment(E) ip6_tables(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) bpfilter(E) br_netfilter(E) bridge(E) stp(E) llc(E) xfrm_user(E) overlay(E) act_mirred(E) act_skbedit(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) esp6_offload(E) esp6(E) esp4_offload(E) esp4(E) xfrm_algo(E) mlx5_ib(OE) ib_uverbs(OE) geneve(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink_cttimeout(E) nfnetlink(E) mlx5_core(OE) act_gact(E) cls_flower(E) sch_ingress(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat(E) mlxfw(OE) psample(E) nf_conntrack(E) nf_defrag_ipv4(E) vfio_mdev(E) mdev(E) ib_core(OE) mlx_compat(OE) crct10dif_ce(E) uio_pdrv_genirq(E) uio(E) i2c_mlx(E) mlxbf_pmc(E) sbsa_gwdt(E) mlxbf_gige(E) gpio_mlxbf2(E) mlxbf_pka(E) mlx_trio(E) mlx_bootctl(E) bluefield_edac(E) knem(O)
[ 758.964225] ip_tables(E) mlxbf_tmfifo(E) ipv6(E) crc_ccitt(E) nf_defrag_ipv6(E)
[ 759.154186] CPU: 5 PID: 122 Comm: kworker/u16:1 Tainted: G OE 5.4.60-mlnx.52.gde81e85 #1
[ 759.172870] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.0-2-gc1b5d64 Jan 4 2021
[ 759.195466] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
[ 759.207344] pstate: a0000005 (NzCv daif -PAN -UAO)
[ 759.217003] pc : mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[ 759.228229] lr : mlx5_del_flow_rules+0x34/0x160 [mlx5_core]
[ 759.405858] Call trace:
[ 759.410804] mlx5_del_flow_rules+0x5c/0x160 [mlx5_core]
[ 759.421337] __mlx5_eswitch_del_rule.isra.43+0x5c/0x1c8 [mlx5_core]
[ 759.433963] mlx5_eswitch_del_offloaded_rule_ct+0x34/0x40 [mlx5_core]
[ 759.446942] mlx5_tc_rule_delete_ct+0x68/0x74 [mlx5_core]
[ 759.457821] mlx5_tc_ct_delete_flow+0x160/0x21c [mlx5_core]
[ 759.469051] mlx5e_tc_unoffload_fdb_rules+0x158/0x168 [mlx5_core]
[ 759.481325] mlx5e_tc_encap_flows_del+0x140/0x26c [mlx5_core]
[ 759.492901] mlx5e_rep_update_flows+0x11c/0x1ec [mlx5_core]
[ 759.504127] mlx5e_rep_neigh_update+0x160/0x200 [mlx5_core]
[ 759.515314] process_one_work+0x178/0x400
[ 759.523350] worker_thread+0x58/0x3e8
[ 759.530685] kthread+0x100/0x12c
[ 759.537152] ret_from_fork+0x10/0x18
[ 759.544320] Code: 97ffef55 51000673 3100067f 54ffff41 (b9421ab3)
[ 759.556548] ---[ end trace fab818bb1085832d ]---

Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 48470a90 19-Jan-2021 Maor Dickman <maord@nvidia.com>

net/mlx5e: Reduce tc unsupported key print level

"Unsupported key used:" appears in kernel log when flows with
unsupported key are used, arp fields for example.

OpenVSwitch was changed to match on arp fields by default that
caused this warning to appear in kernel log for every arp rule, which
can be a lot.

Fix by lowering print level from warning to debug.

Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1fe3e316 12-Jan-2021 Parav Pandit <parav@nvidia.com>

net/mlx5e: E-switch, Fix rate calculation for overflow

rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.

Fix it by performing 64-bit calculation.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e8711402 10-Oct-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Simplify eswitch mode check

Provide mlx5_core device instead of "priv" pointer while checking
eswith mode.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 68ec32da 14-Nov-2020 Wang Hai <wanghai38@huawei.com>

net/mlx5: fix error return code in mlx5e_tc_nic_init()

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e68e28b4 30-Sep-2020 Maor Dickman <maord@nvidia.com>

net/mlx5e: Fix modify header actions memory leak

Modify header actions are allocated during parse tc actions and only
freed during the flow creation, however, on error flow the allocated
memory is wrongly unfreed.

Fix this by calling dealloc_mod_hdr_actions in __mlx5e_add_fdb_flow
and mlx5e_add_nic_flow error flow.

Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Fixes: 2f4fe4cab073 ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 825f8b0b 06-Oct-2020 Colin Ian King <colin.king@canonical.com>

net/mlx5: Fix uininitialized pointer read on pointer attr

Currently the error exit path err_free kfree's attr. In the case where
flow and parse_attr failed to be allocated this return path will free
the uninitialized pointer attr, which is not correct. In the other
case where attr fails to allocate attr does not need to be freed. So
in both error exits via err_free attr should not be freed, so remove
it.

Addresses-Coverity: ("Uninitialized pointer read")
Fixes: ff7ea04ad579 ("net/mlx5e: Fix potential null pointer dereference")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ff7ea04a 25-Sep-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

net/mlx5e: Fix potential null pointer dereference

Calls to kzalloc() and kvzalloc() should be null-checked
in order to avoid any potential failures. In this case,
a potential null pointer dereference.

Fix this by adding null checks for _parse_attr_ and _flow_
right after allocation.

Addresses-Coverity-ID: 1497154 ("Dereference before null check")
Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5efbe617 28-Sep-2020 Ariel Levkovich <lariel@nvidia.com>

net/mlx5: Fix dereference on pointer attr after null check

When removing a flow from the slow path fdb, a flow attr struct is
allocated for the rule removal process. If the allocation fails the
code prints a warning message but continues with the removal flow
which include dereferencing a pointer which could be null.
Fix this by exiting the function in case the attr allocation failed.

Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 89fbdbae 22-Sep-2020 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: TC: Remove unused parameter from mlx5_tc_ct_add_no_trk_match()

priv is never used in this function

Fixes: 7e36feeb0467 ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# aedd133d 21-Jul-2020 Ariel Levkovich <lariel@mellanox.com>

net/mlx5e: Support CT offload for tc nic flows

Adding support to perform CT related tc actions and
matching on CT states for nic flows.

The ct flows management and handling will be done using a new
instance of the ct database that is declared in this patch to
keep it separate from the eswitch ct flows database.
Offloading and unoffloading ct flows will be done using the
existing ct offload api by providing it the relevant ct
database reference in each mode.

In addition, refactoring the tc ct api is introduced to make it
agnostic to the flow type and perform the resource allocations
and rule insertion to the proper steering domain in the device.

In the initialization call, the api requests and stores in the ct
database instance all the relevant information that distinguishes
between nic flows and esw flows, such as chains database, steering
namespace and mod hdr table.
This way the operations of adding and removing ct flows to the device
can later performed agnostically to the flow type.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c7569097 28-Apr-2020 Ariel Levkovich <lariel@mellanox.com>

net/mlx5e: Add tc chains offload support for nic flows

Allow adding nic tc flow rules with goto chain action.

Connecting the nic flows to the mlx5 chains infrastructure in previous
patches allows us to support the creation of chained flow tables and
rules that direct to another chain for further packet processing.
This is a required preparation to support CT offloads for nic tc flows.

We allow the creation of 256 different chains for nic flows since we
have 8 bits available for the chain restore tag in case of a miss.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c620b772 29-Apr-2020 Ariel Levkovich <lariel@mellanox.com>

net/mlx5: Refactor tc flow attributes structure

In order to support chains and connection tracking offload for
nic flows, there's a need to introduce a common flow attributes
struct so that these features can be agnostic and have access to
a single attributes struct, regardless of the flow type.

Therefore, a new tc flow attributes format is introduced to allow
access to attributes that are common to eswitch and nic flows.

The common attributes will always get allocated for the new flows,
regardless of their type, while the type specific attributes are
separated into different structs and will be allocated based on the
flow type to avoid memory waste.

When allocating the flow attributes the caller provides the flow
steering namespace and according the namespace type the additional
space for the extra, type specific, attributes is determined and
added to the total attribute allocation size.

In addition, the attributes that are going to be common to both
flow types are moved to the common attributes struct.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 08247066 28-Apr-2020 Ariel Levkovich <lariel@mellanox.com>

net/mlx5e: Split nic tc flow allocation and creation

For future support of CT offload with nic tc flows, where
the flow rule is not created immediately but rather following
a future event, the patch is splitting the nic rule creation
and deletion into 2 parts:
1. Creating/Deleting and setting the rule attributes.
2. Creating/Deleting the flow table and flow rule itself.

This way the attributes can be prepared and stored in the
flow handle when the tc flow is created but the rule can
actually be created at any point in the future, using these
pre allocated attributes.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6a064674 27-Apr-2020 Ariel Levkovich <lariel@mellanox.com>

net/mlx5e: Tc nic flows to use mlx5_chains flow tables

Change nic tc flows offload path to use the chains and prios
infrastructure for the flow table creation as a preparation to
support tc multi chains and priorities for nic flows.

Adding an instance of the table chaining database to the nic tc struct
and perform the root table creation and desctuction via the chains api
while keeping the limit of a single chain (0) in nic tc mode.
This will be extendable to supporting multiple chains in the following
patches.

The flow table sizes and default miss table parameters that are provided
to the chains creation api are kept the same.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ae430332 24-Apr-2020 Ariel Levkovich <lariel@mellanox.com>

net/mlx5: Refactor multi chains and prios support

Decouple the chains infrastructure from eswitch and make
it generic to support other steering namespaces.

The change defines an agnostic data structure to keep
all the relevant information for maintaining flow table
chaining in any steering namespace. Each namespace that
requires table chaining will be required to allocate
such data structure.

The chains creation code will receive the steering namespace
and flow table parameters from the caller so it will operate
agnosticly when creating the required resources to
maintain the table chaining function while Parts of the code
that are relevant to eswitch specific functionality are moved
to eswitch files.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 82198d8b 02-Sep-2020 Maor Dickman <maord@nvidia.com>

net/mlx5e: Fix endianness when calculating pedit mask first bit

The field mask value is provided in network byte order and has to
be converted to host byte order before calculating pedit mask
first bit.

Fixes: 88f30bbcbaaa ("net/mlx5e: Bit sized fields rewrite support")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 4c8594ad 26-Jul-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: CT: Fix freeing ct_label mapping

Add missing mapping remove call when removing ct rule,
as the mapping was allocated when ct rule was adding with ct_label.
Also there is a missing mapping remove call in error flow.

Fixes: 54b154ecfb8c ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 12a240a4 07-Jul-2020 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready

When deleting vxlan flow rule under multipath, tun_info in parse_attr is
not freed when the rule is not ready.

Fixes: ef06c9ee8933 ("net/mlx5e: Allow one failure when offloading tc encap rules under multipath")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 748cde9a 03-Sep-2020 Maor Dickman <maord@nvidia.com>

net/mlx5e: Add IPv6 traffic class (DSCP) header rewrite support

Add support for rewriting of IPV6 DSCP part of traffic class field.
Next commands, for example, can be used to offload rewrite action:

OVS:
$ ovs-ofctl add-flow ovs-sriov "tcpv6, in_port=REP, \
actions=mod_nw_tos:68, output:NIC"

iproute2:
$ tc filter add dev REP ingress protocol ipv6 prio 1 flower skip_sw \
ip_proto tcp \
action pedit ex munge ip6 traffic_class set 68 retain 0xfc pipe \
action mirred egress redirect dev NIC

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f0288210 23-Aug-2020 Eli Cohen <eli@mellanox.com>

net/mlx5e: Add support for tc trap

Support tc trap such that packets can explicitly be forwarded to slow
path if they match a specific rule.

In the example below, we want packets with src IP equals 7.7.7.8 to be
forwarded to software, in which case it will get to the appropriate
representor net device.

$ tc filter add dev eth1 protocol ip prio 1 root flower skip_sw \
src_ip 7.7.7.8 action trap

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0faddfe6 01-Jul-2020 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring

The modified flow_context fields in FTE must be indicated in
modify_enable bitmask. Previously, the misc bit in modify_enable is
always set as source vport must be set for each rule. So, when parsing
vxlan/gre/geneve/qinq rules, this bit is not set because those are all
from the same misc fileds that source vport fields are located at, and
we don't need to set the indicator twice.

After adding per vport tables for mirroring, misc bit is not set, then
firmware syndrome happens. To fix it, set the bit wherever misc fileds
are changed. This also makes it unnecessary to check misc fields and set
the misc bit accordingly in metadata matching, so here remove it.

Besides, flow_source must be specified for uplink because firmware
will check it and some actions are only allowed for packets received
from uplink.

Fixes: 96e326878fa5 ("net/mlx5e: Eswitch, Use per vport tables for mirroring")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 54b154ec 18-Jun-2020 Eli Britstein <elibr@mellanox.com>

net/mlx5e: CT: Map 128 bits labels to 32 bit map ID

The 128 bits ct_label field is matched using a 32 bit hardware register.
As such, only the lower 32 bits of ct_label field are offloaded. Change
this logic to support setting and matching higher bits too.
Map the 128 bits data to a unique 32 bits ID. Matching is done as exact
match of the mapping ID of key & mask.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d12f4521 05-May-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: CT: Expand tunnel register mappings

Reg_c1 is 32 bits wide. Originally, 24 bit were allocated for the tuple_id,
6 bits for tunnel mapping and 2 bits for tunnel options mappings.

Restoring the ct state from zone lookup instead of tuple id requires
reg_c1 to store 8 bits mapping the ct zone, leaving 24 bits for tunnel
mappings.

Expand tunnel and tunnel options register mappings to 12 bit each.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b2fdf3d0 17-Mar-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Export sharing of mod headers to a new file

Refactor sharing of mod headers to new file and while there,
remove spin lock and flows list, as this is only used for warn on.

Use the generic API in the next patch to re-use tuple modify headers
for identical modify actions,

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a8eb919b 29-Mar-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: CT: Restore ct state from lookup in zone instead of tupleid

Remove tupleid, and replace it with zone_restore, which is the zone an
established tuple sets after match. On miss, Use this zone + tuple
taken from the skb, to lookup the ct entry and restore it.

This improves flow insertion rate by avoiding the allocation of a header
rewrite context.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7e36feeb 22-Apr-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: CT: Don't offload tuple rewrites for established tuples

Next patches will remove the tupleid registers that is used
to restore the ct state on miss, and instead use the tuple on
the missed packet to lookup which state to restore.
Disable tuple rewrites after connection tracking.

For tuple rewrites, inject a ct_state=-trk match so it won't
change the tuple for established flows (+trk) that passed connection
tracking, and instead miss to software.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3d486ec4 01-Jun-2020 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Use netdev_info instead of pr_info

The next patch will pass the mlx5e_priv struct to the
modify_header_match_supported method. Use this opportunity to refactor
the existing pr_info call to a netdev_info call.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a7c119bd 03-May-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: CT: Allow header rewrite of 5-tuple and ct clear action

With ct clear we don't jump to the ct tables, so header rewrite
of 5-tuple can be done in place (and not moved to after the CT action).

Check for ct clear action, and if so, allow 5-tuple header
rewrite.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c1aea9e1 17-Jun-2020 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Fix usage of rcu-protected pointer

In mlx5e_configure_flower() flow pointer is protected by rcu read lock.
However, after cited commit the pointer is being used outside of rcu read
block. Extend the block to protect all pointer accesses.

Fixes: 553f9328385d ("net/mlx5e: Support tc block sharing for representors")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2fb15e72 17-Jun-2020 Vlad Buslov <vladbu@mellanox.com>

net/mxl5e: Verify that rpriv is not NULL

In helper function is_flow_rule_duplicate_allowed() verify that rpviv
pointer is not NULL before dereferencing it. This can happen when device is
in NIC mode and leads to following crash:

[90444.046419] BUG: kernel NULL pointer dereference, address: 0000000000000000
[90444.048149] #PF: supervisor read access in kernel mode
[90444.049781] #PF: error_code(0x0000) - not-present page
[90444.051386] PGD 80000003d35a4067 P4D 80000003d35a4067 PUD 3d35a3067 PMD 0
[90444.053051] Oops: 0000 [#1] SMP PTI
[90444.054683] CPU: 16 PID: 31736 Comm: tc Not tainted 5.8.0-rc1+ #1157
[90444.056340] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[90444.058079] RIP: 0010:mlx5e_configure_flower+0x3aa/0x9b0 [mlx5_core]
[90444.059753] Code: 24 50 49 8b 95 08 02 00 00 48 b8 00 08 00 00 04 00 00 00 48 21 c2 48 39 c2 74 0a 41 f6 85 0d 02 00 00 20 74 16 48 8b 44 24 20 <48> 8b 00 66 83 78 20 ff 74 07 4d 89 aa e0 00 00 00 48 83 7d 28 00
[90444.063232] RSP: 0018:ffffabe9c61ff768 EFLAGS: 00010246
[90444.065014] RAX: 0000000000000000 RBX: ffff9b13c4c91e80 RCX: 00000000000093fa
[90444.066784] RDX: 0000000400000800 RSI: 0000000000000000 RDI: 000000000002d5e0
[90444.068533] RBP: ffff9b174d308468 R08: 0000000000000000 R09: ffff9b17d63003f0
[90444.070285] R10: ffff9b17ea288600 R11: 0000000000000000 R12: ffffabe9c61ff878
[90444.072032] R13: ffff9b174d300000 R14: ffffabe9c61ffbb8 R15: ffff9b174d300880
[90444.073760] FS: 00007f3c23775480(0000) GS:ffff9b13efc80000(0000) knlGS:0000000000000000
[90444.075492] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[90444.077266] CR2: 0000000000000000 CR3: 00000003e2a60002 CR4: 00000000001606e0
[90444.079024] Call Trace:
[90444.080753] tc_setup_cb_add+0xca/0x1e0
[90444.082415] fl_hw_replace_filter+0x15f/0x1f0 [cls_flower]
[90444.084119] fl_change+0xa59/0x13dc [cls_flower]
[90444.085772] ? wait_for_completion+0xa8/0xf0
[90444.087364] tc_new_tfilter+0x3f5/0xa60
[90444.088960] rtnetlink_rcv_msg+0xeb/0x360
[90444.090514] ? __d_lookup_done+0x76/0xe0
[90444.092034] ? proc_alloc_inode+0x16/0x70
[90444.093560] ? prep_new_page+0x8c/0xf0
[90444.095048] ? _cond_resched+0x15/0x30
[90444.096483] ? rtnl_calcit.isra.0+0x110/0x110
[90444.097907] netlink_rcv_skb+0x49/0x110
[90444.099289] netlink_unicast+0x191/0x230
[90444.100629] netlink_sendmsg+0x243/0x480
[90444.101984] sock_sendmsg+0x5e/0x60
[90444.103305] ____sys_sendmsg+0x1f3/0x260
[90444.104597] ? copy_msghdr_from_user+0x5c/0x90
[90444.105916] ? __mod_lruvec_state+0x3c/0xe0
[90444.107210] ___sys_sendmsg+0x81/0xc0
[90444.108484] ? do_filp_open+0xa5/0x100
[90444.109732] ? handle_mm_fault+0x117b/0x1e00
[90444.110970] ? __check_object_size+0x46/0x147
[90444.112205] ? __check_object_size+0x136/0x147
[90444.113402] __sys_sendmsg+0x59/0xa0
[90444.114587] do_syscall_64+0x4d/0x90
[90444.115782] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[90444.116953] RIP: 0033:0x7f3c2393b7b8
[90444.118101] Code: Bad RIP value.
[90444.119240] RSP: 002b:00007ffc6ad8e6c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[90444.120408] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3c2393b7b8
[90444.121583] RDX: 0000000000000000 RSI: 00007ffc6ad8e740 RDI: 0000000000000003
[90444.122750] RBP: 000000005eea0c3a R08: 0000000000000001 R09: 00007ffc6ad8e68c
[90444.123928] R10: 0000000000404fa8 R11: 0000000000000246 R12: 0000000000000001
[90444.125073] R13: 0000000000000000 R14: 00007ffc6ad92a00 R15: 00000000004866a0
[90444.126221] Modules linked in: act_skbedit act_tunnel_key act_mirred bonding vxlan ip6_udp_tunnel udp_tunnel nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core intel_r
apl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mlxfw kvm act_ct nf_flow_table nf_nat nf_conntrack irqbypass crct10dif_pclmul nf_defrag_ipv6 igb ipmi_ssif libcrc32c crc32_pclmul crc32c_intel ipmi_si nf_defrag_ipv4 ptp ghash_clmulni_intel mei_me ses iTCO_wdt i2c_i801 pps_core
ioatdma iTCO_vendor_support joydev mei enclosure intel_cstate i2c_smbus wmi dca ipmi_devintf intel_uncore lpc_ich ipmi_msghandler pcspkr acpi_pad acpi_power_meter ast i2c_algo_bit drm_vram_helper drm_kms_helper drm_ttm_helper ttm drm mpt3sas raid_class scsi_transport_sas
[90444.136253] CR2: 0000000000000000
[90444.137621] ---[ end trace 924af62aa2b151bd ]---

Fixes: 553f9328385d ("net/mlx5e: Support tc block sharing for representors")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4b61d3e8 19-Jun-2020 Po Liu <Po.Liu@nxp.com>

net: qos offload add flow status with dropped count

This patch adds a drop frames counter to tc flower offloading.
Reporting h/w dropped frames is necessary for some actions.
Some actions like police action and the coming introduced stream gate
action would produce dropped frames which is necessary for user. Status
update shows how many filtered packets increasing and how many dropped
in those packets.

v2: Changes
- Update commit comments suggest by Jiri Pirko.

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58ff18e1 28-May-2020 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: en_tc: Fix cast to restricted __be32 warning

Fixes sparse warnings:
warning: cast to restricted __be32
warning: restricted __be32 degrades to integer

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>


# c51323ee 28-May-2020 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: en_tc: Fix incorrect type in initializer warnings

Fix some trivial warnings of the type:
warning: incorrect type in initializer (different base types)

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>


# 28619046 27-May-2020 Nathan Chancellor <nathan@kernel.org>

net/mlx5e: Don't use err uninitialized in mlx5e_attach_decap

Clang warns:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3712:6: warning:
variable 'err' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
if (IS_ERR(d->pkt_reformat)) {
^~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3718:6: note:
uninitialized use occurs here
if (err)
^~~
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3712:2: note: remove the
'if' if its condition is always true
if (IS_ERR(d->pkt_reformat)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3670:9: note: initialize
the variable 'err' to silence this warning
int err;
^
= 0
1 warning generated.

It is not wrong, err is only ever initialized in if statements but this
one is not in one. Initialize err to 0 to fix this.

Fixes: 14e6b038afa0 ("net/mlx5e: Add support for hw decapsulation of MPLS over UDP")
Link: https://github.com/ClangBuiltLinux/linux/issues/1037
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a683012a 19-Apr-2020 Pablo Neira Ayuso <pablo@netfilter.org>

net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta()

The drivers reports EINVAL to userspace through netlink on invalid meta
match. This is confusing since EINVAL is usually reserved for malformed
netlink messages. Replace it by more meaningful codes.

Fixes: 6d65bc64e232 ("net/mlx5e: Add mlx5e_flower_parse_meta support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 20300aaf 24-May-2020 Maor Dickman <maord@mellanox.com>

net/mlx5e: Remove warning "devices are not on same switch HW"

On tunnel decap rule insertion, the indirect mechanism will attempt to
offload the rule on all uplink representors which will trigger the
"devices are not on same switch HW, can't offload forwarding" message
for the uplink which isn't on the same switch HW as the VF representor.

The above flow is valid and shouldn't cause warning message,
fix by removing the warning and only report this flow using extack.

Fixes: 321348475d54 ("net/mlx5e: Fix allowed tc redirect merged eswitch offload cases")
Signed-off-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0a2a6f49 27-May-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: Fix stats update for matchall classifier

It's bytes, packets, lastused.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fca53304 18-May-2020 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Optimize performance for IPv4/IPv6 ethertype

The HW is optimized for IPv4/IPv6. For such cases, pending capability,
avoid matching on ethertype, and use ip_version field instead.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4a5d5d73 11-May-2020 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Helper function to set ethertype

Set ethertype match in a helper function as a pre-step towards
optimizing it.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d34eb2fc 05-Mar-2019 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload flow rules to active lower representor

When a bond device is created over one or more non uplink representors,
and when a flow rule is offloaded to such bond device, offload a rule
to the active lower device.

Assuming that this is active-backup lag, the rules should be offloaded
to the active lower device which is the representor of the direct
path (not the failover).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 553f9328 02-Aug-2019 Vu Pham <vuhuong@mellanox.com>

net/mlx5e: Support tc block sharing for representors

Currently offloading a rule over a tc block shared by multiple
representors fails because an e-switch global hashtable to keep
the mapping from tc cookies to mlx5e flow instances is used, and
tc block sharing offloads the same rule/cookie multiple times,
each time for different representor sharing the tc block.

Changing the implementation and behavior by acknowledging and returning
success if the same rule/cookie is offloaded again to other slave
representor sharing the tc block by setting, checking and comparing
the netdev that added the rule first.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 32134847 23-Apr-2020 Maor Dickman <maord@mellanox.com>

net/mlx5e: Fix allowed tc redirect merged eswitch offload cases

After changing the parent_id to be the same for both NICs of same
The cited commit wrongly allow offload of tc redirect flows from
VF to uplink and vice versa when devcies are on different eswitch,
these cases aren't supported by HW.

Disallow the above offloads when devcies are on different eswitch
and VF LAG is not configured.

Fixes: f6dc1264f1c0 ("net/mlx5e: Disallow tc redirect offload cases we don't support")
Signed-off-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 582234b4 08-Apr-2020 Eli Cohen <eli@mellanox.com>

net/mlx5e: Support pedit on mpls over UDP decap

Allow to modify ethernet headers while decapsulating mpls over UDP
packets. This is implemented using the same reformat object used for
decapsulation.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 14e6b038 03-Feb-2020 Eli Cohen <eli@mellanox.com>

net/mlx5e: Add support for hw decapsulation of MPLS over UDP

MPLS over UDP is supported in hardware by using a packet reformat object
with reformat type equal L3_TUNNEL_TO_L2 which both decapsulates the
outer L3, L4 and MPLS headers, and allows for setting the L2 headers of
the resulting decapsulated packet. For the hardware to operate
correctly, the configuration of the firmware must have
FLEX_PARSER_PROFILE_ENABLE = 1.

Example tc rule:
tc filter add dev bareudp0 protocol all prio 1 root flower enc_dst_port \
6635 enc_src_ip 8.8.8.23 action mpls pop protocol ip pipe \
action pedit ex munge eth dst set 00:11:22:33:44:21 pipe action \
mirred egress redirect dev enp59s0f0_0

We use pedit to set the correct destination MAC.

For MPLS over UDP decapsulation to take place, the driver logic requires
the following:

1. flower filter added on bareudp device.
2. action mpls pop
3. zero or more pedit munge actions
4. one redirect action

Current implementation supports only IPv4 and no VLAN.

tc filter show output looks like this:
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
enc_src_ip 8.8.8.24
enc_dst_port 6635
in_hw in_hw_count 1
action order 1: mpls pop protocol ip pipe
index 2 ref 1 bind 1

action order 2: pedit action pipe keys 2
index 1 ref 1 bind 1
key #0 at eth+0: val 00112233 mask 00000000
key #1 at eth+4: val 44210000 mask 0000ffff

action order 3: mirred (Egress Redirect to device enp59s0f0_0) stolen
index 2 ref 1 bind 1

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 72046a91 29-Jan-2020 Eli Cohen <eli@mellanox.com>

net/mlx5e: Allow to match on mpls parameters

Support matching on MPLS over UDP parameters using misc2 section of
match parameters.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f828ca6a 17-Nov-2019 Eli Cohen <eli@mellanox.com>

net/mlx5e: Add support for hw encapsulation of MPLS over UDP

MPLS over UDP is supported by adding a rule on a representor net device
which does tunnel_key set, push mpls and forward to a baredup device. At
the hardware level we use a packet_reformat_context object to do the
encapsulation of the packet.

The resulting packet looks as follows (left side transmitted first):
outer L2 | outer IP | UDP | MPLS | inner L3 and data |

Example usage:
tc filter add dev $rep0 protocol ip prio 1 root flower skip_sw \
action tunnel_key set src_ip 8.8.8.21 dst_ip 8.8.8.24 id 555 \
dst_port 6635 tos 4 ttl 6 csum action mpls push protocol 0x8847 \
label 555 tc 3 action mirred egress redirect dev bareudp0

This is how the filter is shown with tc filter show:
tc filter show dev enp59s0f0_0 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
skip_sw
in_hw in_hw_count 1
action order 1: tunnel_key set
src_ip 8.8.8.21
dst_ip 8.8.8.24
key_id 555
dst_port 6635
csum
tos 0x4
ttl 6 pipe
index 1 ref 1 bind 1

action order 2: mpls push protocol mpls_uc label 555 tc 3 ttl 255 pipe
index 1 ref 1 bind 1

action order 3: mirred (Egress Redirect to device bareudp0) stolen
index 1 ref 1 bind 1

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e2394a61 12-May-2020 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Move TC-specific code from en_main.c to en_tc.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract TC-specific code from
en_main.c to en_tc.c. This allows easily compiling out the code by
only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_main.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 549c243e 12-May-2020 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract neigh-specific code
from en_rep.c to standalone file. This allows easily compiling out the code
by only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_rep.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 768c3667 12-May-2020 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Extract TC-specific code from en_rep.c to rep/tc.c

As a preparation for introducing new kconfig option that controls
compilation of all TC offloads code in mlx5, extract TC-specific code from
en_rep.c to standalone file. This allows easily compiling out the code by
only including new source in make file when corresponding kconfig is
enabled instead of adding multiple ifdef blocks to en_rep.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c75a33c8 06-May-2020 Jacob Keller <jacob.e.keller@intel.com>

net: remove newlines in NL_SET_ERR_MSG_MOD

The NL_SET_ERR_MSG_MOD macro is used to report a string describing an
error message to userspace via the netlink extended ACK structure. It
should not have a trailing newline.

Add a cocci script which catches cases where the newline marker is
present. Using this script, fix the handful of cases which accidentally
included a trailing new line.

I couldn't figure out a way to get a patch mode working, so this script
only implements context, report, and org.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 244faedf 24-Apr-2020 Raed Salem <raeds@mellanox.com>

net/mlx5: Refactor imm_inval_pkey field in cqe struct

The imm_inval_pkey field can hold four different types of data,
depends on the usage, the data could be one of the below:
- Immediate field of the received message
- Invalidate rkey
- Pkey of the packet
- Flow table metadata

Current implementation doesn't reflect the intended usage of the
field at usage time.

Reflect the different types by replace this field with a union,
modify code where this field is used to reflect its intended
usage.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d65dbedf 24-Apr-2020 Huy Nguyen <huyn@mellanox.com>

net/mlx5: Add support for COPY steering action

Add COPY type to modify_header action. IPsec feature is the first
feature that needs COPY steering action.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>


# e0b4b472 09-Apr-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Update transobj.c new cmd interface

Do mass update of transobj.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# d7a42ad0 25-Mar-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: Allow partial data mask for tunnel options

We use mapping to save and restore the tunnel options.
Save also the tunnel options mask.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d5a3c2b6 29-Mar-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: Fix missing pedit action after ct clear action

With ct clear action we should not allocate the action in hw
and not release the mod_acts parsed in advance.
It will be done when handling the ct clear action.

Fixes: 1ef3018f5af3 ("net/mlx5e: CT: Support clear action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 70f478ca 01-Apr-2020 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Fix nest_level for vlan pop action

Current value of nest_level, assigned from net_device lower_level value,
does not reflect the actual number of vlan headers, needed to pop.
For ex., if we have untagged ingress traffic sended over vlan devices,
instead of one pop action, driver will perform two pop actions.
To fix that, calculate nest_level as difference between vlan device and
parent device lower_levels.

Fixes: f3b0a18bb6cb ("net: remove unnecessary variables and callback")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 93a129eb 28-Mar-2020 Jiri Pirko <jiri@mellanox.com>

net: sched: expose HW stats types per action used by drivers

It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
dst_ip 192.168.1.1
in_hw in_hw_count 2
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 10 sec used 10 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 49964352 13-Mar-2020 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch: Move eswitch chains to a new directory

eswitch_offloads_chains.{c,h} were just introduced this kernel release
cycle, eswitch is in high development demand right now and many
features are planned to be added to it. eswitch deserves its own
directory and here we move these new files to there, in preparation for
upcoming eswitch features and new files.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>


# 49397b80 20-Mar-2020 Dan Carpenter <dan.carpenter@oracle.com>

net/mlx5e: Fix actions_match_supported() return

The actions_match_supported() function returns a bool, true for success
and false for failure. This error path is returning a negative which
is cast to true but it should return false.

Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 53eca1f3 16-Mar-2020 Jakub Kicinski <kuba@kernel.org>

net: rename flow_action_hw_stats_types* -> flow_action_hw_stats*

flow_action_hw_stats_types_check() helper takes one of the
FLOW_ACTION_HW_STATS_*_BIT values as input. If we align
the arguments to the opening bracket of the helper there
is no way to call this helper and stay under 80 characters.

Remove the "types" part from the new flow_action helpers
and enum values.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 613f53fe 24-Feb-2020 Eli Cohen <eli@mellanox.com>

net/mlx5: Eswitch, enable forwarding back to uplink port

Add dependencny on cap termination_table_raw_traffic to allow non
encapsulated packets received from uplink to be forwarded back to the
received uplink port.

Refactor the conditions into a separate function.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b5f814cc 01-Mar-2020 Eli Cohen <eli@mellanox.com>

net/mlx5: Avoid configuring eswitch QoS if not supported

Check if QoS is enabled for the eswitch before attempting to configure
QoS parameters and emit a netlink error if not supported.

Introduce an API to check if QoS is supported for the eswitch.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d0645b37 03-Mar-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: Fix rejecting all egress rules not on vlan

The original condition rejected all egress rules that
are not on tunnel device.
Also, the whole point of this egress reject was to disallow bad
rules because of egdev which doesn't exists today, so remove
this check entirely.

Fixes: 0a7fcb78cc21 ("net/mlx5e: Support inner header rewrite with goto action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 636bb968 10-Mar-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: en_tc: Rely just on register loopback for tunnel restoration

Register loopback which is needed for tunnel restoration, is now always
enabled if supported and not just with metadata enabled, check for
that instead.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1ef3018f 11-Mar-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: CT: Support clear action

Clear action, as with software, removes all ct metadata from
the packet.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c6b9460 11-Mar-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: CT: Handle misses after executing CT action

Mark packets with a unique tupleid, and on miss use that id to get
the act ct restore_cookie. Using that restore cookie, we ask CT to
restore the relevant info on the SKB.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c3844d9 11-Mar-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: CT: Introduce connection tracking

Add support for offloading tc ct action and ct matches.
We translate the tc filter with CT action the following HW model:

+-------------------+ +--------------------+ +--------------+
+ pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct +----->
+ original match + | + tuple + zone match + | + fte_id match + |
+-------------------+ | +--------------------+ | +--------------+ |
v v v
set chain miss mapping set mark original
set fte_id set label filter
set zone set established actions
set tunnel_id do nat (if needed)
do decap

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a16fa289 10-Mar-2020 Jiri Pirko <jiri@resnulli.us>

flow_offload: restrict driver to pass one allowed bit to flow_action_hw_stats_types_check()

The intention of this helper was to allow driver to specify one type
that it supports, so not only "any" value would pass. So make the API
more strict and allow driver to pass only 1 bit that is going
to be checked.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2fbbc30d 18-Feb-2020 Eli Cohen <eli@mellanox.com>

net/mlx5: Verify goto chain offload support

According to PRM, forward to flow table along with either packet
reformat or decap is supported only if reformat_and_fwd_to_table
capability is set for the flow table.

Add dependency on the capability and pack all the conditions for "goto
chain" in a single function.

Fix language in error message in case of not supporting forward to a
lower numbered flow table.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 48855479 06-Mar-2020 Jiri Pirko <jiri@mellanox.com>

flow_offload: introduce "delayed" HW stats type and allow it in mlx5

Introduce new type for delayed HW stats and allow the value in
mlx5 offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 319a1d19 06-Mar-2020 Jiri Pirko <jiri@mellanox.com>

flow_offload: check for basic action hw stats type

Introduce flow_action_basic_hw_stats_types_check() helper and use it
in drivers. That sanitizes the drivers which do not have support
for action HW stats types.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 404402ab 20-Feb-2020 Sebastian Hense <sebastian.hense1@ibm.com>

net/mlx5e: Fix endianness handling in pedit mask

The mask value is provided as 64 bit and has to be casted in
either 32 or 16 bit. On big endian systems the wrong half was
casted which resulted in an all zero mask.

Fixes: 2b64beba0251 ("net/mlx5e: Support header re-write of partial fields in TC pedit offload")
Signed-off-by: Sebastian Hense <sebastian.hense1@ibm.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# bc1d75fa 13-Feb-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: Remove redundant comment about goto slow path

The code is self explanatory and makes the comment redundant.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 178f69b4 13-Feb-2020 Eli Cohen <eli@mellanox.com>

net/mlx5e: Reduce number of arguments in slow path handling

mlx5e_tc_offload_to_slow_path() and mlx5e_tc_unoffload_from_slow_path()
take an extra argument allocated on the stack of the caller but not used
by the caller. Avoid the extra argument and use local variable in the
function itself.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# dec481c8 13-Feb-2020 Eli Cohen <eli@mellanox.com>

net/mlx5e: Remove unused argument from parse_tc_pedit_action()

parse_attr is not used by parse_tc_pedit_action() so revmove it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 61644c3d 18-Feb-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: Use NL_SET_ERR_MSG_MOD() extack for errors

This to be consistent and adds the module name to the error message.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4ccd83f4 18-Feb-2020 Roi Dayan <roid@mellanox.com>

net/mlx5e: Use netdev_warn() instead of pr_err() for errors

This is for added netdev prefix that helps identify
the source of the message.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 297eaf5b 06-Feb-2020 Roi Dayan <roid@mellanox.com>

net/mlx5: E-Switch, Allow goto earlier chain if FW supports it

Mellanox FW can support this if ignore_flow_level capability exists.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ffec9702 17-Feb-2020 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Don't allow forwarding between uplink

We can install forwarding packets rule between uplink
in switchdev mode, as show below. But the hardware does
not do that as expected (mlnx_perf -i $PF1, we can't get
the counter of the PF1). By the way, if we add the uplink
PF0, PF1 to Open vSwitch and enable hw-offload, the rules
can be offloaded but not work fine too. This patch add a
check and if so return -EOPNOTSUPP.

$ tc filter add dev $PF0 protocol all parent ffff: prio 1 handle 1 \
flower skip_sw action mirred egress redirect dev $PF1

$ tc -d -s filter show dev $PF0 ingress
skip_sw
in_hw in_hw_count 1
action order 1: mirred (Egress Redirect to device enp130s0f1) stolen
...
Sent hardware 408954 bytes 4173 pkt
...

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b8ce9037 15-Feb-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Restore tunnel metadata on miss

In tunnel and chains setup, we decapsulate the packets on first chain hop,
if we miss on later chains, the packet will comes up without tunnel header,
so it won't be taken by the tunnel device automatically, which fills the
tunnel metadata, and further tc tunnel matches won't work.

On miss, we get the tunnel mapping id, which was set on the chain 0 rule
that decapsulated the packet. This rule matched the tunnel outer
headers. From the tunnel mapping id, we get to this tunnel matches
and restore the equivalent tunnel info metadata dst on the skb.
We also set the skb->dev to the relevant device (tunnel device).
Now further tc processing can be done on the relevant device.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0a7fcb78 15-Feb-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Support inner header rewrite with goto action

The hardware supports header rewrite of outer headers only.
To perform header rewrite on inner headers, we must first
decapsulate the packet.

Currently, the hardware decap action is explicitly set by the tc
tunnel_key unset action. However, with goto action the user won't
use the tunnel_key unset action. In addition, header rewrites actions
will not apply to the inner header as done by the software model.

To support this, we will map each tunnel matches seen on a tc rule to
a unique tunnel id, implicity add a decap action on tc chain 0 flows,
and mark the packets with this unique tunnel id. Tunnel matches on
the decapsulated tunnel on later chains will match on this unique id
instead of the actual packet.

We will also use this mapping to restore the tunnel info metadata
on miss.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7f2fd0a5 15-Feb-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Disallow inserting vxlan/vlan egress rules without decap/pop

Currently, rules on tunnel devices can be offloaded without decap action
when a vlan pop action exists. Similarly, the driver will offload rules
on vlan interfaces with no pop action when a decap action exists.

Disallow the faulty behavior by checking that vlan egress rules do pop or
drop and vxlan egress rules do decap, as intended.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ea4cd837 15-Feb-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Move tc tunnel parsing logic with the rest at tc_tun module

Currently, tunnel parsing is split between en_tc and tc_tun. The next
patch will replace the tunnel fields matching with a register match,
and will not need this parsing.

Move the tunnel parsing logic to tc_tun as a pre-step for skipping
it in the next patch.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6ae4a6a5 15-Feb-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Allow re-allocating mod header actions

Currently the size of the mod header actions array is deduced from the
number of parsed TC header rewrite actions. However, mod header actions
are also used for setting HW register values. Support the dynamic
reallocation of the mod header array as a pre-step for adding HW
registers mod actions.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d6d27782 15-Feb-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5: E-Switch, Restore chain id on miss

Chain ids are mapped to the lower part of reg C, and after loopback
are copied to to CQE via a restore rule's flow_tag.

To let tc continue in the correct chain, we find the corresponding
chain id in the eswitch chain id <-> reg C mapping, and set the SKB's
tc extension chain to it.

That tells tc to continue processing from this set chain.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8f1e0b97 15-Feb-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5: E-Switch, Mark miss packets with new chain id mapping

Currently, if we miss in hardware after jumping to some chain,
we continue in chain 0 in software. This is wrong, and with the new
tc skb extension we can now restore the chain id on the skb, so
tc can continue with in the correct chain.

To restore the chain id in software after a miss in hardware, we create
a register mapping from 32bit chain ids to 16bit of reg_c0 (that
survives loopback), to 32bit chain ids. We then mark packets that
miss on some chain with the current chain id mapping on their reg_c0
field. Using this mapping, we will support up to 64K concurrent
chains.

This register survives loopback and gets to the CQE on flow_tag
via the eswitch restore rules.

In next commit, we will reverse the mapping we got on the CQE
to a chain id and tell tc to continue in the sw chain where we
left off via the tc skb extension.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d48834f9 24-Jan-2020 Jiri Pirko <jiri@mellanox.com>

mlx5: Use dev_net netdevice notifier registrations

Register the dev_net notifier and allow the per-net notifier to follow
the device into different namespace.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e401a184 12-Jan-2020 Eli Cohen <eli@mellanox.com>

net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep

Since the implementation relies on limiting the VF transmit rate to
simulate ingress rate limiting, and since either uplink representor or
ecpf are not associated with a VF, we limit the rate limit configuration
for those ports.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6d65bc64 07-Jan-2020 wenxu <wenxu@ucloud.cn>

net/mlx5e: Add mlx5e_flower_parse_meta support

In the flowtables offload all the devices in the flowtables
share the same flow_block. An offload rule will be installed on
all the devices. This scenario is not correct.

It is no problem if there are only two devices in the flowtable,
The rule with ingress and egress on the same device can be reject
by driver.

But more than two devices in the flowtable will install the wrong
rules on hardware.

For example:
Three devices in a offload flowtables: dev_a, dev_b, dev_c

A rule ingress from dev_a and egress to dev_b:
The rule will install on device dev_a.
The rule will try to install on dev_b but failed for ingress
and egress on the same device.
The rule will install on dev_c. This is not correct.

The flowtables offload avoid this case through restricting the ingress dev
with FLOW_DISSECTOR_KEY_META.

So the mlx5e driver also should support the FLOW_DISSECTOR_KEY_META parse.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 39ac237c 07-Jan-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5: E-Switch, Refactor chains and priorities

To support the entire chain and prio range (32bit + 16bit),
instead of a using a static array of chains/prios of limited size, create
them dynamically, and use a rhashtable to search for existing chains/prio
combinations.

This will be used in next patch to actually increase the number using
unamanged tables support and ignore flow level capability.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 61dc7b01 14-Nov-2019 Paul Blakey <paulb@mellanox.com>

net/mlx5: Refactor mlx5_create_auto_grouped_flow_table

Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 15fc92ec 10-Dec-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Support accept action on nic table

In one case, we may forward packets from one vport
to others, but only one packets flow will be accepted,
which destination ip was assign to VF.

+-----+ +-----+ +-----+
| VFn | | VF1 | | VF0 | accept
+--+--+ +--+--+ hairpin +--^--+
| | <--------------- |
| | |
+--+-----------v-+ +--+-------------+
| eswitch PF1 | | eswitch PF0 |
+----------------+ +----------------+

tc filter add dev $PF0 protocol all parent ffff: prio 1 handle 1 \
flower skip_sw action mirred egress redirect dev $VF0_REP
tc filter add dev $VF0 protocol ip parent ffff: prio 1 handle 1 \
flower skip_sw dst_ip $VF0_IP action pass
tc filter add dev $VF0 protocol all parent ffff: prio 2 handle 2 \
flower skip_sw action mirred egress redirect dev $VF1

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6412bb39 11-Dec-2019 Eli Cohen <eli@mellanox.com>

net/mlx5e: Fix hairpin RSS table size

Set hairpin table size to the corret size, based on the groups that
would be created in it. Groups are laid out on the table such that a
group occupies a range of entries in the table. This implies that the
group ranges should have correspondence to the table they are laid upon.

The patch cited below made group 1's size to grow hence causing
overflow of group range laid on the table.

Fixes: a795d8db2a6d ("net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 554fe75c 31-Oct-2019 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Avoid duplicating rule destinations

Following scenario easily break driver logic and crash the kernel:
1. Add rule with mirred actions to same device.
2. Delete this rule.
In described scenario rule is not added to database and on deletion
driver access invalid entry.
Example:

$ tc filter add dev ens1f0_0 ingress protocol ip prio 1 \
flower skip_sw \
action mirred egress mirror dev ens1f0_1 pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter del dev ens1f0_0 ingress protocol ip prio 1

Dmesg output:

[ 376.634396] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 3439): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f)
[ 376.654983] mlx5_core 0000:82:00.0: del_hw_flow_group:567:(pid 3439): flow steering can't destroy fg 89 of ft 3145728
[ 376.673433] kasan: CONFIG_KASAN_INLINE enabled
[ 376.683769] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 376.695229] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[ 376.705069] CPU: 7 PID: 3439 Comm: tc Not tainted 5.4.0-rc5+ #76
[ 376.714959] Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0a 08/12/2016
[ 376.726371] RIP: 0010:mlx5_del_flow_rules+0x105/0x960 [mlx5_core]
[ 376.735817] Code: 01 00 00 00 48 83 eb 08 e8 28 d9 ff ff 4c 39 e3 75 d8 4c 8d bd c0 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 84 04 00 00 48 8d 7d 28 8b 9 d
[ 376.761261] RSP: 0018:ffff888847c56db8 EFLAGS: 00010202
[ 376.770054] RAX: dffffc0000000000 RBX: ffff8888582a6da0 RCX: ffff888847c56d60
[ 376.780743] RDX: 0000000000000058 RSI: 0000000000000008 RDI: 0000000000000282
[ 376.791328] RBP: 0000000000000000 R08: fffffbfff0c60ea6 R09: fffffbfff0c60ea6
[ 376.802050] R10: fffffbfff0c60ea5 R11: ffffffff8630752f R12: ffff8888582a6da0
[ 376.812798] R13: dffffc0000000000 R14: ffff8888582a6da0 R15: 00000000000002c0
[ 376.823445] FS: 00007f675f9a8840(0000) GS:ffff88886d200000(0000) knlGS:0000000000000000
[ 376.834971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 376.844179] CR2: 00000000007d9640 CR3: 00000007d3f26003 CR4: 00000000001606e0
[ 376.854843] Call Trace:
[ 376.868542] __mlx5_eswitch_del_rule+0x49/0x300 [mlx5_core]
[ 376.877735] mlx5e_tc_del_fdb_flow+0x6ec/0x9e0 [mlx5_core]
[ 376.921549] mlx5e_flow_put+0x2b/0x50 [mlx5_core]
[ 376.929813] mlx5e_delete_flower+0x5b6/0xbd0 [mlx5_core]
[ 376.973030] tc_setup_cb_reoffload+0x29/0xc0
[ 376.980619] fl_reoffload+0x50a/0x770 [cls_flower]
[ 377.015087] tcf_block_playback_offloads+0xbd/0x250
[ 377.033400] tcf_block_setup+0x1b2/0xc60
[ 377.057247] tcf_block_offload_cmd+0x195/0x240
[ 377.098826] tcf_block_offload_unbind+0xe7/0x180
[ 377.107056] __tcf_block_put+0xe5/0x400
[ 377.114528] ingress_destroy+0x3d/0x60 [sch_ingress]
[ 377.122894] qdisc_destroy+0xf1/0x5a0
[ 377.129993] qdisc_graft+0xa3d/0xe50
[ 377.151227] tc_get_qdisc+0x48e/0xa20
[ 377.165167] rtnetlink_rcv_msg+0x35d/0x8d0
[ 377.199528] netlink_rcv_skb+0x11e/0x340
[ 377.219638] netlink_unicast+0x408/0x5b0
[ 377.239913] netlink_sendmsg+0x71b/0xb30
[ 377.267505] sock_sendmsg+0xb1/0xf0
[ 377.273801] ___sys_sendmsg+0x635/0x900
[ 377.312784] __sys_sendmsg+0xd3/0x170
[ 377.338693] do_syscall_64+0x95/0x460
[ 377.344833] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 377.352321] RIP: 0033:0x7f675e58e090

To avoid this, for every mirred action check if output device was
already processed. If so - drop rule with EOPNOTSUPP error.

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# eb252c3a 02-Dec-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Fix free peer_flow when refcount is 0

It could be neigh update flow took a refcount on peer flow so
sometimes we cannot release peer flow even if parent flow is
being freed now.

Fixes: 5a7e5bcb663d ("net/mlx5e: Extend tc flow struct with reference counter")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a23dae79 04-Dec-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Fix freeing flow with kfree() and not kvfree()

Flows are allocated with kzalloc() so free with kfree().

Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ab818362 22-Nov-2019 Taehee Yoo <ap420073@gmail.com>

net: use rhashtable_lookup() instead of rhashtable_lookup_fast()

rhashtable_lookup_fast() internally calls rcu_read_lock() then,
calls rhashtable_lookup(). So if rcu_read_lock() is already held,
rhashtable_lookup() is enough.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# b6a4ac24 07-Nov-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Reorder mirrer action parsing to check for encap first

Mirred action parsing code in parse_tc_fdb_actions() first checks if
out_dev has same parent id, and only verifies that there is a pending encap
action that was parsed before. Recent change in vxlan module made function
netdev_port_same_parent_id() to return true when called for mlx5 eswitch
representor and vxlan device created explicitly on mlx5 representor
device (vxlan devices created with "external" flag without explicitly
specifying parent interface are not affected). With call to
netdev_port_same_parent_id() returning true, incorrect code path is chosen
and encap rules fail to offload because vxlan dev is not a valid eswitch
forwarding dev. Dmesg log of error:

[ 1784.389797] devices ens1f0_0 vxlan1 not on same switch HW, can't offload forwarding

In order to fix the issue, rearrange conditional in parse_tc_fdb_actions()
to check for pending encap action before checking if out_dev has the same
parent id.

Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7b83355f 07-Nov-2019 Eli Cohen <eli@mellanox.com>

net/mlx5e: Fix ingress rate configuration for representors

Current code uses the old method of prio encoding in
flow_cls_common_offload. Fix to follow the changes introduced in
commit ef01adae0e43 ("net: sched: use major priority number as hardware priority").

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 84179981 11-Nov-2019 Paul Blakey <paulb@mellanox.com>

net/mlx5: TC: Offload flow table rules

Since both tc rules and flow table rules are of the same format,
we can re-use tc parsing for that, and move the flow table rules
to their steering domain - In this case, the next chain after
max tc chain.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2cf2954b 11-Nov-2019 Paul Blakey <paulb@mellanox.com>

net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines

Rename it to prepare for next patch that will add a
different type of offload to the FDB.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ab9341b5 07-Oct-2019 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Add ToS (DSCP) header rewrite support

Add support for rewriting of DSCP part of ToS field.
Next commands, for example, can be used to offload rewrite action:

OVS:
$ ovs-ofctl add-flow ovs-sriov "ip, in_port=REP, \
actions=mod_nw_tos:68, output:NIC"

iproute2 (used retain mask, as tc command rewrite whole ToS field):
$ tc filter add dev REP ingress protocol ip prio 1 flower skip_sw \
ip_proto icmp action pedit munge ip tos set 68 retain 0xfc pipe \
action mirred egress redirect dev NIC

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 88f30bbc 02-Oct-2019 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Bit sized fields rewrite support

This patch doesn't change any functionality, but is a pre-step for
adding support for rewriting of bit-sized fields, like DSCP and ECN
in IPv4 header, similar fields in IPv6, etc.

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ae2741e2 11-Sep-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Verify that rule has at least one fwd/drop action

Currently, mlx5 tc layer doesn't verify that rule has at least one forward
or drop action which leads to following firmware syndrome when user tries
to offload such action:

[ 1824.860501] mlx5_core 0000:81:00.0: mlx5_cmd_check:753:(pid 29458): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a)

Add check at the end of parse_tc_fdb_actions() that verifies that resulting
attribute has action fwd or drop flag set.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2a4b6526 10-Sep-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Don't store direct pointer to action's tunnel info

Geneve implementation changed mlx5 tc to user direct pointer to tunnel_key
action's internal struct ip_tunnel_info instance. However, this leads to
use-after-free error when initial filter that caused creation of new encap
entry is deleted or when tunnel_key action is manually overwritten through
action API. Moreover, with recent TC offloads API unlocking change struct
flow_action_entry->tunnel point to temporal copy of tunnel info that is
deallocated after filter is offloaded to hardware which causes bug to
reproduce every time new filter is attached to existing encap entry with
following KASAN bug:

[ 314.885555] ==================================================================
[ 314.886641] BUG: KASAN: use-after-free in memcmp+0x2c/0x60
[ 314.886864] Read of size 1 at addr ffff88886c746280 by task tc/2682

[ 314.887179] CPU: 22 PID: 2682 Comm: tc Not tainted 5.3.0-rc7+ #703
[ 314.887188] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 314.887195] Call Trace:
[ 314.887215] dump_stack+0x9a/0xf0
[ 314.887236] print_address_description+0x67/0x323
[ 314.887248] ? memcmp+0x2c/0x60
[ 314.887257] ? memcmp+0x2c/0x60
[ 314.887272] __kasan_report.cold+0x1a/0x3d
[ 314.887474] ? __mlx5e_tc_del_fdb_peer_flow+0x100/0x1b0 [mlx5_core]
[ 314.887484] ? memcmp+0x2c/0x60
[ 314.887509] kasan_report+0xe/0x12
[ 314.887521] memcmp+0x2c/0x60
[ 314.887662] mlx5e_tc_add_fdb_flow+0x51b/0xbe0 [mlx5_core]
[ 314.887838] ? mlx5e_encap_take+0x110/0x110 [mlx5_core]
[ 314.887902] ? lockdep_init_map+0x87/0x2c0
[ 314.887924] ? __init_waitqueue_head+0x4f/0x60
[ 314.888062] ? mlx5e_alloc_flow.isra.0+0x18c/0x1c0 [mlx5_core]
[ 314.888207] __mlx5e_add_fdb_flow+0x2d7/0x440 [mlx5_core]
[ 314.888359] ? mlx5e_tc_update_neigh_used_value+0x6f0/0x6f0 [mlx5_core]
[ 314.888374] ? match_held_lock+0x2e/0x240
[ 314.888537] mlx5e_configure_flower+0x830/0x16a0 [mlx5_core]
[ 314.888702] ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[ 314.888713] ? down_read+0x118/0x2c0
[ 314.888728] ? down_read_killable+0x300/0x300
[ 314.888882] ? mlx5e_rep_get_ethtool_stats+0x180/0x180 [mlx5_core]
[ 314.888899] tc_setup_cb_add+0x127/0x270
[ 314.888937] fl_hw_replace_filter+0x2ac/0x380 [cls_flower]
[ 314.888976] ? fl_hw_destroy_filter+0x1b0/0x1b0 [cls_flower]
[ 314.888990] ? fl_change+0xbcf/0x27ef [cls_flower]
[ 314.889030] ? fl_change+0xa57/0x27ef [cls_flower]
[ 314.889069] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.889135] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[ 314.889167] ? __radix_tree_lookup+0xa4/0x130
[ 314.889200] ? fl_get+0x169/0x240 [cls_flower]
[ 314.889218] ? fl_walk+0x230/0x230 [cls_flower]
[ 314.889249] tc_new_tfilter+0x5e1/0xd40
[ 314.889281] ? __rhashtable_insert_fast.constprop.0+0xa00/0xa00 [cls_flower]
[ 314.889309] ? tc_del_tfilter+0xa30/0xa30
[ 314.889335] ? __lock_acquire+0x5b5/0x2460
[ 314.889378] ? find_held_lock+0x85/0xa0
[ 314.889442] ? tc_del_tfilter+0xa30/0xa30
[ 314.889465] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.889488] ? rtnl_dellink+0x490/0x490
[ 314.889518] ? lockdep_hardirqs_on+0x260/0x260
[ 314.889538] ? netlink_deliver_tap+0xab/0x5a0
[ 314.889550] ? match_held_lock+0x1b/0x240
[ 314.889575] netlink_rcv_skb+0xd0/0x200
[ 314.889588] ? rtnl_dellink+0x490/0x490
[ 314.889605] ? netlink_ack+0x440/0x440
[ 314.889635] ? netlink_deliver_tap+0x161/0x5a0
[ 314.889648] ? lock_downgrade+0x360/0x360
[ 314.889657] ? lock_acquire+0xe5/0x210
[ 314.889686] netlink_unicast+0x296/0x350
[ 314.889707] ? netlink_attachskb+0x390/0x390
[ 314.889726] ? _copy_from_iter_full+0xe0/0x3a0
[ 314.889738] ? __virt_addr_valid+0xbb/0x130
[ 314.889771] netlink_sendmsg+0x394/0x600
[ 314.889800] ? netlink_unicast+0x350/0x350
[ 314.889817] ? move_addr_to_kernel.part.0+0x90/0x90
[ 314.889852] ? netlink_unicast+0x350/0x350
[ 314.889872] sock_sendmsg+0x96/0xa0
[ 314.889891] ___sys_sendmsg+0x482/0x520
[ 314.889919] ? copy_msghdr_from_user+0x250/0x250
[ 314.889930] ? __fput+0x1fa/0x390
[ 314.889941] ? task_work_run+0xb7/0xf0
[ 314.889957] ? exit_to_usermode_loop+0x117/0x120
[ 314.889972] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.889982] ? do_syscall_64+0x74/0xe0
[ 314.889992] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890012] ? mark_lock+0xac/0x9a0
[ 314.890028] ? __lock_acquire+0x5b5/0x2460
[ 314.890053] ? mark_lock+0xac/0x9a0
[ 314.890083] ? __lock_acquire+0x5b5/0x2460
[ 314.890112] ? match_held_lock+0x1b/0x240
[ 314.890144] ? __fget_light+0xa1/0xf0
[ 314.890166] ? sockfd_lookup_light+0x91/0xb0
[ 314.890187] __sys_sendmsg+0xba/0x130
[ 314.890201] ? __sys_sendmsg_sock+0xb0/0xb0
[ 314.890225] ? __blkcg_punt_bio_submit+0xd0/0xd0
[ 314.890264] ? lockdep_hardirqs_off+0xbe/0x100
[ 314.890274] ? mark_held_locks+0x24/0x90
[ 314.890286] ? do_syscall_64+0x1e/0xe0
[ 314.890308] do_syscall_64+0x74/0xe0
[ 314.890325] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 314.890336] RIP: 0033:0x7f00ca33d7b8
[ 314.890348] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 5
4
[ 314.890356] RSP: 002b:00007ffea2983928 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 314.890369] RAX: ffffffffffffffda RBX: 000000005d777d5b RCX: 00007f00ca33d7b8
[ 314.890377] RDX: 0000000000000000 RSI: 00007ffea2983990 RDI: 0000000000000003
[ 314.890384] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 314.890392] R10: 0000000000404eda R11: 0000000000000246 R12: 0000000000000001
[ 314.890400] R13: 000000000047f640 R14: 00007ffea2987b58 R15: 0000000000000021

[ 314.890529] Allocated by task 2687:
[ 314.890684] save_stack+0x1b/0x80
[ 314.890694] __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 314.890705] __kmalloc_track_caller+0x102/0x340
[ 314.890721] kmemdup+0x1d/0x40
[ 314.890730] tc_setup_flow_action+0x731/0x2c27
[ 314.890743] fl_hw_replace_filter+0x23b/0x380 [cls_flower]
[ 314.890756] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.890765] tc_new_tfilter+0x5e1/0xd40
[ 314.890776] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.890786] netlink_rcv_skb+0xd0/0x200
[ 314.890796] netlink_unicast+0x296/0x350
[ 314.890805] netlink_sendmsg+0x394/0x600
[ 314.890815] sock_sendmsg+0x96/0xa0
[ 314.890825] ___sys_sendmsg+0x482/0x520
[ 314.890834] __sys_sendmsg+0xba/0x130
[ 314.890844] do_syscall_64+0x74/0xe0
[ 314.890854] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 314.890937] Freed by task 2687:
[ 314.891076] save_stack+0x1b/0x80
[ 314.891086] __kasan_slab_free+0x12c/0x170
[ 314.891095] kfree+0xeb/0x2f0
[ 314.891106] tc_cleanup_flow_action+0x69/0xa0
[ 314.891119] fl_hw_replace_filter+0x2c5/0x380 [cls_flower]
[ 314.891132] fl_change+0x16bd/0x27ef [cls_flower]
[ 314.891140] tc_new_tfilter+0x5e1/0xd40
[ 314.891151] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 314.891161] netlink_rcv_skb+0xd0/0x200
[ 314.891170] netlink_unicast+0x296/0x350
[ 314.891180] netlink_sendmsg+0x394/0x600
[ 314.891190] sock_sendmsg+0x96/0xa0
[ 314.891200] ___sys_sendmsg+0x482/0x520
[ 314.891208] __sys_sendmsg+0xba/0x130
[ 314.891218] do_syscall_64+0x74/0xe0
[ 314.891228] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 314.891315] The buggy address belongs to the object at ffff88886c746280
which belongs to the cache kmalloc-96 of size 96
[ 314.891762] The buggy address is located 0 bytes inside of
96-byte region [ffff88886c746280, ffff88886c7462e0)
[ 314.892196] The buggy address belongs to the page:
[ 314.892387] page:ffffea0021b1d180 refcount:1 mapcount:0 mapping:ffff88835d00ef80 index:0x0
[ 314.892398] flags: 0x57ffffc0000200(slab)
[ 314.892413] raw: 0057ffffc0000200 ffffea00219e0340 0000000800000008 ffff88835d00ef80
[ 314.892423] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 314.892430] page dumped because: kasan: bad access detected

[ 314.892515] Memory state around the buggy address:
[ 314.892707] ffff88886c746180: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.892976] ffff88886c746200: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893251] >ffff88886c746280: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893522] ^
[ 314.893657] ffff88886c746300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 314.893924] ffff88886c746380: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
[ 314.894189] ==================================================================

Fix the issue by duplicating tunnel info into per-encap copy that is
deallocated with encap structure. Also, duplicate tunnel info in flow parse
attribute to support cases when flow might be attached asynchronously.

Fixes: 1f6da30697d0 ("net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f3b0a18b 21-Oct-2019 Taehee Yoo <ap420073@gmail.com>

net: remove unnecessary variables and callback

This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.

In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fe1587a7 13-Sep-2019 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Fix matching on tunnel addresses type

In mlx5 parse_tunnel_attr() function dispatch on encap IP address type
is performed by directly checking flow_rule_match_key() on
FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS, and then on
FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS. However, since those are stored in
union, first check is always true if any type of encap address is set,
which leads to IPv6 tunnel encap address being parsed as IPv4 by mlx5.
Determine correct IP address type by checking control key first and if
it set, take address type from match.key->addr_type.

Fixes: d1bda7eecd88 ("net/mlx5e: Allow matching only enc_key_id/enc_dst_port for decapsulation action")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a2b7189b 03-Sep-2019 zhong jiang <zhongjiang@huawei.com>

net/mlx5: Use PTR_ERR_OR_ZERO rather than its implementation

PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR. It is better
to use it directly. hence just replace it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5cc3a8c6 27-Aug-2019 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use ipv6_stub to avoid dependency with ipv6 being a module

mlx5 is dependent on IPv6 tristate since we use ipv6's nd_tbl directly,
alternatively we can use ipv6_stub->nd_tbl and remove the dependency.

Reported-by: Walter Harms <wharms@bfs.de>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2b688ea5 15-Aug-2019 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: Add flow steering actions to fs_cmd shim layer

Add flow steering actions: modify header and packet reformat
to the fs_cmd shim layer. This allows each namespace to define
possibly different functionality for alloc/dealloc action commands.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fc603294 29-Aug-2019 Mark Bloch <markb@mellanox.com>

net/mlx5: Set only stag for match untagged packets

cvlan_tag enabled in match criteria and disabled in
match value means both S & C tags don't exist (untagged of both).

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c786fe59 25-Jun-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Add trace point for neigh used value update

Allow tracing result of neigh used value update task that is executed
periodically on workqueue.

Usage example:
># cd /sys/kernel/debug/tracing
># echo mlx5:mlx5e_tc_update_neigh_used_value >> set_event
># cat trace
...
kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value:
netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1

Added corresponding documentation in
Documentation/networking/device-driver/mellanox/mlx5.rst

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7a978759 27-Jun-2019 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Add tc flower tracepoints

Implemented following tracepoints:
1. Configure flower (mlx5e_configure_flower)
2. Delete flower (mlx5e_delete_flower)
3. Stats flower (mlx5e_stats_flower)

Usage example:
># cd /sys/kernel/debug/tracing
># echo mlx5:mlx5e_configure_flower >> set_event
># cat trace
...
tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT

Added corresponding documentation in
Documentation/networking/device-driver/mellanox/mlx5.rst

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 95435ad7 04-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Only access fully initialized flows in neigh update

To remove dependency on rtnl lock and prevent neigh update code from
accessing uninitialized flows when executing concurrently with tc, extend
mlx5e_tc_flow with 'init_done' completion. Modify helper
mlx5e_take_all_encap_flows() to wait for flow completion after obtaining
reference to it. Modify mlx5e_tc_encap_flows_del() and
mlx5e_tc_encap_flows_add() to skip flows that don't have OFFLOADED flag
set, which can happen if concurrent flow initialization failed.

This commit finishes neigh update refactoring for concurrent execution
started in previous change in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2a1f1768 03-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Refactor neigh update for concurrent execution

In order to remove dependency on rtnl lock and allow neigh update workqueue
task to execute concurrently with tc, refactor mlx5e_rep_neigh_update() for
concurrent execution:

- Lock encap table when accessing encap entry to prevent concurrent
changes. To do this properly, the initial encap state check is moved from
mlx5e_rep_neigh_update() into mlx5e_rep_update_flows() to be performed
under encap_tbl_lock protection.

- Wait for encap to be fully initialized before accessing it by means of
'res_ready' completion.

- Add mlx5e_take_all_encap_flows() helper which is used to construct a
temporary list of flows and efi indexes that is used to access current
encap data in flow which can be attached to multiple encaps
simultaneously. Release the flows from temporary list after
encap_tbl_lock critical section. This is necessary because
mlx5e_flow_put() can't be called while holding encap_tbl_lock.

- Modify mlx5e_tc_encap_flows_add() and mlx5e_tc_encap_flows_del() to work
with user-provided list of flows built by mlx5e_take_all_encap_flows(),
instead of traversing encap flow list directly.

This is first step in complex neigh update refactoring, which is finished
by following commit in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6a06c2f7 03-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Refactor neigh used value update for concurrent execution

In order to remove dependency on rtnl lock and allow neigh used value
update workqueue task to execute concurrently with tc, refactor
mlx5e_tc_update_neigh_used_value() for concurrent execution:

- Lock encap table when accessing encap entry to prevent concurrent
changes.

- Save offloaded encap flows to temporary list and release them after encap
entry is updated. Add mlx5e_put_encap_flow_list() helper which is
intended to be shared with neigh update code in following patch in this
series. This is necessary because mlx5e_flow_put() can't be called while
holding encap_tbl_lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ac0d9176 11-Jun-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect neigh hash encap list with spinlock and rcu

Rcu-ify mlx5e_neigh_hash_entry->encap_list by changing operations on encap
list to their rcu counterparts and extending encap structure with rcu_head
to free the encap instances after rcu grace period. Use rcu read lock when
traversing encap list. Implement helper mlx5e_get_next_valid_encap()
function that is used by mlx5e_tc_update_neigh_used_value() to safely
iterate over valid entries of nhe->encap_list.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3c140dd5 12-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Fix deallocation of non-fully init encap entries

Recent rtnl lock dependency refactoring changed encap entry attach code to
insert encap entry to hash table before it was fully initialized in order
to allow concurrent tc users to wait on completion for encap entry to
finish initialization. That change required all the users of encap entry to
obtain reference to it first and for caller that creates encap to put
reference to it on error, instead of freeing the entry memory directly.
However, releasing reference to such encap entry that wasn't fully
initialized causes NULL pointer dereference in
mlx5e_rep_encap_entry_detach() which expects e->out_dev to be set and encap
to be attached to nhe:

[ 1092.454517] BUG: unable to handle page fault for address: 00000000000420e8
[ 1092.454571] #PF: supervisor read access in kernel mode
[ 1092.454602] #PF: error_code(0x0000) - not-present page
[ 1092.454632] PGD 800000083032c067 P4D 800000083032c067 PUD 84107d067 PMD 0
[ 1092.454673] Oops: 0000 [#1] SMP PTI
[ 1092.454697] CPU: 20 PID: 22393 Comm: tc Not tainted 5.3.0-rc3+ #589
[ 1092.454733] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1092.454806] RIP: 0010:mlx5e_rep_encap_entry_detach+0x1c/0x630 [mlx5_core]
[ 1092.454845] Code: be f4 ff ff ff e9 11 ff ff ff 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 30 <48> 8b 87 28 16 04 00 48 89 f7 48 05 d0 03 00 00 48 89
45 c8 e8 cb
[ 1092.454942] RSP: 0018:ffffb6f08421f5a0 EFLAGS: 00010286
[ 1092.454974] RAX: 0000000000000000 RBX: ffff8ab668644e00 RCX: ffffb6f08421f56c
[ 1092.455013] RDX: ffff8ab668644e40 RSI: ffff8ab668644e00 RDI: 0000000000000ac0
[ 1092.455053] RBP: ffffb6f08421f5f8 R08: 0000000000000001 R09: 0000000000000000
[ 1092.455092] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000ac0
[ 1092.455131] R13: 00000000ffffff9b R14: ffff8ab63f200ac0 R15: ffff8ab668644e40
[ 1092.455171] FS: 00007fa195bdc480(0000) GS:ffff8ab66fa00000(0000) knlGS:0000000000000000
[ 1092.455216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1092.455249] CR2: 00000000000420e8 CR3: 0000000867522001 CR4: 00000000001606e0
[ 1092.455288] Call Trace:
[ 1092.455315] ? __mutex_unlock_slowpath+0x4d/0x2a0
[ 1092.455365] mlx5e_encap_dealloc.isra.0+0x31/0x60 [mlx5_core]
[ 1092.455424] mlx5e_tc_add_fdb_flow+0x596/0x750 [mlx5_core]
[ 1092.455484] __mlx5e_add_fdb_flow+0x152/0x210 [mlx5_core]
[ 1092.455534] mlx5e_configure_flower+0x4d5/0xe30 [mlx5_core]
[ 1092.455574] tc_setup_cb_call+0x67/0xb0
[ 1092.455601] fl_hw_replace_filter+0x142/0x300 [cls_flower]
[ 1092.455639] fl_change+0xd24/0x1bdb [cls_flower]
[ 1092.455675] tc_new_tfilter+0x3e0/0x970
[ 1092.455709] ? tc_del_tfilter+0x720/0x720
[ 1092.455735] rtnetlink_rcv_msg+0x389/0x4b0
[ 1092.455763] ? netlink_deliver_tap+0x95/0x400
[ 1092.455791] ? rtnl_dellink+0x2d0/0x2d0
[ 1092.455817] netlink_rcv_skb+0x49/0x110
[ 1092.455844] netlink_unicast+0x171/0x200
[ 1092.455872] netlink_sendmsg+0x224/0x3f0
[ 1092.455901] sock_sendmsg+0x5e/0x60
[ 1092.455924] ___sys_sendmsg+0x2ae/0x330
[ 1092.455950] ? task_work_add+0x43/0x50
[ 1092.455976] ? fput_many+0x45/0x80
[ 1092.456004] ? __lock_acquire+0x248/0x18e0
[ 1092.456033] ? find_held_lock+0x2b/0x80
[ 1092.456058] ? task_work_run+0x7b/0xd0
[ 1092.456085] __sys_sendmsg+0x59/0xa0
[ 1092.457013] do_syscall_64+0x5c/0xb0
[ 1092.457924] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1092.458842] RIP: 0033:0x7fa195da27b8
[ 1092.459918] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83
ec 28 89 54
[ 1092.462634] RSP: 002b:00007fff94409298 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1092.464011] RAX: ffffffffffffffda RBX: 000000005d515b0e RCX: 00007fa195da27b8
[ 1092.465391] RDX: 0000000000000000 RSI: 00007fff94409300 RDI: 0000000000000003
[ 1092.466761] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1092.468121] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001
[ 1092.469456] R13: 0000000000480640 R14: 0000000000000016 R15: 0000000000000001
[ 1092.470766] Modules linked in: act_mirred act_tunnel_key cls_flower dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm
iw_cm ib_cm mlx5_ib ib_uverbs ib_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_core kvm_intel kvm irqbypass crct10dif_pclmul mei_me crc32_pclmul crc32
c_intel igb iTCO_wdt ghash_clmulni_intel ses mlxfw intel_cstate iTCO_vendor_support ptp intel_uncore lpc_ich pps_core mei i2c_i801 joydev intel_rapl_perf ioatdma enclosure ipmi_ssif pcspkr dca wmi ipmi_
si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[ 1092.479618] CR2: 00000000000420e8
[ 1092.481214] ---[ end trace ce2e0f4d9a67f604 ]---

To fix the issue, set e->compl_result to positive value after encap was
initialized successfully. Check e->compl_result value in
mlx5e_encap_dealloc() and only detach and dealloc encap if the value is
positive.

Fixes: d589e785baf5 ("net/mlx5e: Allow concurrent creation of encap entries")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ef01adae 15-Aug-2019 Pablo Neira Ayuso <pablo@netfilter.org>

net: sched: use major priority number as hardware priority

tc transparently maps the software priority number to hardware. Update
it to pass the major priority which is what most drivers expect. Update
drivers too so they do not need to lshift the priority field of the
flow_cls_common_offload object. The stmmac driver is an exception, since
this code assumes the tc software priority is fine, therefore, lshift it
just to be conservative.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d589e785 08-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Allow concurrent creation of encap entries

Encap entries creation is fully synchronized by encap_tbl_lock. In order to
allow concurrent allocation of hardware resources used to offload
encapsulation, extend mlx5e_encap_entry with 'res_ready' completion. Move
call to mlx5e_tc_tun_create_header_ipv{4|6}() out of encap_tbl_lock
critical section. Modify code that attaches new flows to existing encap to
wait for 'res_ready' completion before using the entry. Insert encap entry
to table before provisioning it to hardware and modify all users of the
encap table to verify that encap was fully initialized by checking
completion result for non-zero value (and to wait for 'res_ready'
completion, if necessary).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 61086f39 02-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect encap hash table with mutex

To remove dependency on rtnl lock, protect encap hash table from concurrent
modifications with new "encap_tbl_lock" mutex. Use the mutex to protect
internal encap entry state from concurrent modification. This is necessary
because a flow can be attached to multiple encap entries simultaneously,
which significantly complicates using finer grained per-entry lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 948993f2 03-Jun-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Extend encap entry with reference counter

List of flows attached to encap entry is used as implicit reference
counter (encap entry is deallocated when list becomes free) and as a
mechanism to obtain encap entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to encap entry is possible. Proper atomic reference counter is
required to support concurrent access.

As a preparation for extending encap with reference counting, extract code
that lookups and deletes encap entry into standalone put/get helpers. In
order to remove this dependency on external locking, extend encap entry
with reference counter to manage its lifetime and extend flow structure
with direct pointer to encap entry that flow is attached to.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a734d007 08-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Allow concurrent creation of mod_hdr entries

Mod_hdr entries creation is fully synchronized by mod_hdr_tbl->lock. In
order to allow concurrent allocation of hardware resources used to offload
header rewrite, extend mlx5e_mod_hdr_entry with 'res_ready' completion.
Move call to mlx5_modify_header_alloc() out of mod_hdr_tbl->lock critical
section. Modify code that attaches new flows to existing mh to wait for
'res_ready' completion before using the entry. Insert mh to mod_hdr table
before provisioning it to hardware and modify all users of mod_hdr table to
verify that mh was fully initialized by checking completion result for
negative value (and to wait for 'res_ready' completion, if necessary).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d2faae25 09-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect mod_hdr hash table with mutex

To remove dependency on rtnl lock, protect mod_hdr hash table from
concurrent modifications with new mutex.

Implement helper function to get flow namespace to prevent code
duplication.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 83a52f0d 08-Jun-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect mod header entry flows list with spinlock

To remove dependency on rtnl lock, extend mod header entry with spinlock
and use it to protect list of flows attached to mod header entry from
concurrent modifications.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# dd58edc3 01-Jun-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Extend mod header entry with reference counter

List of flows attached to mod header entry is used as implicit reference
counter (mod header entry is deallocated when list becomes free) and as a
mechanism to obtain mod header entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to mod header entry is possible. Proper atomic reference counter
is required to support concurrent access.

As a preparation for extending mod header with reference counting, extract
code that lookups and deletes mod header entry into standalone put/get
helpers. In order to remove this dependency on external locking, extend mod
header entry with reference counter to manage its lifetime and extend flow
structure with direct pointer to mod header entry that flow is attached to.

To remove code duplication between legacy and switchdev mode
implementations that both support mod_hdr functionality, store mod_hdr
table in dedicated structure used by both fdb and kernel namespaces. New
table structure is extended with table lock by one of the following patches
in this series. Implement helper function to get correct mod_hdr table
depending on flow namespace.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# db76ca24 01-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Allow concurrent creation of hairpin entries

Hairpin entries creation is fully synchronized by hairpin_tbl_lock. In
order to allow concurrent initialization of mlx5e_hairpin structure
instances and provisioning of hairpin entries to hardware, extend
mlx5e_hairpin_entry with 'res_ready' completion. Move call to
mlx5e_hairpin_create() out of hairpin_tbl_lock critical section. Modify
code that attaches new flows to existing hpe to wait for 'res_ready'
completion before using the hpe. Insert hpe to hairpin table before
provisioning it to hardware and modify all users of hairpin table to verify
that hpe was fully initialized by checking hpe->hp pointer (and to wait for
'res_ready' completion, if necessary).

Modify dead peer update event handling function to save hpe's to temporary
list with their reference counter incremented. Wait for completion of hpe's
in temporary list and update their 'peer_gone' flag outside of
hairpin_tbl_lock critical section.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b32accda 31-Jul-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect hairpin hash table with mutex

To remove dependency on rtnl lock, protect hairpin hash table from
concurrent modifications with new "hairpin_tbl_lock" mutex.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 73edca73 07-Jun-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect hairpin entry flows list with spinlock

To remove dependency on rtnl lock, extend hairpin entry with spinlock and
use it to protect list of flows attached to hairpin entry from concurrent
modifications.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e4f9abbd 08-Jun-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Extend hairpin entry with reference counter

List of flows attached to hairpin entry is used as implicit reference
counter (hairpin entry is deallocated when list becomes free) and as a
mechanism to obtain hairpin entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to hairpin entry is possible. Proper atomic reference counter is
required to support concurrent access.

As a preparation for extending hairpin with reference counting, extract
code that deletes hairpin entry into standalone function. In order to
remove this dependency on external locking, extend hairpin entry with
reference counter to manage its lifetime and extend flow structure with
direct pointer to hairpin entry that flow is attached to.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 93b3586e 17-Jul-2019 Huy Nguyen <huyn@mellanox.com>

net/mlx5: Support inner header match criteria for non decap flow action

We have an issue that OVS application creates an offloaded drop rule
that drops VXLAN traffic with both inner and outer header match
criteria. mlx5_core driver detects correctly the inner and outer
header match criteria but does not enable the inner header match criteria
due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that
only decap rule needs inner header criteria.

Solution:
Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add
two new members: inner_match_level and outer_match_level.
inner/outer_match_level is set to NONE if the inner/outer match criteria
is not specified in the tc rule creation request. The decap assumption is
removed and the code just needs to check for inner/outer_match_level to
enable the corresponding bit in firmware's match_criteria_enable value.

Fixes: 6363651d6dd7 ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6830b468 01-Aug-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Allow dropping specific tunnel packets

In some case, we don't want to allow specific tunnel packets
to host that can avoid to take up high CPU (e.g network attacks).
But other tunnel packets which not matched in hardware will be
sent to host too.

$ tc filter add dev vxlan_sys_4789 \
protocol ip chain 0 parent ffff: prio 1 handle 1 \
flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
action tunnel_key unset pipe action drop

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fcb64c0f 08-May-2019 Eli Cohen <eli@mellanox.com>

net/mlx5: E-Switch, add ingress rate support

Use the scheduling elements to implement ingress rate limiter on an
eswitch ports ingress traffic. Since the ingress of eswitch port is the
egress of VF port, we control eswitch ingress by controlling VF egress.

Configuration is done using the ports' representor net devices.

Please note that burst size configuration is not supported by devices
ConnectX-5 and earlier generations.

Configuration examples:
tc:
tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k

ovs:
ovs-vsctl set interface eth0 ingress_policing_rate=1000

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b6fac0b4 17-Sep-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect tc flow table with mutex

TC flow table is created when first flow is added, and destroyed when last
flow is removed. This assumes that all accesses to the table are externally
synchronized with rtnl lock. To remove dependency on rtnl lock, add new
mutex mlx5e_tc_table->t_lock and use it to protect the flow table.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fa833bd5 12-Mar-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Rely on rcu instead of rtnl lock when getting upper dev

Function netdev_master_upper_dev_get() generates warning if caller doesn't
hold rtnl lock. Modify rules update path to use rcu version of that
function.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ad86755b 13-Mar-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect unready flows with dedicated lock

In order to remove dependency on rtnl lock for protecting unready_flows
list when reoffloading unready flows on workqueue, extend representor
uplink private structure with dedicated 'unready_flows_lock' mutex. Take
the lock in all users of unready_flows list before accessing it. Implement
helper functions to add and delete unready flow.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c5d326b2 10-Oct-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect tc flows hashtable with rcu

In order to remove dependency on rtnl lock, access to tc flows hashtable
must be explicitly protected from concurrent flows removal.

Extend tc flow structure with rcu to allow concurrent parallel access. Use
rcu read lock to safely lookup flow in tc flows hash table, and take
reference to it. Use rcu free for flow deletion to accommodate concurrent
stats requests.

Add new DELETED flow flag. Imlement new flow_flag_test_and_set() helper
that is used to set a flag and return its previous value. Use it to
atomically set the flag in mlx5e_delete_flower() to guarantee that flow can
only be deleted once, even when same flow is deleted concurrently by
multiple tasks.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 226f2ca3 08-Nov-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Change flow flags type to unsigned long

To remove dependency on rtnl lock and allow concurrent modification of
'flags' field of tc flow structure, change flow flag type to unsigned long
and use atomic bit ops for reading and changing the flags. Implement
auxiliary functions for setting, resetting and getting specific flag, and
for checking most often used flag values.

Always set flags with smp_mb__before_atomic() to ensure that all
mlx5e_tc_flow are updated before concurrent readers can read new flags
value. Rearrange all code paths to actually set flow->rule[] pointers
before setting the OFFLOADED flag. On read side, use smp_mb__after_atomic()
when accessing flags to ensure that offload-related flow fields are only
read after the flags.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5a7e5bcb 08-Nov-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Extend tc flow struct with reference counter

With new classifier type that doesn't require rtnl lock, following
invariant holds:
- Filter with specified cookie created only once.
- Filter with specified cookie deleted only once.
- Stats updates can be performed in parallel to each other.

Extend tc flow with rcu and reference counter. To protect from concurrent
delete, get reference to tc flow when:
- Reading flow stats.
- Accessing flow in neigh update handler.
- Accessing flow in neigh update used value handler.

Only free flow when reference counter reached zero. Modify flow cleanup to
account for flows that could be not fully initialized by checking if flow
is actually in the list of corresponding mod_hdr, hairpin and encap
entries. Don't cleanup flow directly in case of error to allow concurrent
neigh update (neigh update will be modified to always take reference to
flow when using it).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 90bb7692 06-Jul-2019 Ariel Levkovich <lariel@mellanox.com>

net/mlx5e: Prevent encap flow counter update async to user query

This patch prevents a race between user invoked cached counters
query and a neighbor last usage updater.

The cached flow counter stats can be queried by calling
"mlx5_fc_query_cached" which provides the number of bytes and
packets that passed via this flow since the last time this counter
was queried.
It does so by reducting the last saved stats from the current, cached
stats and then updating the last saved stats with the cached stats.
It also provide the lastuse value for that flow.

Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
last usage time of encapsulation flows, it calls the flow counter
query method periodically and async to user queries of the flow counter
using cls_flower.
This call is causing the driver to update the last reported bytes and
packets from the cache and therefore, future user queries of the flow
stats will return lower than expected number for bytes and packets
since the last saved stats in the driver was updated async to the last
saved stats in cls_flower.

This causes wrong stats presentation of encapsulation flows to user.

Since the neighbor usage updater only needs the lastuse stats from the
cached counter, the fix is to use a dedicated lastuse query call that
returns the lastuse value without synching between the cached stats and
the last saved stats.

Fixes: f6dfb4c3f216 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3d144578 11-Jul-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Allow dissector meta key in tc flower

Recently, fl_flow_key->indev_ifindex int field was refactored into
flow_dissector_key_meta field. With this, flower classifier also sets
FLOW_DISSECTOR_KEY_META flow dissector key. However, mlx5 flower dissector
validation code rejects filters that use flow dissector keys that are not
supported. Add FLOW_DISSECTOR_KEY_META to the list of allowed dissector
keys in __parse_cls_flower() to prevent following error when offloading
flower classifier to mlx5:

Error: mlx5_core: Unsupported key.

Fixes: 8212ed777f40 ("net: sched: cls_flower: use flow_dissector for ingress ifindex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 075973c7 08-Jul-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Rely on filter_dev instead of dissector keys for tunnels

Currently, tunnel attributes are parsed and inner header matching is used
only when flow dissector specifies match on some of the supported
encapsulation fields. When user tries to offload tc filter that doesn't
match any encapsulation fields on tunnel device, mlx5 tc layer incorrectly
sets to match packet header keys on encap header (outer header) and
firmware rejects the rule with syndrome 0x7e1579 when creating new flow
group.

Change __parse_cls_flower() to determine whether tunnel is used based on
fitler_dev tunnel info, instead of determining it indirectly by checking
flow dissector enc keys.

Fixes: bbd00f7e2349 ("net/mlx5e: Add TC tunnel release action for SRIOV offloads")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d71f895c 26-Jun-2019 Eli Cohen <eli@mellanox.com>

net/mlx5e: Verify encapsulation is supported

When mlx5e_attach_encap() calls mlx5e_get_tc_tun() to get the tunnel
info data struct, check that returned value is not NULL, as would be in
the case of unsupported encapsulation.

Fixes: d386939a327d2 ("net/mlx5e: Rearrange tc tunnel code in a modular way")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f9e30088 09-Jul-2019 Pablo Neira Ayuso <pablo@netfilter.org>

net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload

And any other existing fields in this structure that refer to tc.
Specifically:

* tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
* TC_CLSFLOWER_* to FLOW_CLS_*.
* tc_cls_common_offload to tc_cls_common_offload.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f6455de0 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Refactor eswitch SR-IOV interface

Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f6dc1264 24-Jun-2019 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Disallow tc redirect offload cases we don't support

After changing the parent_id to be the same for both NICs of same
the hardware device, netdev_port_same_parent_id now returns true for
more cases (all the lower devices in the hierarchy are on the same
hardware device).

If merged eswitch isn't enabled, these cases aren't supported, so disallow
them.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# bb0ee7dc 25-Jun-2019 Jianbo Liu <jianbol@mellanox.com>

net/mlx5: Add flow context for flow tag

Refactor the flow data structures, add new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b83c0730 01-Jun-2019 Raed Salem <raeds@mellanox.com>

net/mlx5e: Fix source port matching in fdb peer flow rule

The cited commit changed the initialization placement of the eswitch
attributes so it is done prior to parse tc actions function call,
including among others the in_rep and in_mdev fields which are mistakenly
reassigned inside the parse actions function.

This breaks the source port matching criteria of the peer redirect rule.

Fix by removing the now redundant reassignment of the already initialized
fields.

Fixes: 988ab9c7363a ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9272e3df 03-Apr-2019 Yevgeny Kliteynik <kliteyn@mellanox.com>

net/mlx5e: Geneve, Add support for encap/decap flows offload

Add HW offloading support for flows with Geneve encap/decap.

Notes about decap flows with Geneve TLV Options:
- Support offloading of 32-bit options data only
- At any given time, only one combination of class/type parameters
can be offloaded, but the same class/type combination can have
many different flows offloaded with different 32-bit option data
- Options with value of 0 can't be offloaded

Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d386939a 14-Apr-2019 Yevgeny Kliteynik <kliteyn@mellanox.com>

net/mlx5e: Rearrange tc tunnel code in a modular way

Rearrange tc tunnel code so that it would be easy to add future tunnels:
- Define tc tunnel object with the fields and callbacks that any
tunnel must implement.
- Define tc UDP tunnel object for UDP tunnels, such as VXLAN
- Move each tunnel code (GRE, VXLAN) to its own separate file
- Rewrite tc tunnel implementation in a general way - using only
the objects and their callbacks.

Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1f6da306 12-Feb-2019 Yevgeny Kliteynik <kliteyn@mellanox.com>

net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct

In mlx5e encap entry structure, IP tunnel info data structure is copied
by value. This approach worked till now, but it breaks when there are
encapsulation options, such as in case of Geneve.

These options are stored in the structure that is allocated adjacent to
the IP tunnel info struct, and not pointed at by any field in that struct.
Therefore, when copying the struct by value, we loose the address of the
original struct and can't get to the encapsulation options.

Fix the problem by storing the pointer to the tunnel info data instead.

Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d4a18e16 30-Jan-2019 Yevgeny Kliteynik <kliteyn@mellanox.com>

net/mlx5e: Enable setting multiple match criteria for flow group

When filling in flow spec match criteria, to allow previous
modifications of the match criteria, use "|=" rather than "=".

Tunnel options are parsed before the match criteria of the offloaded
flow are being set. If the the flow that we're about to offload has
encapsulation options, the flow group might need to match on additional
criteria.

For Geneve, an additional flow group matching parameter should
be used - misc3. The appropriate bit in the match criteria is set
while parsing the tunnel options, so the criteria value shouldn't
be overwritten.

This is a pre-step for supporting Geneve TLV options offload.

Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d1bda7ee 06-May-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Allow matching only enc_key_id/enc_dst_port for decapsulation action

In some case, we don't care the enc_src_ip and enc_dst_ip, and
if we don't match the field enc_src_ip and enc_dst_ip, we can use
fewer flows in hardware when revice the tunnel packets. For example,
the tunnel packets may be sent from different hosts, we must offload
one rule for each host.

$ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \
flower dst_mac 00:11:22:33:44:00 \
enc_src_ip Host0_IP enc_dst_ip 2.2.2.100 \
enc_dst_port 4789 enc_key_id 100 \
action tunnel_key unset action mirred egress redirect dev eth0_1

$ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \
flower dst_mac 00:11:22:33:44:00 \
enc_src_ip Host1_IP enc_dst_ip 2.2.2.100 \
enc_dst_port 4789 enc_key_id 100 \
action tunnel_key unset action mirred egress redirect dev eth0_1

If we support flows which only match the enc_key_id and enc_dst_port,
a flow can process the packets sent to VM which (mac 00:11:22:33:44:00).

$ tc filter add dev vxlan0 protocol ip parent ffff: prio 1 \
flower dst_mac 00:11:22:33:44:00 \
enc_dst_port 4789 enc_key_id 100 \
action tunnel_key unset action mirred egress redirect dev eth0_1

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e7739a60 12-May-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Fix possible modify header actions memory leak

The cited commit could disable the modify header flag, but did not free
the allocated memory for the modify header actions. Fix it.

Fixes: 27c11b6b844cd ("net/mlx5e: Do not rewrite fields with the same match")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2ef86872 10-Apr-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Fix no rewrite fields with the same match

With commit 27c11b6b844c ("net/mlx5e: Do not rewrite fields with the
same match") there are no rewrites if the rewrite value is the same as
the matched value. However, if the field is not matched, the rewrite is
also wrongly skipped. Fix it.

Fixes: 27c11b6b844c ("net/mlx5e: Do not rewrite fields with the same match")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 12d5cbf8 14-May-2019 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Fix calling wrong function to get inner vlan key and mask

When flow_rule_match_XYZ() functions were first introduced,
flow_rule_match_cvlan() for inner vlan is missing.

In mlx5_core driver, to get inner vlan key and mask, flow_rule_match_vlan()
is just called, which is wrong because it obtains outer vlan information by
FLOW_DISSECTOR_KEY_VLAN.

This commit fixes this by changing to call flow_rule_match_cvlan() after
it's added.

Fixes: 8f2566225ae2 ("flow_offload: add flow_rule and flow_match structures and use them")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0e1c1a2f 15-Apr-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Return error when trying to insert existing flower filter

With unlocked TC it is possible to have spurious deletes and inserts of
same filter. TC layer needs drivers to always return error when flow
insertion failed in order to correctly calculate "in_hw_count" for each
filter. Fix mlx5e_configure_flower() to return -EEXIST when TC tries to
insert a filter that is already provisioned to the driver.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0bac1194 04-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Replace TC VLAN pop with VLAN 0 rewrite in prio tag mode

Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN
push on RX path. To workaround that limitation untagged packets are
tagged with VLAN ID 0x000 (priority tag) and pop/push actions are
replaced by VLAN re-write actions (which are supported by the HW).
Replace TC VLAN pop action with a VLAN priority tag header rewrite.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 27b942fb 29-Apr-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Get rid of storing copy of device name

Currently mlx5 core stores copy of the PCI device name in a
mlx5_priv structure and uses pr_warn, pr_err helpers.

Get rid of the copy of this name; instead store the parent device
pointer that contains name as well as dma specific parameters.
This also allows to use kernel's well defined dev_warn, dev_err, dev_dbg
device specific print routines.

This is also a preparation patch to access non PCI parent device in
future.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 27c11b6b 17-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Do not rewrite fields with the same match

If we have a match for the same value of a rewrite field, there is no
point for the rewrite. In order to save rewrite actions, and avoid
entirely rewrite actions (if all rewrites are the same), ignore such
rewrite fields.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 35a605db 14-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Offload TC e-switch rules with ingress VLAN device

Offload TC rule on a VLAN device by matching the VLAN properties
of the VLAN device and emulating vlan pop actions.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 278748a9 16-Dec-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Offload TC e-switch rules with egress VLAN device

Upon redirection to an uplink VLAN device, emulate vlan push actions
according to the VLAN properties of the VLAN device and redirect to
the uplink.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6fca9d1e 14-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Allow VLAN rewrite of prio field with the same match

Changing the prio field of the VLAN is not supported. With
commit 37410902874c ("net/mlx5e: Support VLAN modify action") zero
value indicated "no-change". Allow the vid rewrite if the prio match
is the same as the prio set value.

Fixes: 37410902874c ("net/mlx5e: Support VLAN modify action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# bf2f3bca 14-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Deny VLAN rewrite if there is no VLAN header match

Rewrite of the packet in the VLAN offset may corrupt the packet if it's
not VLAN tagged. Deny the rewrite in this case.

Fixes: 37410902874c ("net/mlx5e: Support VLAN modify action")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8377629e 19-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Use helpers to get headers criteria and value pointers

The headers criteria and value pointers may be either of the inner
packet, if a tunnel exists, or of the outer. Simplify the code by using
helper functions to retrieve them.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2cc1cb1d 27-Feb-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Return -EOPNOTSUPP when attempting to offload an unsupported action

* Now the encapsulation is not supported for mlx5 VFs. When we try to
offload that action, the -EINVAL is returned, but not -EOPNOTSUPP.
This patch changes the returned value and ignore to confuse user.
The command is shown as below [1].

* When max modify header action is zero, we return -EOPNOTSUPP
directly. In this way, we can ignore wrong message info (e.g.
"mlx5: parsed 0 pedit actions, can't do more"). This happens when
offloading pedit actions on mlx(cx4) VFs. The command is shown as below [2].

For example: (p2p1_0 is VF net device)
[1]
$ tc filter add dev p2p1_0 protocol ip parent ffff: prio 1 flower skip_sw \
src_mac e4:11:22:33:44:01 \
action tunnel_key set \
src_ip 1.1.1.100 \
dst_ip 1.1.1.200 \
dst_port 4789 id 100 \
action mirred egress redirect dev vxlan0

[2]
$ tc filter add dev p2p1_0 parent ffff: protocol ip prio 1 \
flower skip_sw dst_mac 00:10:56:fb:64:e8 \
dst_ip 1.1.1.100 src_ip 1.1.1.200 \
action pedit ex munge eth src set 00:10:56:b4:5d:20

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 20bb4a81 27-Feb-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Deletes unnecessary setting of esw_attr->parse_attr

This patch deletes unnecessary setting of the esw_attr->parse_attr
to parse_attr in parse_tc_fdb_actions() because it is already done
by the mlx5e_flow_esw_attr_init() function.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6f9af8ff 27-Feb-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Remove 'parse_attr' argument in parse_tc_fdb_actions()

This patch is a little improvement. Simplify the parse_tc_fdb_actions().

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7f1a546e 18-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Consider tunnel type for encap contexts

The driver allocates an encap context based on the tunnel properties,
and reuse that context for all flows using the same tunnel properties.
Commit df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading")
introduced another tunnel protocol other than the single VXLAN
previously supported. A flow that uses a tunnel with the same tunnel
properties but with a different tunnel type (GRE vs VXLAN for example)
would mistakenly reuse the previous alocated context, causing the
traffic to be sent with the wrong encapsulation. Fix that by
considering the tunnel type for encap contexts.

Fixes: df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 84be899f 26-Feb-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Correctly use the namespace type when allocating pedit action

The capacity of FDB offloading and NIC offloading table are
different, and when allocating the pedit actions, we should
use the correct namespace type.

Fixes: c500c86b0c75d ("net/mlx5e: support for two independent packet edit actions")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8998576b 04-Feb-2019 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Allow IPv4 ttl & IPv6 hop_limit rewrite for all L4 protocols

For some protocols we are not allowing IP header rewrite offload, since
the HW is not capable to properly adjust the l4 checksum. However, TTL
& HOPLIMIT modification can be done for all IP protocols, because they
are not part of the pseudo header taken into account for checksum.

Fixes: 738678817573 ("drivers: net: use flow action infrastructure")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3616d08b 22-Mar-2019 David Ahern <dsahern@gmail.com>

ipv6: Move ipv6 stubs to a separate header file

The number of stubs is growing and has nothing to do with addrconf.
Move the definition of the stubs to a separate header file and update
users. In the move, drop the vxlan specific comment before ipv6_stub.

Code move only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 76b496b1 21-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Replace TC VLAN pop and push actions with VLAN modify

Changing the VLAN header may be implemented by pop the existing header
and push a new one. Translate those operations as VLAN modify.
Applicable for use cases such as OVS where the controller translates a
vlan modify meta (OF) rule to DP pop+push actions rule.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# bdc837ee 21-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Support VLAN modify action

Support VLAN modify action by emulating a rewrite action for the VLAN
fields. Currently, the only supported field is the vid. The prio in the
action must be set to 0 to indicate no change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0eb69bb9 21-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Add VLAN ID rewrite fields

Add VLAN ID rewrite fields as a pre-step to support this rewrite.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 10fbb1cd 07-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Set peer flow needed also for multipath

Update the predicate that determines if to duplicate rules installed on
vport reps to account also for the multipath case.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 68931c7d 05-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Update check for merged eswitch device

The current check only validates if both netdevs use the same ops
which means both are vf reps or both uplink reps.

Unlike the case where the two uplinks are bonded (VF LAG), under
multipath scheme the switchdev parent id is not unified between the
uplink reps (and all the associated vf reps). However, we still want
to duplicate in the driver encap flows, adjust the merged eswitch
check for that matter.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 316d5f72 02-Jul-2018 Roi Dayan <roid@mellanox.com>

net/mlx5e: Always query offloaded tc peer rule counter

Under multipath when encap rules are duplicated to HW in the driver,
it's possible for one flow to be currently un-offloaded (e.g. lack of
next-hop route or neigh entry) while the other flow is offloaded. As
such, we move to query the counters of both flows at all times.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b4a23329 06-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Re-attempt to offload flows on multipath port affinity events

Under multipath it's possible for us to offload the flow only through
the e-switch for which proper route through the uplink exists.
When the port is up and the next-hop route is set again we want to
offload through it as well.

We generate SW event from the FIB event handler when multipath port
affinity changes. The tc offloads code gets this event, goes over the
flows which were marked as of having missing route and attempts to
offload them.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ef06c9ee 05-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Allow one failure when offloading tc encap rules under multipath

In a similar manner to uplink/VF LAG, under multipath we add encap peer
rule on the second port as well.

However, unlike the LAG case, we do want to allow failure for adding
one of the rules. This happens due to using a routing hint while doing
the route lookup when one path (next hop device) is down.

Introduce a new flag to indicate that route lookup failed for encap
flow. Note that a flow may still not be offloaded to hw due to missing
neighbour, in that case, the neigh update event will take care of it.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 95dc1902 19-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Don't inherit flow flags on peer flow creation

Currently the peer flow inherits the flags from the original flow
after we've set it. At this time the flags are set according to
the flow state, e.g marked as going to slow path and such.

Even if not getting us to real bugs now, this opens the door to
get us to troubles later. Future proof the code and avoid the
inheritance, use the peer flags as were set on input when we
started adding the original flow.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0ad060ee 12-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Don't make internal use of errno to denote missing neigh

EAGAIN is treated as a specific case when we consider the attachment
successful but wait for neigh event before offloading the flow.
This can result in unwanted behavior when sub calls on the offloading
path will return EAGAIN and we pass this error up.

Instead of attaching to a specific error code return a boolean value
from the attach encap operation saying if the encap is valid or not.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 733d4f36 21-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Cleanup attach encap function

Remove the tunnel info argument which we can get from the other args.
Also reorder the args to have input args first and output args later.

This patch doesn't change functionality.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e8763611 19-Feb-2019 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove unused variable ‘esw’

Fix the following compiler warning:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:2770:
warning: unused variable ‘esw’ [-Wunused-variable]

Fixes: 1cd3ab86b713 ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7040632d 11-Feb-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Remove 'parse_attr' argument in mlx5e_tc_add_fdb_flow()

This patch is a little improvement. Simplify the mlx5e_tc_add_fdb_flow().

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 988ab9c7 11-Feb-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper

Introduce the mlx5e_flow_esw_attr_init() helper
for simplifying codes.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 73c718fb 12-Feb-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Remove wrong and superfluous tc pedit header type check

With recent introduction of flow_rule infrastructure drivers no longer
directly include action headers, so it is no longer possible to use
constants defined in them. Instead, one of flow_rule patches substituted
pedit action header constant with hardcoded value '2' in mlx5
set_pedit_val() function conditional which verifies that header type is in
range of values allowed by pedit action. That conditional is now both
wrong (hardcoded value is '2' but __PEDIT_HDR_TYPE_MAX is 6 in current
version) and superfluous (pedit action already verifies that header type is
in allowed range during init). Remove the described check from mlx5 code.

Fixes: 738678817573 ("drivers: net: use flow action infrastructure")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b05af6aa 12-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Normalize the name of uplink vport number

Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to
comply with the same naming convention along with the introduction of
other vports. Use MLX5_VPORT as the prefix for such vports and
relocate the uplink vport definition to public header file for the
benefits of both net and IB drivers.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 218d05ce 28-Jan-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5e: Don't overwrite pedit action when multiple pedit used

In some case, we may use multiple pedit actions to modify packets.
The command shown as below: the last pedit action is effective.

$ tc filter add dev netdev_rep parent ffff: protocol ip prio 1 \
flower skip_sw ip_proto icmp dst_ip 3.3.3.3 \
action pedit ex munge ip dst set 192.168.1.100 pipe \
action pedit ex munge eth src set 00:00:00:00:00:01 pipe \
action pedit ex munge eth dst set 00:00:00:00:00:02 pipe \
action csum ip pipe \
action tunnel_key set src_ip 1.1.1.100 dst_ip 1.1.1.200 dst_port 4789 id 100 \
action mirred egress redirect dev vxlan0

To fix it, we add max_mod_hdr_actions to mlx5e_tc_flow_parse_attr struction,
max_mod_hdr_actions will store the max pedit action number we support and
num_mod_hdr_actions indicates how many pedit action we used, and store all
pedit action to mod_hdr_actions.

Fixes: d79b6df6b10a ("net/mlx5e: Add parsing of TC pedit actions to HW format")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6dcfa234 06-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

net/mlx5e: Implement ndo_get_port_parent_id()

mlx5e only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get().

Since mlx5e makes use of switchdev_port_parent_id() convert it to use
netdev_port_same_parent_id().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 73867881 01-Feb-2019 Pablo Neira Ayuso <pablo@netfilter.org>

drivers: net: use flow action infrastructure

This patch updates drivers to use the new flow action infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b1903ef 01-Feb-2019 Pablo Neira Ayuso <pablo@netfilter.org>

flow_offload: add statistics retrieval infrastructure and use it

This patch provides the flow_stats structure that acts as container for
tc_cls_flower_offload, then we can use to restore the statistics on the
existing TC actions. Hence, tcf_exts_stats_update() is not used from
drivers anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c500c86b 01-Feb-2019 Pablo Neira Ayuso <pablo@netfilter.org>

net/mlx5e: support for two independent packet edit actions

This patch adds pedit_headers_action structure to store the result of
parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
populate the mlx5e hardware intermediate representation once all actions
have been parsed.

This patch comes in preparation for the new flow_action infrastructure,
where each packet mangling comes in an separated action, ie. not packed
as in tc pedit.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f256622 01-Feb-2019 Pablo Neira Ayuso <pablo@netfilter.org>

flow_offload: add flow_rule and flow_match structures and use them

This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.

To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.

This introduces two new interfaces:

bool flow_rule_match_key(rule, dissector_id)

that returns true if a given matching key is set on, and:

flow_rule_match_XYZ(rule, &match);

To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1651925d 28-Jan-2019 Guy Shattah <sguy@mellanox.com>

net/mlx5e: Use the inner headers to determine tc/pedit offload limitation on decap flows

In packets that need to be decaped the internal headers
have to be checked, not the external ones.

Fixes: bdd66ac0aeed ("net/mlx5e: Disallow TC offloading of unsupported match/action combinations")
Signed-off-by: Guy Shattah <sguy@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6363651d 10-Jan-2019 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly set steering match levels for offloaded TC decap rules

The match level computed by the driver gets to be wrong for decap
rules with wildcarded inner packet match such as:

tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower
enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789
action tunnel_key unset
action mirred egress redirect dev eth1

The FW errs for a missing matching meta-data indicator for the outer
headers (where we do have a match), and a wrong matching meta-data
indicator for the inner headers (where we don't have a match).

Fix that by taking into account the matching on the tunnel info and
relating the match level of the encapsulated packet to the firmware
inner headers indicator in case of decap.

As for vxlan we mandate a match on the tunnel udp dst port, and in general
we practically madndate a match on the source or dest ip for any IP tunnel,
the fix was done in a minimal manner around the tunnel match parsing code.

Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 71129676 20-Jan-2019 Jason Gunthorpe <jgg@ziepe.ca>

net/mlx5e: Return the allocated flow directly from __mlx5e_add_fdb_flow

This confusing construction confuses the compiler which can't see
that flow is initialized if !err:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function `mlx5e_configure_flower`
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:2727:28: warning:
`flow` may be used uninitialized in this function
[-Wmaybe-uninitialized]

There is no reason for two function outputs, just return the
pointer directly and use ERR_PTR to encode a failure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2534f14a 23-Dec-2018 Julia Lawall <Julia.Lawall@lip6.fr>

net/mlx5e: drop useless LIST_HEAD

Drop LIST_HEAD where the variable it declares is never used.

These became useless in 244cd96adb5f ("net_sched: remove list_head
from tc_action")

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>

Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0646c88 19-Dec-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Fail attempt to offload e-switch TC flows with egress upper devices

We use the switchdev parent HW id helper to identify if the mirred device
shares the same ASIC/port with the ingress device. This can get us wrong
in the presence of upper devices such as vlan or bridge set over the HW
devices (VF or uplink representors), b/c the switchdev ID is retrieved
recursively.

To fail offload attempts in such cases, we condition the check on the
egress device to have not only the same switchdev ID but also the relevant
mlx5 netdev ops.

Fixes: 03a9d11e6eeb ('net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads')
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d9ee0491 13-Feb-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use dedicated uplink vport netdev representor

Currently, when running in sriov switchdev mode, we are using the PF
netdevice as the uplink representor, this is problematic from few aspects:

- will break when the PF isn't eswitch manager (e.g smart NIC env)
- misalignment with other NIC switchdev drivers
- makes us have and maintain special code, hurts the driver quality/robustness
- which in turn opens the door for future bugs

As of each and all of the above, we move to have a dedicated netdev representor
for the uplink vport in a similar manner done for for the VF vports.

This includes the following:

1. have an uplink rep netdev as we have for VF reps
2. all reps use same load/unload functions
3. HW stats for uplink based on physical port counters and not vport counters
4. link state for the uplink managed through PAOS and not vport state
5. the uplink rep has sysfs link to the PF PCI function && uses the PF MAC address

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7c34ec19 23-Aug-2018 Aviv Heller <avivh@mellanox.com>

net/mlx5: Make RoCE and SR-IOV LAG modes explicit

With the introduction of SR-IOV LAG, checking whether LAG is active
is no longer good enough, since RoCE and SR-IOV LAG each entails
different behavior by both the core and infiniband drivers.

This patch introduces facilities to discern LAG type, in addition to
mlx5_lag_is_active(). These are implemented in such a way as to allow
more complex mode combinations in the future.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1418ddd9 02-Oct-2018 Aviv Heller <avivh@mellanox.com>

net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG

Under uplink LAG, packets that match a flow might arrive on both uplink
ports and transmitted through both as part of supporting aggregation and
high-availability.

When the netdevs representing the uplinks are set into LAG (bonding,
teaming), duplicate the TC flow offloading into each of the per-uplink
e-switches.

Duplication is not required if the source is the bond device, since in
this case it is assumed that the bond and the uplink netdevs share the
same TC block, and thus duplication will occur naturally by the stack.

Note that under encapsulation scheme, both flows will use the same
neighbour and hence both will contribute to the last-used feedback
towards the stack.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7ba58ba7 06-Jun-2018 Rabie Loulou <rabiel@mellanox.com>

net/mlx5e: Offload TC e-switch rules with egress LAG device

When parsing TC FDB actions, if the egress device is a bond/team
net-device which enslaved the uplink representor of the e-switch,
use the uplink representor as the destination in the HW rule.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f9392795 30-Nov-2017 Shahar Klein <shahark@mellanox.com>

net/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rules

Assign a counter dev attribute according to device capability and use
it for management of counters related to offloaded eswitch TC flows.

With upcoming support for uplink LAG, we have two HW rules per one
logical SW (TC) rule. Although the HW supports attaching one counter
to multiple rules, we are allocating counter per HW rule.

We need this separation for two reasons:

1. "flow eswitch" counter affinity HW require the counter to be
allocated on the device where the eswitch rule is set.

2. for some use-cases (multi-path routing) each HW flow relates to
different neighbour, hence our neigh update logic must have a per-rule
HW accountant in order to provide the proper feedback to the kernel.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 04de7dda 11-Nov-2018 Roi Dayan <roid@mellanox.com>

net/mlx5e: Infrastructure for duplicated offloading of TC flows

Under uplink LAG or multipath schemes, traffic that matches one flow
might arrive on both uplink ports and transmitted through both
as part of supporting aggregation and high-availability.

To cope with the fact that the SW model might use logical SW port
(e.g uplink team or bond) but we have two HW ports with e-switch on
each, there are cases where in order to offload a SW TC rule we
need to duplicate it to two HW flows.

Since each HW rule has its own counter we also aggregate the counter
of both rules when a flow stats query is executed from user-space.

Introduce the changes for the different elements (add/delete/stats),
currently nothing is duplicated.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 61c806da 10-Dec-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Avoid encap flows deletion attempt the 1st time a neigh is resolved

Currently, we are deleting offloaded encap flows in case the relevant neigh
becomes unconnected while the encap is valid (a sign that it used to be
connected), or if the curr neigh mac is different from the cached mac
(a sign that the remote side changed their mac).

The 2nd check also applies when the neigh becomes connected on the 1st
time (we start with zero mac). Before the offending commit, the deleting
handler was practically no op, as no flows were offloaded. But since
that commit, we offload neigh-less encap flows to slow path.

Under mirroring scheme, we go into the delete handler, attempt to unoffload a
mirror rule which was never set (as we were offloading to slow path) and crash.

Fix that by calling the delete handler only when the encap is valid,
which covers both cases mentioned above.

Fixes: 5dbe906ff1d5 ('net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 154e62ab 09-Dec-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly initialize flow attributes for slow path eswitch rule deletion

When a neighbour is resolved, we delete the goto slow path rule from HW.

The eswitch flow attributes where not properly initialized on that case,
hence we mess up the eswitch refcounts for chain zero (the default one).

Fix that along with making sure to use semicolons and not commas on that code;

Fixes: 5dbe906ff1d5 ('net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d14f6f2a 09-Dec-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Avoid overriding the user provided priority for offloaded tc rules

Just a leftover which was wrongly left there, remove it while spawning
a message to suggest firmware upgrade.

Fixes: bf07aa730a04 ('net/mlx5e: Support offloading tc priorities and chains for eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e88afe75 09-Dec-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Err if asked to mirror a goto chain tc eswitch rule

Currently we are not supporting this and not err-ing on that either.

For now, just err if asked to do that.

Fixes: bf07aa730a04 ('net/mlx5e: Support offloading tc priorities and chains for eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8c4dc42b 18-Nov-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Support multiple encapsulations for a TC flow

Currently a flow is associated with a single encap structure. The FW
extended destination features enables the driver to associate a flow
with multiple encap instances.

Change the encap id field from a flow scope to a per destination value
in the flow attributes struct. Use the encaps array to associate a flow
table entry with multiple encap entries.

Update the neigh logic to offload only if all encapsulations used in a
flow are connected, and un-offload upon the first one disconnected.

Note that the driver can now support up to two encap destinations.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 79baaec7 02-Dec-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Allow association of a flow to multiple encaps

Currently a flow can be associated with a single encap entry. The
extended destination feature enables the driver to configure multiple
encap entries per flow.

Change the encap flow association field to array as a pre-step towards
supporting multiple encap destinations. Use only the first array
element, with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 98b66cb1 18-Nov-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Change parse attr struct to accommodate multiple tunnel infos

Currently the driver can support only a single TC tunnel_set action.
Change the tunnel info fields to arrays, as a pre-step to support
multiple encapsulations for a single flow, with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1cc26d74 25-Nov-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Support header rewrite actions with remote port mirroring

A rule with the following actions is split to a two level FDB:
1. Forward to local mirror vport
2. Header rewrite
3. Forward to local vport
In the first level flow table, forward the packet to the local port and
forward the packet to the second level flow table for header rewrite and
local port forwarding. This configuration fails when mirroring to a
remote encapsulated destination because currently an FTE cannot support
encap and table destinations.

Use the extended destination capabilities to configure the first level
flow table with a multi-destination FTE to the uplink and second level
table and the second level flow table for the header rewrite and local
port forwarding.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 38c9d269 22-Nov-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Replace the split logic with extended destination

Currently the FTE encap flag applies to all destinations.
To support mirroring encapsulated traffic to a local port the driver
split the two destinations to two flow table entries:
Table#0: - FWD to the local vport
- Goto table#1
Table#1: - Encap and FWD to wire
The firmware extended destination capabilities enable the driver to set
an encapsulation flag per destination.

Remove the split logic and use the extended destination mechanism
instead.

Note that split technique is still required for pedit and VLAN push
scenarios.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f493f155 01-Dec-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Move flow attr reformat action bit to per dest flags

Flow attr reformat action bit is moved from the global action bits to a
per destination flags field, as a pre-step for adding additional flags
to support encapsulation properties per destination, with no
functionality change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# df65a573 01-Dec-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Refactor eswitch flow attr for destination specific properties

Currently the eswitch flow attr structure stores each destination
specific property in its own specific array.
Group them in an array of destination structures as a pre-step towards
adding additional destination specific field properties.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e85e02ba 23-Nov-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5: E-Switch, Rename esw attr mirror count field

The mirror count esw attributes field is used to determine if splitting
the rule to two FTEs is required while programming e-switch mirroring.
Rename it to split count, making it clearer with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 101f4de9 02-Dec-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Move TC tunnel offloading code to separate source file

Move tunnel offloading related code to a separate source file for better
code maintainability.

Code refactoring with no functional change.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 54c177ca 14-Nov-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Branch according to classified tunnel type

Currently the tunnel offloading encap/decap methods assumes that VXLAN
is the sole tunneling protocol. Lay the infrastructure for supporting
multiple tunneling protocols by branching according to the tunnel
net device kind.

Encap filters tunnel type is determined according to the egress/mirred
net device. Decap filters classify the tunnel type according to the
filter's ingress net device kind.

Distinguish between the tunnel type as defined by the SW model and
the FW reformat type that specifies the HW operation being made.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4d70564d 14-Nov-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Refactor VXLAN tunnel decap offloading code

Separates the vxlan header match handling from the matching on the
general fields of ipv4/6 tunnels, thus allowing the common IP tunnel
match code to branch in down stream patch, to multiple IP tunnels.

This patch doesn't add any functionality.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ea7162ac 14-Nov-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Refactor VXLAN tunnel encap offloading code

Separates the vxlan header encap logic from the general ipv4/6
encapsulation methods, thus allowing the common IP encap/decap code to
branch in downstream patch to multiple IP tunnels.

Code refactoring with no functional change.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ef381359 28-Oct-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Replace egdev with indirect block notifications

Use TC indirect block notifications to offload filters that
are configured on higher level device interfaces (e.g. tunnel
devices). This mechanism replaces the current egdev implementation.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d11afc26 28-Oct-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Propagate the filter's net device to mlx5e structures

Propagate the filter's net_device parameter to the tc flower parsed
attributes structure so that it can later be used in tunnel decap
offloading sequences.

Pre-step for replacing egdev logic with the indirect block
notification mechanism.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 71d82d2a 28-Oct-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Provide the TC filter netdev as parameter to flower callbacks

Currently the driver controls flower filters that are installed on its
devices. However, with the introduction of the indirect block
notifications platform the driver may receive control events for filters
that are installed on higher level net devices (e.g. tunnel devices).
Therefore, the driver filter control API will not be able to implicitly
assume the filter's net device.

Explicitly specify the filter's net device, no functional change

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f5bc2c5d 28-Oct-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Support TC indirect block notifications for eswitch uplink reprs

Towards using this mechanism as the means to offload tunnel decap rules
set on SW tunnel devices instead of egdev, add the supporting structures
and functions.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ec1366c2 25-Oct-2018 Oz Shlomo <ozsh@mellanox.com>

net/mlx5e: Store eswitch uplink representor state on a dedicated struct

Currently only a single field in the representor private structure
is relevant for uplink representors. As a pre-step to allow adding
additional uplink representor fields, introduce uplink representor
private structure.

This is prepration step towards replacing egdev logic with the
indirect block notification mechanism. This patch doesn't change
any functionality.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# bbeb53b8 06-Nov-2018 Aya Levin <ayal@mellanox.com>

net/mlx5e: Move RSS params to a dedicated struct

Remove RSS params from params struct under channels, and introduce
a new struct with RSS configuration params under priv struct. There is
no functional change here.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d930ac79 28-Oct-2018 Aya Levin <ayal@mellanox.com>

net/mlx5e: Refactor TIR configuration function

Refactor mlx5e_build_indir_tir_ctx_hash for better code re-use. TIR
stands for Transport Interface Receive, which is responsible for all
transport related operations on the receive side. Added a
static array with TIR default configuration values. This separates
configuration values from command setting, which is needed for
downstream patch.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1392f44b 23-Oct-2018 Roi Dayan <roid@mellanox.com>

net/mlx5e: Apply the correct check for supporting TC esw rules split

The mirror and not the output count is the one denoting a split.
Fix to condition the offload attempt on the mirror count being > 0
along the firmware to have the related capability.

Fixes: 592d36515969 ("net/mlx5e: Parse mirroring action for offloaded TC eswitch flows")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Yossi Kuperman <yossiku@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 83621b7d 27-Oct-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Always use the match level enum when parsing TC rule match

We get the match level (none, l2, l3, l4) while going over the match
dissectors of an offloaded tc rule. When doing this, the match level
enum and the not min inline enum values should be used, fix that.

This worked accidentally b/c both enums have the same numerical values.

Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d3a80bb5 25-Oct-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded

For the "all" ethertype we should not care whether the packet has
vlans. Besides being wrong, the way we did it caused FW error
for rules such as:

tc filter add dev eth0 protocol all parent ffff: \
prio 1 flower skip_sw action drop

b/c the matching meta-data (outer headers bit in struct mlx5_flow_spec)
wasn't set. Fix that by matching on vlan non-existence only if we were
also told to match on the ethertype.

Fixes: cee26487620b ('net/mlx5e: Set vlan masks for all offloaded TC rules')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# bf07aa73 02-Sep-2018 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Support offloading tc priorities and chains for eswitch flows

Currently we fail when user specify a non-zero chain, this patch adds the
support for it and tc priorities. To get to a new chain, use the tc
goto action.

Currently we support a fixed prio range 1-16, and chain range 0-3.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5dbe906f 01-Aug-2018 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available

When adding a vxlan tc rule, and a neighbour isn't available, we
don't insert any rule to hardware. Once we enable offloading flows
with multiple priorities, a packet that should have matched this rule
will continue in hardware pipeline and might match a wrong one.

This is unlike in tc software path where it will be matched and
forwarded to the vxlan device (which will cause a ARP lookup
eventually) and stop processing further tc filters.

To address that, when when a neighbour isn't available (EAGAIN from
attach_encap), or gets deleted, change the original action to be a
forward to slow path instead. Neighbour update will restore the original
action once the neighbour becomes available. This will be done atomically
so at any given time we will have a the correct match.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6d2a3ed0 04-Sep-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Avoid duplicated code for tc offloads add/del fdb rule

The code for adding/deleting fdb flow is repeated when
user-space does flow add/del and when we add/del from
the neigh update path - unify them to avoid the duplication.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 42f7ad67 02-Sep-2018 Paul Blakey <paulb@mellanox.com>

net/mlx5e: For TC offloads, always add new flow instead of appending the actions

When replacing a tc flower rule, flower first requests to add the
new rule (new action), then deletes the old one.
But currently when asked to add a new tc flower flow, we append the
actions (and counters to it).

This can result in a fte with two flow counters or conflicting
actions (drop and encap action) which firmware complains/errs
about and isn't achieving what the user aimed for.

Instead, insert the flow using the new no-append flag which will add a
new HW rule, the old flow and rule will be deleted later by flower

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d5634fee 19-Sep-2018 Paul Blakey <paulb@mellanox.com>

net/mlx5: Add a no-append flow insertion mode

If no-append flag is set, we will add a new FTE, instead of appending
the actions of the inserted rule when the same match already exists.

While here, move the has_flow_tag boolean indicator to be a flag too.

This patch doesn't change any functionality.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanmox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e52c2802 02-Jul-2018 Paul Blakey <paulb@mellanox.com>

net/mlx5: E-Switch, Add chains and priorities

A chain is a group of priorities, so use the fdb parallel
sub namespaces to implement chains, and a flow table for each
priority in them.

Because these namespaces are parallel and in series to the slow path
fdb, the chains aren't connected to one another (but to the slow path),
and one must use a explicit goto action to reach a different chain.

Flow tables for the priorities will be created on demand and destroyed
once not used.

The Firmware has four pools of tables for sizes S/XS/M/L (4k, 64k, 1m, 4m).
We maintain ghost copies of the pools occupancy.

When a new table is to be created, we scan the pools from large to small
and find the 1st table size which can be now created. When a table is
destroyed, we update the relevant pool.

Multi chain/prio isn't enabled yet by this patch, for now all flows
will use the default chain 0, and prio 1.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 48265006 20-Sep-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Have explicit API to delete fwd rules

Be symmetric with the e-switch API to add rules which has a
specific function to add fwd rules which are used as part of
vport mirroring.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a88780a9 14-May-2018 Roi Dayan <roid@mellanox.com>

net/mlx5e: Split TC add rule path for nic vs e-switch

Move to have clear separation on the code path to add nic vs e-switch
flows. While here we break the code that deals with adding offloaded
TC tool to few smaller stages, each on helper function.

Besides getting us simpler and readable code, these are pre-steps
for being able to have two HW flows serving one SW TC flow for some
e-switch use cases.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c83954ab 15-Oct-2017 Rabie Loulou <rabiel@mellanox.com>

net/mlx5e: Change return type of tc add flow functions

Refactor the flow add utility functions to return err code instead of rule
pointers. This will allow for simpler logic when one tc rule is
duplicated to two HW rules in downstream patches.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 171c7625b 02-Oct-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: Use flow counter IDs and not the wrapping cache object

Currently, when a flow rule is created using the FS core layer, the caller
has to pass the entire flow counter object and not just the counter HW
handle (ID). This requires both the FS core and the caller to have
knowledge about the inner implementation of the FS layer flow counters
cache and limits the possible users.

Move to use the counter ID across the place when dealing with flows.

Doing this decoupling, now can we privatize the inner implementation
of the flow counters.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b8aee822 02-Oct-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: E-Switch, Get counters for offloaded flows from callers

There's no real reason for the e-switch logic to manage the creation of
counters for offloaded flows. The API already has the directive for the
caller to denote they want to attach a counter to the created flow.
As such, we go and move the management of flow counters to the mlx5e
tc offload logic. This also lets us remove an inelegant interface where
the FS layer had to provide a way to retrieve a counter from a flow rule.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e98bedf5 15-Aug-2018 Eli Britstein <elibr@mellanox.com>

net/mlx5e: Add extack messages for TC offload failures

Return tc extack messages for failures to user space.
Messages provide reasons for not being able to offload rules to HW.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 59c9d35e 05-Sep-2018 Alaa Hleihel <alaa@mellanox.com>

net/mlx5: Cache the system image guid

The system image guid is a read-only field which is used by the TC
offloads code to determine if two mlx5 devices belong to the same
ASIC while adding flows.

Read this once and save it on the core device rather than querying each
time an offloaded flow is added.

Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# cee26487 24-Aug-2018 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Set vlan masks for all offloaded TC rules

In flow steering, if asked to, the hardware matches on the first ethertype
which is not vlan. It's possible to set a rule as follows, which is meant
to match on untagged packet, but will match on a vlan packet:
tc filter add dev eth0 parent ffff: protocol ip flower ...

To avoid this for packets with single tag, we set vlan masks to tell
hardware to check the tags for every matched packet.

Fixes: 095b6cfd69ce ('net/mlx5e: Add TC vlan match parsing')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4d8fcf216 05-Sep-2018 Alaa Hleihel <alaa@mellanox.com>

net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules

If the peer device was already unbound, then do not attempt to modify
it's resources, otherwise we will crash on dereferencing non-existing
device.

Fixes: 5c65c564c962 ("net/mlx5e: Support offloading TC NIC hairpin flows")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 31ca3648 28-Aug-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: Pass a namespace for packet reformat ID allocation

Currently we attach packet reformat actions only to the FDB namespace.
In preparation to be able to use that for NIC steering, pass the actual
namespace as a parameter.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 60786f09 28-Aug-2018 Mark Bloch <markb@mellanox.com>

{net, RDMA}/mlx5: Rename encap to reformat packet

Renames all encap mlx5_{core,ib} code to use the new naming of packet
reformat. This change doesn't introduce any function change and is
needed to properly reflect the operation being done by this action.
For example not only can we encapsulate a packet, but also decapsulate it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# e0e7a386 28-Aug-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: Move header encap type to IFC header file

Those bits are hardware specification and should be defined in the
IFC header file.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 244cd96a 19-Aug-2018 Cong Wang <xiyou.wangcong@gmail.com>

net_sched: remove list_head from tc_action

After commit 90b73b77d08e, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.

Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 816f6706 08-Aug-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly check if hairpin is possible between two functions

The current check relies on function BDF addresses and can get
us wrong e.g when two VFs are assigned into a VM and the PCI
v-address is set by the hypervisor.

Fixes: 5c65c564c962 ('net/mlx5e: Support offloading TC NIC hairpin flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Alaa Hleihel <alaa@mellanox.com>
Tested-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3e67366 19-May-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Issue direct lookup on vxlan ports by vport representors

Remove uplink representor netdevice private structure lookup, and use
mlx5 core handle directly from representor private structure to lookup
vxlan ports.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>


# 358aa5ce 09-May-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Vxlan, move vxlan logic to core driver

Move vxlan logic and objects to mlx5 core dirver.
Since it going to be used from different mlx5 interfaces.
e.g. mlx5e PF NIC netdev and mlx5e E-Switch representors.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>


# a3c785d7 08-May-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Vxlan, rename from mlx5e to mlx5

Rename vxlan functions from mlx5e_vxlan_* to mlx5_vxlan_*.
Rename mlx5e_vxlan_db to mlx5_vxlan and move it from en.h to vxlan.c
since it is not related to mlx5e anymore.

Allocate mlx5_vxlan structure dynamically in order to make it easier to
move later to core driver and to make it private in vxlan.c.

This is in preparation to move vxlan API to mlx5 core.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>


# bcef735c 24-Jul-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload TC matching on tos/ttl for ip tunnels

Enable offloading of TC matching on tos/ttl for ipv4/6 tunnels.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f35f800d 24-Jul-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Support setup of tos and ttl for tunnel key TC action offload

Use the values provided by user-space for the encapsulation headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6360cd62 24-Jul-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use ttl from route lookup on tc encap offload only if needed

Currnetly, the ttl for the encapsulation headers is taken from the
route lookup result. As a pre-step to allow for an offload case when
the user specifies the ttl, take it from the route lookup only if
not zero. While here, also move to use u8 instead int for the ttl.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc495188 25-Apr-2018 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Support offloading double vlan push/pop tc actions

As we can configure two push/pop actions in one flow table entry,
add support to offload those double vlan actions in a rule to HW.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1482bd3d 02-Jul-2018 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Refactor tc vlan push/pop actions offloading

Extract actions offloading code to a new function, and also extend data
structures for double vlan actions.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 699e96dd 01-May-2018 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Support offloading tc double vlan headers match

We can match on both outer and inner vlan tags, add support for
offloading that.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c7f7ba8d 04-Dec-2017 Roi Dayan <roid@mellanox.com>

net/mlx5e: Remove redundant WARN when we cannot find neigh entry

It is possible for neigh entry not to exist if it was cleaned already.
When we bring down an interface the neigh gets deleted but it could be
that our listener for neigh event to clear the encap valid bit didn't
start yet and the neigh update last used work is started first.
In this scenario the encap entry has valid bit set but the neigh entry
doesn't exist.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7e29392e 12-Jul-2018 Roi Dayan <roid@mellanox.com>

net/mlx5e: Only allow offloading decap egress (egdev) flows

We get egress rules through the egdev mechanism when the ingress device
is not supporting offload, with the expected use-case of tunnel decap
ingress rule set on shared tunnel device.

Make sure to offload egress/egdev rules only if decap action (tunnel key
unset) exists there and err otherwise.

Fixes: 717503b9cf57 ("net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 01252a27 22-May-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Get the number of offloaded TC rules from the correct table

As we keep the offloaded TC rules for NIC and e-switch in two different
places, make sure to return the number of offloaded flows according
to the use-case and not blindly from the priv.

Fixes: 655dc3d2b91b ('net/mlx5e: Use shared table for offloaded TC eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e4ad91f2 16-May-2018 Chris Mi <chrism@mellanox.com>

net/mlx5e: Split offloaded eswitch TC rules for port mirroring

If a TC rule needs to be split for mirroring, create two HW rules,
in the first level and the second level flow tables accordingly.

In the first level flow table, forward the packet to the mirror
port and forward the packet to the second level flow table for
further processing, eg. encap, vlan push or header re-write.

Currently the matching is repeated in both stages.

While here, simplify the setup of the vhca id valid indicator also
in the existing code.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 592d3651 03-May-2018 Chris Mi <chrism@mellanox.com>

net/mlx5e: Parse mirroring action for offloaded TC eswitch flows

Currently, we only support the mirred redirect TC sub-action. In order
to support flow based vport mirroring, add support to parse the mirred
mirror sub-action.

For mirroring, user-space will typically set the action order such that
the mirror port (mirror VF) sees packets as the original port (VF under
mirroring) sent them or as it will receive them.

In the general case, it means that packets are potentially sent to the
mirror port before or after some actions were applied on them. To
properly do that, we should follow on the exact action order as set for
the flow and make sure this will also be the case when we program the HW
offload.

We introduce a counter for the output ports (attr->out_count), which we
increase when parsing each mirred redirect/mirror sub-action and when
dealing with encap.

We introduce a counter (attr->mirror_count) telling us if split is
needed. If no split is needed and mirroring is just multicasting to
vport, the mirror count is zero, all the actions of the TC flow should
apply on that single HW flow.

If split is needed, the mirror count tells where to do the split, all
non-mirred tc actions should apply only after the split.

The mirror count is set while parsing the following actions encap/decap,
header re-write, vlan push/pop.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2c81bfd5 22-Feb-2018 Huy Nguyen <huyn@mellanox.com>

net/mlx5e: Move port speed code from en_ethtool.c to en/port.c

Move four below functions from en_ethtool.c to en/port.c. These
functions are used by both en_ethtool.c and en_main.c. Future code
can use these functions without ethtool link mode dependency.
u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
u32 mlx5e_port_speed2linkmodes(u32 speed);

Delete the speed field from table mlx5e_build_ptys2ethtool_map. This
table only keeps the mapping between the mlx5e link mode and
ethtool link mode. Add new table mlx5e_link_speed for translation
from mlx5e link mode to actual speed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8f8ae895 10-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Ignore attempts to offload multiple times a TC flow

For VF->VF and uplink->VF rules, the TC core (cls_api) attempts
to offload the same flow multiple times into the driver, b/c we
registered to the egdev callback.

Use the flow cookie to ignore attempts to add such flows, we can't
reject them (return error), b/c this will fail the offload attempt,
so we ignore that. We indentify wrong stat/del calls using the flow
ingress/egress flags, here we do return error to the core.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 655dc3d2 10-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use shared table for offloaded TC eswitch flows

Currently, each representor netdev use their own hash table to keep
the mapping from TC flow (f->cookie) to the driver offloaded instance.
The table is the one which originally was added for offloading TC NIC
(not eswitch) rules.

This scheme breaks when the core TC code calls us to add the same flow
twice, (e.g under egdev use case) since we don't spot that and offload
a 2nd flow into the HW with the wrong source vport.

As a pre-step to solve that, we move to use a single table which keeps
all offloaded TC eswitch flows. The table is located at the eswitch
uplink representor object.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 05866c82 10-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Prepare for shared table to keep TC eswitch flows

This is a refactoring step to be able and store the hash table which
keeps track of offloaded TC flows in a different location for NIC
vs e-switch rules.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 60bd4af8 18-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add ingress/egress indication for offloaded TC flows

When an e-switch TC rule is offloaded through the egdev (egress
device) mechanism, we treat this as egress, all other cases (NIC
and e-switch) are considred ingress.

This is preparation step that will allow us to identify "wrong"
stat/del offload calls made by the TC core on egdev based flows and
ignore them.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b1d90e6b 11-Jan-2018 Rabie Loulou <rabiel@mellanox.com>

net/mlx5e: Offload TC eswitch rules for VFs belonging to different PFs

When the merged eswitch capability is supported, allow offloading rules
between VFs which belong to different PFs (and hence have different
eswitch affinity).

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 10ff5359 18-Mar-2018 Shahar Klein <shahark@mellanox.com>

net/mlx5e: Explicitly set source e-switch in offloaded TC rules

Set a specific source e-switch when setting a rule that matches on the
ingress port.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 56e858df 18-Mar-2018 Rabie Loulou <rabiel@mellanox.com>

net/mlx5e: Explicitly set destination e-switch in FDB rules

Set a specific destination e-switch when setting a destination vport.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 38aa51c1 05-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Support offloaded TC flows with no matches on headers

For example:
tc filter add dev ens2f0_0 parent ffff: flower skip_sw action drop

Note that for eswitch flows, we still always match on the source port.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d708f902 05-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Get the required HW match level while parsing TC flow matches

Introduce levels of matching on headers of offloaded flows
(none, L2, L3, L4) that follow the inline mode levels.

This is pre-step for us to offload flows without any
matches on headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 54782900 05-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly order min inline mode setup while parsing TC matches

Set the initial value to none instead of L2, and set to L2
where the previous initial value was assumed. Make sure to
parse L2 matches before L3 matches and L3 before L4.

This is a pre-step to get the match level for more purposes
other than the validating the needed vs. actual inline level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1cab1cd7 05-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use local actions var while processing offloaded TC flow actions

Use local actions variable while parsing the actions of offloaded TC flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 31c8eba5 05-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Return success when TC offloaded fdb actions parsed ok

Reaching here, means we didn't err anywhere, so lets just
return success.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c180f675 05-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Avoid redundant zeroing of offloaded TC flow attributes

This is not needed as the attributes are zeroed out on allocation.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b3a433de 08-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Clean static checker complaints on TC offload and VF reps code

Clean warning/check complaints made by checkpatch on en_{tc,rep}.c

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 46012660 08-Apr-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Remove double defined DMAC header re-write element

The firmware DMAC_47_16 header re-write token was defined twice,
clean it up.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e36d4810 13-Nov-2017 Roi Dayan <roid@mellanox.com>

net/mlx5e: Skip redundant checks when providing NUD lastuse feedback

It's redundant to continue the loop if we found one flow whose lastuse value
being newer than the last one we reported, since this is enough for us to
trigger a NUD update (neigh_event_send()).

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f85900c3 22-Mar-2018 Roi Dayan <roid@mellanox.com>

net/mlx5e: Err if asked to offload TC match on frag being first

The HW doesn't support matching on frag first/later, return error if we are
asked to offload that.

Fixes: 3f7d0eb42d59 ("net/mlx5e: Offload TC matching on packets being IP fragments")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1ccef350 27-Mar-2018 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Allow offloading ipv4 header re-write for icmp

For ICMPv4, the checksum is calculated from the ICMP headers and data.
Since the ICMPv4 checksum doesn't cover the IP header, we can allow to
do L3 header re-write for this protocol.

Fixes: bdd66ac0aeed ('net/mlx5e: Disallow TC offloading of unsupported match/action combinations')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 423c9db2 13-Mar-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Avoid using the ipv6 stub in the TC offload neigh update path

Currently we use the global ipv6_stub var to access the ipv6 global
nd table. This practice gets us to troubles when the stub is only partially
set e.g when ipv6 is loaded under the disabled policy. In this case, as of commit
343d60aada5a ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument")
the stub is not null, but stub->nd_tbl is and we crash.

As we can access the ipv6 nd_tbl directly, the fix is just to avoid the
reference through the stub. There is one place in the code where we
issue ipv6 route lookup and keep doing it through the stub, but that
mentioned commit makes sure we get -EAFNOSUPPORT from the stack.

Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# af1607c3 08-Mar-2018 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Fix memory usage issues in offloading TC flows

For NIC flows, the parsed attributes are not freed when we exit
successfully from mlx5e_configure_flower().

There is possible double free for eswitch flows. If error is returned
from rhashtable_insert_fast(), the parse attrs will be freed in
mlx5e_tc_del_flow(), but they will be freed again before exiting
mlx5e_configure_flower().

To fix both issues we do the following:
(1) change the condition that determines if to issue the free call to
check if this flow is NIC flow, or it does not have encap action.
(2) reorder the code such that that the check and free calls are done
before we attempt to add into the hash table.

Fixes: 232c001398ae ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6acfbf38 31-Jan-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload tc vlan push/pop using HW action

Currently, we are emulating the offload of vlan push/pop actions using
global setup as done by commit f5f82476090f ("net/mlx5: E-Switch, Support
VLAN actions in the offloads mode"). With newer NICs, we can apply a flow
action for that matter, do that while keeping the emulated path for the
older HW brands.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a9db0ecf 16-Aug-2017 Matan Barak <matanb@mellanox.com>

{net,IB}/mlx5: Add has_tag to mlx5_flow_act

The has_tag member will indicate whether a tag action was specified
in flow specification.

A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag
that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag. HW always provide
a flow_tag = 0 if all flow tags requested on a specific flow are 0.

So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 001a2fc0 30-Jan-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Return error if prio is specified when offloading eswitch vlan push

This isn't supported when we emulate eswitch vlan push action which
is the current state of things.

Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# eb9180f7 03-Jan-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Set hairpin queue size

For a given hairpin packet buffer size, different queue sizes
(values of log_hairpin_num_packets) determine how the data is broken
to strides on the RQ. Currently the chosen value is set to 64B strides.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3f6d08d1 26-Nov-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add RSS support for hairpin

Support RSS for hairpin traffic. We create multiple hairpin RQ/SQ pairs
and RSS TTC table per hairpin instance and steer the related flows
through that table so they are spread between the pairs.

We open one pair per 50Gbs link speed, for all speeds <= 50Gbs, there
is one pair and no RSS while for 100Gbs ports two RSSed pairs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ddae74ac 23-Nov-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Vectorize the low level core hairpin object

Enhance the hairpin setup code at the core to support a set of N
(RQ,SQ) pairs. This will be later used by the caller to set RSS
spreading among the different RQs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 106be53b 20-Dec-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Set per priority hairpin pairs

As part of the QoS model, on xmit, the HW mandates that all packets going
through a given SQ have the same priority. To align hairpin SQs with that,
we use the priority given as part of the matching for the hairpin hash key.

This ensures that flows/packets mapped to different HW priorities will
go through different hairpin instances. If no priority is given for
matching, we treat that as an 8th priority, this is in order not to
harm cases where priority is specified.

Only the PCP priority trust model is supported.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d8822868 20-Dec-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use vhca id as the hairpin peer identifier

The peer vhca id spans less bits vs the ifindex and can
well serve for the hairpin hash key, move to use that.

This is a pre-step to put more info into the hairpin hash
key in downstream patch while keeping it at 32 bits.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5c65c564 22-Nov-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Support offloading TC NIC hairpin flows

We refer to TC NIC rule that involves forwarding as "hairpin".

All hairpin rules from the current NIC device (called "func" in
the code) to a given NIC device ("peer") are steered into the
same hairpin RQ/SQ pair.

The hairpin pair is set on demand and removed when there are no
TC rules that need it.

Here's a TC rule that matches on icmp, does header re-write of the
dst mac and hairpin from RX/enp1s2f1 to TX/enp1s2f2 (enp1s2f1/2 are
two mlx5 devices):

tc filter add dev enp1s2f1 protocol ip parent ffff: prio 2
flower skip_sw ip_proto icmp
action pedit ex munge eth dst set 10:22:33:44:55:66 pipe
action mirred egress redirect dev enp1s2f2

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 77ab67b7 12-Nov-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Basic setup of hairpin object

Add the code to do basic setup for hairpin object which
will later serve offloading TC flows.

This includes calling the mlx5 core to create/destroy the hairpin
pair object and setting the HW transport objects that will be used
for steering matched flows to go through hairpin.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 74bd5d56 03-Jan-2018 Arnd Bergmann <arnd@arndb.de>

net/mlx5e: hide an unused variable

The uplink_rpriv variable was added at the start of the function but
only used inside of an #ifdef:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'mlx5e_route_lookup_ipv6':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:1549:25: error: unused variable 'uplink_rpriv' [-Werror=unused-variable]

This moves the declaration into that #ifdef as well.

Fixes: 5ed99fb421d4 ("net/mlx5e: Move ethernet representors data into separate struct")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4b97ab4 07-Dec-2017 Mark Bloch <markb@mellanox.com>

net/mlx5: E-Switch, Create generic header struct to be used by representors

Now that we don't store type dependent data in struct mlx5_eswitch_rep
we can create a generic interface, and representor type.

struct mlx5_eswitch_rep will store an array of interfaces, each
interface is used by a different representor type.

Once we moved to a more generic interface, rdma driver representors can
be added and utilize the same mechanism as the Ethernet driver
representors use.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5ed99fb4 07-Dec-2017 Mark Bloch <markb@mellanox.com>

net/mlx5e: Move ethernet representors data into separate struct

Ethernet representors have a need to store data which is applicable
only for them. Create a priv void pointer in struct mlx5_eswitch_rep
and move mlx5e to store the relevant data there. As part of this change
we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the
E-Switch code will copy a priv value which is garbage.

We also rename mlx5_eswitch_get_uplink_netdev() to
mlx5_eswitch_get_uplink_priv() and make it return void *.
This way E-Switch code doesn't need to deal with net devices and
we leave the task of getting it to mlx5e.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9f8a739e 05-Dec-2017 Cong Wang <xiyou.wangcong@gmail.com>

act_mirred: get rid of tcfm_ifindex from struct tcf_mirred

tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.

If we would support moving target netdev across netns, using
pointer would be better than ifindex.

This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 21b9c144 12-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Enlarge the NIC TC offload table size

The NIC TC offload table size was hard coded to 1k. Change it to be

min(max NIC RX table size,
min(max flow counters, 64k) * num flow groups)

where the max values are read from the firmware and the number of
flow groups is hard-coded as before this change.

We don't know upfront the division of flows to groups (== different masks).
This setup allows each group to be of size up to the where we want to go
(when supported, all offloaded flows use counters). Thus, we don't expect
multiple occurences for a group which in turn would add steering hops.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3c37745e 16-Oct-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly deal with encap flows add/del under neigh update

Currently, the encap action offload is handled in the actions parse
function and not in mlx5e_tc_add_fdb_flow() where we deal with all
the other aspects of offloading actions (vlan, modify header) and
the rule itself.

When the neigh update code (mlx5e_tc_encap_flows_add()) recreates the
encap entry and offloads the related flows, we wrongly call again into
mlx5e_tc_add_fdb_flow(), this for itself would cause us to handle
again the offloading of vlans and header re-write which puts things
in non consistent state and step on freed memory (e.g the modify
header parse buffer which is already freed).

Since on error, mlx5e_tc_add_fdb_flow() detaches and may release the
encap entry, it causes a corruption at the neigh update code which goes
over the list of flows associated with this encap entry, or double free
when the tc flow is later deleted by user-space.

When neigh update (mlx5e_tc_encap_flows_del()) unoffloads the flows related
to an encap entry which is now invalid, we do a partial repeat of the eswitch
flow removal code which is wrong too.

To fix things up we do the following:

(1) handle the encap action offload in the eswitch flow add function
mlx5e_tc_add_fdb_flow() as done for the other actions and the rule itself.

(2) modify the neigh update code (mlx5e_tc_encap_flows_add/del) to only
deal with the encap entry and rules delete/add and not with any of
the other offloaded actions.

Fixes: 232c001398ae ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b2812089 08-Aug-2017 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Check encap entry state when offloading tunneled flows

Encap entries cached by the driver could be invalidated due to
tunnel destination neighbour state changes.
When attempting to offload a flow that uses a cached encap entry,
we must check the entry validity and defer the offloading
if the entry exists but not valid.

When EAGAIN is returned, the flow offloading to hardware takes place
by the neigh update code when the tunnel destination neighbour
becomes connected.

Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# bdd66ac0 11-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Disallow TC offloading of unsupported match/action combinations

When offloading header re-write, the HW may need to adjust checksums along
the packet. For IP traffic, and a case where we are asked to modify fields in
the IP header, current HW supports that only for TCP and UDP. Enforce it, in
this case fail the offloading attempt for non TCP/UDP packets.

Fixes: d7e75a325cb2 ('net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions')
Fixes: 2f4fe4cab073 ('net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ace74321 05-Sep-2017 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Fix erroneous freeing of encap header buffer

In case the neighbour for the tunnel destination isn't valid,
we send a neighbour update request but we free the encap
header buffer. This is wrong, because we still need it for
allocating a HW encap entry once the neighbour is available.

Fix that by skipping freeing it if we wait for neighbour.

Fixes: 232c001398ae ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 08820528 22-Aug-2017 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address

Currently if vxlan tunnel ipv6 src isn't supplied the driver fails to
resolve it as part of the route lookup. The resulting encap header
is left with a zeroed out ipv6 src address so the packets are sent
with this src ip.

Use an appropriate route lookup API that also resolves the source
ipv6 address if it's not supplied.

Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5fd9fc4e 07-Aug-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: push cls related args into cls_common structure

As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3bcc0cec 04-Aug-2017 Jiri Pirko <jiri@mellanox.com>

net: sched: change names of action number helpers to be aligned with the rest

The rest of the helpers are named tcf_exts_*, so change the name of
the action number helpers to be aligned. While at it, change to inline
functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c0316f5 13-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add header re-write offloading of IPv6 hop-limit

For environments where flow-based ipv6 router is offloaded.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a8e4f0c4 13-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use macro for TC header re-write offload field mapping

Use a macro for the static mapping between the enumeration of field
supported by the firmware for header re-write to the corresponding
network header field. This improves the readability of the code and
doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a8ade55f 07-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload TC matching on ip ttl

Enable offloading of TC matching on ip ttl / hop-limit

As matching on ttl is supported only by newer HW brands (ConnectX-5),
we should do capability check before attempting to offload that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1f97a526 06-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Relocate the TC match on ip tos offload code section

The code section for offloading matches on ip tos (L3) should come
before and not after the one that deals with tcp/udp (L4) matches.

Otherwise, we might come up with wrong min-inline requirement, when
one attempts to match on both L3 and L4.

Fixes: fd7da28b280d ('net/mlx5e: Offload TC matching on ip tos / traffic-class')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9cfb4f71 11-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Remove TC header re-write offloading of ip tos

Currently the firmware API is partial and allows to offload only
the dscp part of the tos, also, ipv6 support isn't there yet.

As such, remove the offloading option of ipv4 dscp till the FW
APIs are more comprehensive.

Fixes: d79b6df6b10a ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2b64beba 10-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Support header re-write of partial fields in TC pedit offload

Using a per field mask field, the TC pedit action supports modifying
partial fields. E.g if using the TC tool, the following example would
make the kernel to only re-write two bytes of the src ip address:

tc filter add dev enp1s0 protocol ip parent ffff: prio 30
flower skip_sw ip_proto udp dst_port 8001
action pedit ex munge ip src set 10.1.0.0 retain 0xffff0000

We add driver support for offload these partial re-writes, by setting
the per FW action offset-in-field and length-from-offset attributes.

The 1st bit set in the mask specifies both the offset and the right
shift to apply on the value such that the 1st bit which needs to be
set will reside in bit 0 of the FW data field.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3099eb5a 04-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use modify header ID cache for offloaded TC NIC flows

Use the modify header ID cache for the header re-write part of offloading
TC NIC flows.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1a9527bb 04-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use modify header ID cache for offloaded TC E-Switch flows

Use the modify header ID cache for the header re-write part of offloading
TC eswitch flows.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 11c9c548 04-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add cache for HW modify header IDs

Packets belonging to flows which are different by matching may still need
to go through the same header re-write. Add a cache for header re-write IDs
keyed by the binary chain of modify header actions.

The caching is supported for both eswitch and NIC use-cases, where the
actual conversion of the code to use caching comes in next patches, one
per use-case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 513f8f7f 07-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use short attribute form when adding/deleting offloaded TC flows

Instead of going through flow->nic/esw_attr for each usage, assign
an attr pointer per the context (nic or esw) and use that.

This patch doesn't add any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# de6ea923 28-Feb-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Remove limitation of single NIC offloaded TC action per rule

Remove the limitation that offloaded NIC filters can have only one
action. This allows us for example to provide flow tag as a note
to upper layers / apps that that HW header re-write was applied.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fd7da28b 01-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload TC matching on ip tos / traffic-class

Enable offloading of TC matching on ipv4 tos or ipv6 traffic-class.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e77834ec 01-Jun-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload TC matching on tcp flags

Enable offloading of TC matching on tcp flags.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d897a638 31-May-2017 Jakub Kicinski <kuba@kernel.org>

sched: add helper for updating statistics on all actions

Forgetting to disable preemption around tcf_action_stats_update()
seems to be a common mistake. Add a helper function for updating
stats on all actions of a filter.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3ca4e05 09-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Fix warnings around parsing of TC pedit actions

The sparse tool emits these correct complaints:

drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1005:25: warning: cast to restricted __be32
drivers/net/ethernet/mellanox/mlx5/core//en_tc.c:1007:25: warning: cast to restricted __be16

The value is provided from user-space in network order, but there's
no way for them to realize that, avoid the warnings by casting to the
appropriate type.

Fixes: d79b6df6b10a ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d824bf3f 09-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly enforce disallowing of partial field re-write offload

Currently we don't support partial header re-writes through TC pedit
action offloading. However, the code that enforces that wasn't err-ing
on cases where the first and last bits of the mask are set but there is
some zero bit between them, such as in the below example, fix that!

tc filter add dev enp1s0 protocol ip parent ffff: prio 10 flower
ip_proto udp dst_port 2001 skip_sw
action pedit munge ip src set 1.0.0.1 retain 0xff0000ff

Fixes: d79b6df6b10a ('net/mlx5e: Add parsing of TC pedit actions to HW format')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 26c02749 10-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Allow TC csum offload if applied together with pedit action

When offloading header re-writes, the HW re-calculates the relevant L3/L4
checksums. Hence, when upper layers (as done by OVS) ask for TC checksum action
offload together with pedit offload, don't err. This command now works:

tc filter add dev ens1f0 protocol ip parent ffff: prio 20 flower skip_sw
ip_proto tcp dst_port 9001
action pedit ex munge tcp dport set 0x1234 pipe
action csum tcp

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# cdc5a7f3 09-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use the correct delete call on offloaded TC encap entry detach

We wrongly direcly invoke hlist_del_rcu() and not hash_del_rcu() which
does a slightly different call now and may change later, fix that.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 27902f08 18-May-2017 Wei Yongjun <weiyongjun1@huawei.com>

net/mlx5e: Fix possible memory leak

'encap_header' is malloced and should be freed before leaving from
the error handling cases, otherwise it will cause memory leak.

Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1b9a07ee 10-May-2017 Leon Romanovsky <leon@kernel.org>

{net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc

Commit a7c3e901a46f ("mm: introduce kv[mz]alloc helpers") added
proper implementation of mlx5_vzalloc function to the MM core.

This made the mlx5_vzalloc function useless, so let's remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f6dfb4c3 23-Feb-2017 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Update neighbour 'used' state using HW flow rules counters

When IP tunnel encapsulation rules are offloaded, the kernel can't see
the traffic of the offloaded flow. The neighbour for the IP tunnel
destination of the offloaded flow can mistakenly become STALE and
deleted by the kernel since its 'used' value wasn't changed.

To make sure that a neighbour which is used by the HW won't become
STALE, we proactively update the neighbour 'used' value every
DELAY_PROBE_TIME period, when packets were matched and counted by the HW
for one of the tunnel encap flows related to this neighbour.

The periodic task that updates the used neighbours is scheduled when a
tunnel encap rule is successfully offloaded into HW and keeps re-scheduling
itself as long as the representor's neighbours list isn't empty.

Add, remove, lookup and status change operations done over the
representor's neighbours list or the neighbour hash entry encaps list
are all serialized by RTNL lock.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 232c0013 19-Mar-2017 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Add support to neighbour update flow

In order to offload TC encap rules, the driver does a lookup for the IP
tunnel neighbour according to the output device and the destination IP
given by the user.

To keep tracking after the validity state of such neighbours, we keep
the neighbours information (pair of device pointer and destination IP)
in a hash table maintained at the relevant egress representor and
register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update
netevent, we search for a match among the cached neighbours entries used for
encapsulation.

In case the neighbour isn't valid, we can't offload the flow into the
HW. We cache the flow (requested matching and actions) in the driver and
offload the rule later, when the neighbour is resolved and becomes
valid.

When a flow is only cached in the driver and not offloaded into HW
yet, we use EAGAIN return value to mark it internally, the TC ndo still
returns success.

Listen to kernel neighbour update netevents to trace relevant neighbours
validity state:

1. If a neighbour becomes valid, offload the related rules to HW.

2. If the neighbour becomes invalid, remove the related rules from HW.

3. If the neighbour mac address was changed, update the encap header.
Remove all the offloaded rules using the old encap header from the HW
and insert new rules to HW with updated encap header.

Access to the neighbors hash table is protected by RTNL lock of its
caller or by the table's spinlock.

Details of the locking/synchronization among the different actions
applied on the neighbour table:

Add/remove operations - protected by RTNL lock of its caller (all TC
commands are protected by RTNL lock). Add and remove operations are
initiated only when the user inserts/removes a TC rule into/from the driver.

Lookup/remove operations - since the lookup operation is done from
netevent notifier block, RTNL lock can't be used (atomic context).
Use the table's spin lock to protect lookups from TC user removal operation.
bh is used since netevent can be called from a softirq context.

Lookup/add operations - The hash table access functions are taking
care of the protection between lookup and add operations.

When adding/removing encap headers and rules to/from the HW, RTNL lock
is used. It can happen when:

1. The user inserts/removes a TC rule into/from the driver (TC commands
are protected by RTNL lock of it's caller).

2. The driver gets neighbour notification event, which reports about
neighbour validity status change. Before adding/removing encap headers
and rules to/from the HW, RTNL lock is taken.

A neighbour hash table entry should be freed when its encap list is empty.
Since The neighbour update netevent notification schedules a neighbour
update work that uses the neighbour hash entry, it can't be freed
unconditionally when the encap list becomes empty during TC delete rule flow.
Use reference count to protect from freeing neighbour hash table entry
while it's still in use.

When the user asks to unregister a netdvice used by one of the neigbours,
neighbour removal notification is received. Then we take a reference on the
neighbour and don't free it until the relevant encap entries (and flows) are
marked as invalid (not offloaded) and removed from HW.
As long as the encap entry is still valid (checked under RTNL lock) we
can safely access the neighbour device saved on mlx5e_neigh struct.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 033354d5 02-Feb-2017 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Read neigh parameters with proper locking

The nud_state and hardware address fields are protected by the neighbour
lock, we should acquire it before accessing those parameters.

Use this lock to avoid inconsistency between the neighbour validity state
and it's hardware address.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0b67a38f 14-Feb-2017 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Use flag to properly monitor a flow rule offloading state

Instead of relaying on the 'flow->rule' pointer value which can be
valid or invalid (in case the FW returns an error while trying to offload
the rule), monitor the rule state using a flag.

In downstream patch which adds support to IP tunneling neigh update
flow, a TC rule could be cached in the driver and not offloaded into the
HW. In this case, the flow handle pointer stays NULL.

Check the offloaded flag to properly deal with rules which are currently
not offloaded when querying rule statistics.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1a8552bd 02-Feb-2017 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Remove output device parameter from create encap header helpers definition

Passing output device parameter to the helper functions that deal with
creation of encapsulation headers is redundant. Output device parameter
can be defined inside those helpers, no need to pass it. Refactor the code by
removing the parameter from the function signature.

This patch doesn't change any functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c1ae1152 25-Apr-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Move the encap entry structure from the eswitch header

The encap entry structure isn't manipulated by the eswitch code,
hence it can/needs to be removed from the eswitch header.

Do that, and change it to have mlx5e_ prefix.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 45247bf2 25-Apr-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Remove encap entry pointer from the eswitch flow attributes

Encap wise, the tc eswitch flow attribute struct needs to have
only the encap ID which is programmed later to the HW and none
of the higher level encap params, fix that.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1d447a39 23-Apr-2017 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Extendable vport representor netdev private data

Make representor netdev private data extendable by adding new struct
"mlx5e_rep_priv" and use it as the rep netdev private data struct
instead of directly pointing to mlx5_eswitch_rep.

Added new en_rep.h header file to contain all representor related
definitions and prototypes, and moved all representor specific logic
into en_rep.c.

Needed for downstream patches to extend representor functionality to
support neighbour update.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>


# 225aabaf 06-Apr-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels

Otherwise the code that fills the ipv6 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 32f3671f 06-Apr-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels

Otherwise the code that fills the ipv4 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c415f704 30-Mar-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5

On ConnectX5 the wqe inline mode is "none" and hence the FW
reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED.

Fix our devlink callbacks to deal with that on get and set.

Also fix the tc flow parsing code not to fail anything when
inline isn't required.

Fixes: bffaa916588e ('net/mlx5: E-Switch, Add control for inline mode')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d7e75a32 25-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions

This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2f4fe4ca 25-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions

This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d79b6df6 22-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add parsing of TC pedit actions to HW format

Parse/translate a set of TC pedit actions to be formed in the HW API format.

User-space provides set of keys where each one of them is made of: command (add or
set), header-type, byte offset within that header along with a 32 bit mask and value.

The mask dictates what bits in the 32 bit word that starts on the offset we should
be dealing with, but under negative polarity (unset bits are to be modified).

We do a 1st pass over the set of keys while using the header-type and offset to
fill the masks and the values into a data-structure containting all the
supported network headers.

We then do a 2nd pass over the set of fields to re-write supported by the HW,
where for each such candidate field, we use the masks filled on the 1st pass to
realize if we should offloading re-write it.

In case offloading is required, we fill a HW descriptor with the following:

(1) the header field to modify
(2) the bit offset within the field from where to modify (set command only)
(3) the value to set/add
(4) the length in bits 1...32 to modify (set command only)

Note that it's possible for a given pedit mask to dictate modifying the
same header field multiple times or to modify multiple header fields.
Currently such combinations are not supported for offloading, hence, for set
commands, the offset within the field is always zero, and the length to modify
is the field size.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# aa0cbbae 14-Mar-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly deal with resource cleanup when adding TC flow fails

The code for adding tc fdb flows leaves things half set when it fails
in the middle. Currently we are not leaking things (e.g eswitch
vlan reference, encap reference and HW resources) since the main
code to add flower rules does a cleanup by calling mlx5e_tc_del_flow().

This cleanup further works just b/c we're checking there if the HW rule
for the flow we are attempting to delete is valid before touching it, and
since under the current possible combinations of supported actions it's okay
to go and blidnly deref or delete all the action related resources (encap, vlan).

Instead, do things properly, namely make sure that if add flow fails we
clean all what was allocated or referenced. Now, the flow delete code can
blindly deref/deallocate both the rule and the actions related resources and
when more action combinations are introduced (such as the upcoming header
re-write) we are fine with clear and robust code.

While here, align all of nic/fdb parse actions/add flow functions to get
mlx5e_tc_flow struct param and pick the attributes or whatever else needed
from there.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 17091853 13-Mar-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add intermediate struct for TC flow parsing attributes

Add intermediate structure to store attributes parsed from TC filter
matching/actions parts which are soon to be configured into the HW.

Currently put there the flow matching spec after being parsed. More
content to be added in down-stream patch.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3bc4b7bf 24-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add NIC attributes for offloaded TC flows

Add structure that contains the attributes related to offloaded
NIC flows. Currently it has the actions and flow tag.

While here, do xmas tree cleanup of the TC configure function.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ecf5bb79 24-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add prefix for e-switch offloaded TC flow attributes

Add esw_ prefix to the flow attributes attached to offloaded e-switch
TC flows. This is a pre-step to add attributes to offloaded NIC TC flows.

Also, save one pointer space by using gcc's zero size array, this would
be beneficial for environments where 100Ks (or Ms) of flows are offloaded.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1ad9a00a 21-Mar-2017 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps

This was added to allow the TC offloading code to identify offloading
encap/decap vxlan rules.

The VF reps are effectively related to the same mlx5 PCI device as the
PF. Since the kernel invokes the (say) delete ndo for each netdev, the
FW erred on multiple vxlan dst port deletes when the port was deleted
from the system.

We fix that by keeping the registration to be carried out only by the
PF. Since the PF serves as the uplink device, the VF reps will look
up a port there and realize if they are ok to offload that.

Tested:
<SETUP VFS>
<SETUP switchdev mode to have representors>
ip link add vxlan1 type vxlan id 44 dev ens5f0 dstport 9999
ip link set vxlan1 up
ip link del dev vxlan1

Fixes: 4a25730eb202 ('net/mlx5e: Add ndo_udp_tunnel_add to VF representors')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09c91ddf 21-Mar-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use the proper UAPI values when offloading TC vlan actions

Currently we use the non UAPI values and we miss erring on
the modify action which is not supported, fix that.

Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d85cdccb 21-Mar-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch

Refactor the code to deal with add/del TC rules to have handler per NIC/E-switch
offloading use case, and push the latter into the e-switch code. This provides
better separation and is to be used in down-stream patch for applying a fix.

Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 65ba8fb7 10-Mar-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Avoid wrong identification of rules on deletion

When deleting offloaded TC flows, we must correctly identify E-switch
rules. The current check could get us wrong w.r.t to rules set on the
PF. Since it's possible to set NIC rules on the PF, switch to SRIOV
offloads mode and then attempt to delete a NIC rule.

To solve that, we add a flags field to offloaded rules, set it on
creation time and use that over the code where currently needed.

Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fed06ee8 12-Feb-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Disable preemption when doing TC statistics upcall

When called by HW offloading drivers, the TC action (e.g
net/sched/act_mirred.c) code uses this_cpu logic, e.g

_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets)

per the kernel documention, preemption should be disabled, add that.

Before the fix, when running with CONFIG_PREEMPT set, we get a

BUG: using smp_processor_id() in preemptible [00000000] code: tc/3793

asserion from the TC action (mirred) stats_update callback.

Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 321fa4ff 03-Feb-2017 Arnd Bergmann <arnd@arndb.de>

net/mlx5e: fix another maybe-uninitialized false-positive

In commit abeffce ("net/mlx5e: Fix a -Wmaybe-uninitialized warning"), I fixed a
gcc warning for the ipv4 offload handling. Now we get the same warning for the
added ipv6 support:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:815:40: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]

We can apply the same workaround here as well.

Fixes: ce99f6b97fcd ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3e621b19 12-Jan-2017 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Support TC encapsulation offloads with upper devices

When tunneling is used, some virtualizations systems set the (mlx5e) uplink
device to be stacked under upper devices such as bridge or ovs internal
port, where the VTEP IP address used for the encapsulation is set on
that upper device.

In order to support such use-cases, we also deal with a setup where the
egress mirred device isn't representing a port on the HW e-switch to where
the ingress device belongs. We use eswitch service function which returns
the uplink and set it as the egress device of the tc encap rule.

Fixes: a54e20b4fcae ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ce99f6b9 11-Dec-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels

Add the missing parts for offloading IPv6 tunnels. This includes
route and neigh lookups and construnction of the IPv6 tunnel headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9a941117 03-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Maximize ip tunnel key usage on the TC offloading path

Use more fields out of the tunnel key (e.g the tunnel source IP address)
provided by upper layers for the route lookup done on the encap offload path.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 76f7444d 05-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Use the full tunnel key info for encapsulation offload house-keeping

Currently we use subset of the input tunnel key fields (id, ip daddr,
dst port) which are provided by upper layers to indentify flows that should
go through the same encapsulation and maintain the HW encapsulation table.

This is redundant and can get us wrong.

Instead, keep a copy of the ip tunnel info provided by the user
through TC and have the tunnel key part as the key to our internal hash.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 75c33da8 21-Dec-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: TC ipv4 tunnel encap offload cosmetic changes

Move around some settings of variables as pre-step to make things
more robust and clear for the ipv6 case in down-stream patch.
This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 19f44401 10-Dec-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add TC offloads matching on IPv6 encapsulation headers

Enhance the parsing of offloaded TC rules to set HW matching on outer
IPv6 encapsulation headers. This effectively adds support for TC tunnel
key release action (decapsulation) of SRIOV offloads over IPv6 tunnels.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 10543365 09-Oct-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Add support to s-tag in mlx5 firmware interface

Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry
match param.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# abeffce9 15-Jan-2017 Arnd Bergmann <arnd@arndb.de>

net/mlx5e: Fix a -Wmaybe-uninitialized warning

As found by Olof's build bot, we gain a harmless warning about a
potential uninitialized variable reference in mlx5:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'parse_tc_fdb_actions':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:769:13: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:811:21: note: 'out_dev' was declared here

This was introduced through the addition of an 'IS_ERR/PTR_ERR' pair
that gcc is unfortunately unable to completely figure out.

The problem being gcc cannot tell that if(IS_ERR()) in
mlx5e_route_lookup_ipv4() is equivalent to checking if(err) later,
so it assumes that 'out_dev' is used after the 'return PTR_ERR(rt)'.

The PTR_ERR_OR_ZERO() case by comparison is fairly easy to detect
by gcc, so it can't get that wrong, so it no longer warns.

Hadar Hen Zion already attempted to fix the warning earlier by adding fake
initializations, but that ended up not fully addressing all warnings, so
I'm reverting it now that it is no longer needed.

Link: http://arm-soc.lixom.net/buildlogs/mainline/v4.10-rc3-98-gcff3b2c/
Fixes: a42485eb0ee4 ("net/mlx5e: TC ipv4 tunnel encap offload error flow fixes")
Fixes: a757d108dc1a ("net/mlx5e: Fix kbuild warnings for uninitialized parameters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e86397a 10-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly handle FW errors while adding TC rules

When the firmware returns an error (common example is an attempt to
add twice the same rule which is refused by the some FWs), we are not
properly derefing/cleaning few resources allocated on the way.
Examples are vport vlan deref under eswitch vlan offloads, and encap
entry/neighbour deref under eswitch encapsulation offloads, fix that.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a757d108 10-Jan-2017 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Fix kbuild warnings for uninitialized parameters

kbuild warn about parameters that may be used uninitialized, fix it.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0827444d 10-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Set inline mode requirements for matching on IP fragments

For e-switch level matching on packets being an IP fragment, we
need to make sure the source vport inline mode is L3, fix that.

Fixes: 3f7d0eb42d59 ('net/mlx5e: Offload TC matching on packets being IP fragments')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e72eb43 10-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly get address type of encapsulation IP headers

As done elsewhere in our TC/flower offload code, the address type of
the encapsulation IP headers should be realized accroding to the
addr_type field of the encapsulation control dissector key, do that.

Fixes: bbd00f7e2349 ('net/mlx5e: Add TC tunnel release action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a42485eb 10-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: TC ipv4 tunnel encap offload error flow fixes

When the route lookup fails we should return the actual error.

When the neigh isn't valid, we should return -EOPNOTSUPP as done
in similar cases along the code.

When the offload can't take place as of invalid neigh etc, we
must release the neigh.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2fcd82e9 10-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Warn when rejecting offload attempts of IP tunnels

We silently reject offloading of IPv6 tunnels, non vxlan tunnels,
vxlan tunnels where the dst port to match is not provided, etc.

Be a bit more verbose and print a warning so the user better
realizes what went wrong here and can fix it.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Fixes: bbd00f7e2349 ('net/mlx5e: Add TC tunnel release action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cd377663 10-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Properly handle offloading of source udp port for IP tunnels

We can offload the matching on source udp port of ip tunnels for
decapsulation. We can not offload setting source udp port for tunnels
as part of encapsulation. Fix both the code that deals with matching
offload (decap) and the code that deal with encap offload to align with
that.

Fixes: a54e20b4fcae ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Fixes: bbd00f7e2349 ('net/mlx5e: Add TC tunnel release action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f7d0eb4 07-Dec-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload TC matching on packets being IP fragments

Enable offloading of matching on packets being fragments.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5067b602 30-Nov-2016 Roi Dayan <roid@mellanox.com>

net/mlx5e: Remove flow encap entry in the correct place

Handling flow encap entry should be inside tc del flow
and is only relevant for offloaded eswitch TC rules.

Fixes: 11a457e9b6c1 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 961e8979 30-Nov-2016 Roi Dayan <roid@mellanox.com>

net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instance

Change the function that deletes offloaded TC rule to get
struct mlx5e_tc_flow instance which contains both the flow
handle and flow attributes. This is a cleanup needed for
downstream patches, it doesn't change any functionality.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86a33ae1 30-Nov-2016 Roi Dayan <roid@mellanox.com>

net/mlx5e: Correct cleanup order when deleting offloaded TC rules

According to the reverse unwinding principle, on delete time we should
first handle deletion of the steering rule and later handle the vlan
deletion from the eswitch.

Fixes: 8b32580df1cb ("net/mlx5e: Add TC vlan action for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 53636068 30-Nov-2016 Roi Dayan <roid@mellanox.com>

net/mlx5e: Remove redundant hashtable lookup in configure flower

We will never find a flow with the same cookie as cls_flower always
allocates a new flow and the cookie is the allocated memory address.

Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de0af0bf 22-Nov-2016 Roi Dayan <roid@mellanox.com>

net/mlx5e: Enforce min inline mode when offloading flows

A flow should be offloaded only if the matches are
allowed according to min inline mode.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a54e20b4 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Add basic TC tunnel set action for SRIOV offloads

In mlx5 HW, encapsulation is offloaded by the steering rule having
index into an encapsulation table containing the entire set of headers
to be added by the HW. The driver sets these headers in a buffer when we
are offloading the action.

The code maintains mlx5_encap_entry for each encap header it has
encountered when attempted to offload TC tunnel set action.

This entry maintains a linked list of all the flows sharing the same
encap header, when the last flow is removed from the list the encap
entry is removed.

The actual encap_header is allocated by the driver in the hardware only
if we have layer two neighbour info when the encap entry is created.
While the flow is in the driver, the driver holds a reference on the
neighbour.

When a new flow with encap action is inserted, the code first checks if
the required encap entry exists according to the tunnel set parameters.
If it does the encap is shared, otherwise a new mlx5_encap_entry is
created.

TC action parsing implementation in the driver assumes that tunnel set
action is provided in the same order set by the user, e.g before the
mirred_redirect action.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bbd00f7e 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Add TC tunnel release action for SRIOV offloads

Enhance the parsing of offloaded TC rules to set HW matching on outer
(encapsulation) headers.
Parse TC tunnel release action and set it as mlx5 decap action when the
required capabilities are supported.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 66958ed9 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5: Support encap id when setting new steering entry

In order to support steering rules which add encapsulation headers,
encap_id parameter is needed.

Add new mlx5_flow_act struct which holds action related parameter:
action, flow_tag and encap_id. Use mlx5_flow_act struct when adding a new
steering rule.
This patch doesn't change any functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9f1b073 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5: Add creation flags when adding new flow table

When creating flow tables, allow the caller to specify creation flags.
Currently no flags are used and as such this patch doesn't add any new
functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 358d79a4 03-Nov-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Handle matching on vlan priority for offloaded TC rules

We ignored the vlan priority in offloaded TC rules matching part,
fix that.

Fixes: 095b6cfd69ce ('net/mlx5e: Add TC vlan match parsing')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e37a79e5 20-Sep-2016 Mark Bloch <markb@mellanox.com>

net/mlx5e: Add tc support for FWD rule with counter

When creating a FWD rule using tc create also a HW counter
for this rule.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 74491de9 31-Aug-2016 Mark Bloch <markb@mellanox.com>

net/mlx5: Add multi dest support

Currently when calling mlx5_add_flow_rule we accept
only one flow destination, this commit allows to pass
multiple destinations.

This change forces us to change the return structure to a more
flexible one. We introduce a flow handle (struct mlx5_flow_handle),
it holds internally the number for rules created and holds an array
where each cell points the to a flow rule.

From the consumers (of mlx5_add_flow_rule) point of view this
change is only cosmetic and requires only to change the type
of the returned value they store.

From the core point of view, we now need to use a loop when
allocating and deleting rules (e.g given to us a flow handler).

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 5724b8b5 13-Oct-2016 Shmulik Ladkani <shmulik.ladkani@gmail.com>

net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror'

These accessors are used in various drivers that support tc offloading,
to detect properties of a given 'tc_action'.

'is_tcf_mirred_redirect' tests that the action is TCA_EGRESS_REDIR.
'is_tcf_mirred_mirror' tests that the action is TCA_EGRESS_MIRROR.

As a prep towards supporting INGRESS redir/mirror, rename these
predicates to reflect their true meaning:
s/is_tcf_mirred_redirect/is_tcf_mirred_egress_redirect/
s/is_tcf_mirred_mirror/is_tcf_mirred_egress_mirror/

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d0debb76 30-Sep-2016 Arnd Bergmann <arnd@arndb.de>

net/mlx5e: shut up maybe-uninitialized warning

Build-testing this driver with -Wmaybe-uninitialized gives a new false-positive
warning that I can't really explain:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'mlx5e_configure_flower':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:509:3: error: 'old_attr' may be used uninitialized in this function [-Werror=maybe-uninitialized]

It's obvious from the code that 'old_attr' is initialized whenever 'old'
is non-NULL here. The warning appears with all versions I tested from gcc-4.7
through gcc-6.1, and I could not come up with a way to rewrite the function
in a more readable way that avoids the warning, so I'm adding another
initialization to shut it up.

Fixes: 8b32580df1cb ("net/mlx5e: Add TC vlan action for SRIOV offloads")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 095b6cfd 22-Sep-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add TC vlan match parsing

Enhance the parsing of offloaded TC rules matches to handle vlans.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8b32580d 22-Sep-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add TC vlan action for SRIOV offloads

Parse TC vlan actions and set the required elements to allow offloading.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 776b12b6 22-Sep-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Put elements related to offloaded TC rule in one struct

Put the representors related to the source and dest vports and the
action in struct mlx5_esw_flow_attr which is used while setting the FDB rule.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9deb2241 22-Sep-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Set the vport when registering the uplink rep

Set the vport value in the PF entry to be that of the uplink so
we can use it blindly over the tc / eswitch offload code without
translating it each time we deal with the uplink representor.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1dbd0d37 18-Aug-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Use correct flow dissector key on flower offloading

The wrong key is used when extracting the address type field set by
the flower offload code. We have to use the control key and not the
basic key, fix that.

Fixes: e3a2b7ed018e ('net/mlx5e: Support offload cls_flower with drop action')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22dc13c8 13-Aug-2016 WANG Cong <xiyou.wangcong@gmail.com>

net_sched: convert tcf_exts from list to pointer array

As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.

The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# adb4c123 14-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add TC HW support for FDB (SRIOV e-switch) offloads

Enhance the TC offload code such that when the eswitch exists and it's
mode being SRIOV offloads, we do TC actions parsing and setup targeted
for eswitch. Next, we add the offloaded flow to the HW e-switch (fdb).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03a9d11e 14-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads

Add the setup code that parses the TC actions needed to support offloading drop
and mirred/redirect for SRIOV e-switch. We can redirect between two devices if
they belong to the same HW switch, compare the switchdev HW ID attribute to
enforce that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c40348c 14-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Adjustments in the TC offload code towards reuse for SRIOV

Towards reusing the TC offloads code for an SRIOV use-case, change some of the
helper functions to have _nic in their names so it's clear what's NIC unique
and what's general. Also group together the NIC related helpers so we can easily
branch per the use-case in downstream patch.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 55130287 14-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Offload TC flow counters only when supported

Currenly, the code that programs the flow actions into the firmware
doesn't check if was actually asked to offload the statistics, fix that.

Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5bb1730 04-Jul-2016 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: Refactor mlx5_add_flow_rule

Reduce the set of arguments passed to mlx5_add_flow_rule
by introducing flow_spec structure.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aad7e08d 12-May-2016 Amir Vadai <amirva@mellanox.com>

net/mlx5e: Hardware offloaded flower filter statistics support

Introduce support in updating statistics of offloaded TC flower
classifiers. Currently only the DROP action is supported.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# acff797c 28-Apr-2016 Maor Gottlieb <maorg@mellanox.com>

net/mlx5e: Refactor mlx5e flow steering structs

Slightly refactor and re-order the flow steering structs,
tables and data-bases for better self-containment and
flexibility to add more future steering phases
(tables/rules/data bases) e.g: aRFS.

Changes:
1. Move the vlan DB and address DB into their table structs.
2. Rename steering table structs to unique format: mlx5e_*_table,
e.g: mlx5e_vlan_table.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d63cd286 28-Apr-2016 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: Add user chosen levels when allocating flow tables

Currently, consumers of the flow steering infrastructure can't
choose their own flow table levels and are limited to one
flow table per level. This just waste levels.
Instead, we introduce here the possibility to use multiple
flow tables in a level. The user is free to connect these
flow tables, while following the rule (FTEs in FT of level x
could only point to FTs of level y where y > x).

In addition this patch switch the order of the create/destroy
flow tables of the NIC(vlan and main).

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 12185a9f 07-Mar-2016 Amir Vadai <amir@vadai.me>

net/mlx5e: Support offload cls_flower with skbedit mark action

Introduce offloading of skbedit mark action.

For example, to mark with 0x1234, all TCP (ip_proto 6) packets arriving
to interface ens9:

# tc qdisc add dev ens9 ingress
# tc filter add dev ens9 protocol ip parent ffff: \
flower ip_proto 6 \
indev ens9 \
action skbedit mark 0x1234

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3a2b7ed 07-Mar-2016 Amir Vadai <amir@vadai.me>

net/mlx5e: Support offload cls_flower with drop action

Parse tc_cls_flower_offload into device specific commands and program
the hardware to classify and act accordingly.

For example, to drop ICMP (ip_proto 1) packets from specific smac, dmac,
src_ip, src_ip, arriving to interface ens9:

# tc qdisc add dev ens9 ingress

# tc filter add dev ens9 protocol ip parent ffff: \
flower ip_proto 1 \
dst_mac 7c:fe:90:69:81:62 src_mac 7c:fe:90:69:81:56 \
dst_ip 11.11.11.11 src_ip 11.11.11.12 indev ens9 \
action drop

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e8f887ac 07-Mar-2016 Amir Vadai <amir@vadai.me>

net/mlx5e: Introduce tc offload support

Extend ndo_setup_tc() to support ingress tc offloading. Will be used by
later patches to offload tc flower filter.

Feature is off by default and could be enabled by issuing:
# ethtool -K eth0 hw-tc-offload on

Offloads flow table is dynamically created when first filter is
added.
Rules are saved in a hash table that is maintained by the consumer (for
example - the flower offload in the next patch).
When last filter is removed and no filters exist in the hash table, the
offload flow table is destroyed.

Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>