History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
Revision Date Author Comments
# 0553e753 09-Apr-2024 Shay Drory <shayd@nvidia.com>

net/mlx5: E-switch, store eswitch pointer before registering devlink_param

Next patch will move devlink register to be first. Therefore, whenever
mlx5 will register a param, the user will be notified.
In order to notify the user, devlink is using the get() callback of
the param. Hence, resources that are being used by the get() callback
must be set before the devlink param is registered.

Therefore, store eswitch pointer inside mdev before registering the
param.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3fbf6120 07-Jan-2024 Jakub Kicinski <kuba@kernel.org>

Revert "mlx5 updates 2023-12-20"

Revert "net/mlx5: Implement management PF Ethernet profile"
This reverts commit 22c4640698a1d47606b5a4264a584e8046641784.
Revert "net/mlx5: Enable SD feature"
This reverts commit c88c49ac9c18fb7c3fa431126de1d8f8f555e912.
Revert "net/mlx5e: Block TLS device offload on combined SD netdev"
This reverts commit 83a59ce0057b7753d7fbece194b89622c663b2a6.
Revert "net/mlx5e: Support per-mdev queue counter"
This reverts commit d72baceb92539a178d2610b0e9ceb75706a75b55.
Revert "net/mlx5e: Support cross-vhca RSS"
This reverts commit c73a3ab8fa6e93a783bd563938d7cf00d62d5d34.
Revert "net/mlx5e: Let channels be SD-aware"
This reverts commit e4f9686bdee7b4dd89e0ed63cd03606e4bda4ced.
Revert "net/mlx5e: Create EN core HW resources for all secondary devices"
This reverts commit c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1.
Revert "net/mlx5e: Create single netdev per SD group"
This reverts commit e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad.
Revert "net/mlx5: SD, Add informative prints in kernel log"
This reverts commit c82d360325112ccc512fc11a3b68cdcdf04a1478.
Revert "net/mlx5: SD, Implement steering for primary and secondaries"
This reverts commit 605fcce33b2d1beb0139b6e5913fa0b2062116b2.
Revert "net/mlx5: SD, Implement devcom communication and primary election"
This reverts commit a45af9a96740873db9a4b5bb493ce2ad81ccb4d5.
Revert "net/mlx5: SD, Implement basic query and instantiation"
This reverts commit 63b9ce944c0e26c44c42cdd5095c2e9851c1a8ff.
Revert "net/mlx5: SD, Introduce SD lib"
This reverts commit 4a04a31f49320d078b8078e1da4b0e2faca5dfa3.
Revert "net/mlx5: Fix query of sd_group field"
This reverts commit e04984a37398b3f4f5a79c993b94c6b1224184cc.
Revert "net/mlx5e: Use the correct lag ports number when creating TISes"
This reverts commit a7e7b40c4bc115dbf2a2bb453d7bbb2e0ea99703.

There are some unanswered questions on the list, and we don't
have any docs. Given the lack of replies so far and the fact
that v6.8 merge window has started - let's revert this and
revisit for v6.9.

Link: https://lore.kernel.org/all/20231221005721.186607-1-saeed@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 22c46406 08-Sep-2023 Armen Ratner <armeng@nvidia.com>

net/mlx5: Implement management PF Ethernet profile

Add management PF modules, which introduce support for the structures
needed to create the resources for the MGMT PF to work.
Also, add the necessary calls and functions to establish this
functionality.

Signed-off-by: Armen Ratner <armeng@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>


# baac8351 10-Oct-2023 Jianbo Liu <jianbol@nvidia.com>

net/mlx5e: Reduce eswitch mode_lock protection context

Currently eswitch mode_lock is so heavy, for example, it's locked
during the whole process of the mode change, which may need to hold
other locks. As the mode_lock is also used by IPSec to block mode and
encap change now, it is easy to cause lock dependency.

Since some of protections are also done by devlink lock, the eswitch
mode_lock is not needed at those places, and thus the possibility of
lockdep issue is reduced.

Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 7624e58a 27-Aug-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: E-switch, register event handler before arming the event

Currently, mlx5 is registering event handler for vport context change
event some time after arming the event. this can lead to missing an
event, which will result in wrong rules in the FDB.
Hence, register the event handler before arming the event.

This solution is valid since FW is sending vport context change event
only on vports which SW armed, and SW arming the vport when enabling
it, which is done after the FDB has been created.

Fixes: 6933a9379559 ("net/mlx5: E-Switch, Use async events chain")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 34413460 05-Sep-2023 Bodong Wang <bodong@nvidia.com>

mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode

ACL flow table is required in switchdev mode when metadata is enabled,
driver creates such table when loading each vport. However, not every
vport is loaded in switchdev mode. Such as ECPF if it's the eswitch manager.
In this case, ACL flow table is still needed.

To make it modularized, create ACL flow table for eswitch manager as
default and skip such operations when loading manager vport.

Also, there is no need to load the eswitch manager vport in switchdev mode.
This means there is no need to load it on regular connect-x HCAs where
the PF is the eswitch manager. This will avoid creating duplicate ACL
flow table for host PF vport.

Fixes: 29bcb6e4fe70 ("net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules")
Fixes: eb8e9fae0a22 ("mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager")
Fixes: 5019833d661f ("net/mlx5: E-switch, Introduce helper function to enable/disable vports")
Signed-off-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b691b111 25-Aug-2023 Dima Chumak <dchumak@nvidia.com>

net/mlx5: Implement devlink port function cmds to control ipsec_packet

Implement devlink port function commands to enable / disable IPsec
packet offloads. This is used to control the IPsec capability of the
device.

When ipsec_offload is enabled for a VF, it prevents adding IPsec packet
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec packet
offloads on the PF, it's not allowed to enable ipsec_packet on a VF,
until PF IPsec offloads are cleared.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 06bab696 25-Aug-2023 Dima Chumak <dchumak@nvidia.com>

net/mlx5: Implement devlink port function cmds to control ipsec_crypto

Implement devlink port function commands to enable / disable IPsec
crypto offloads. This is used to control the IPsec capability of the
device.

When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto
offloads on the PF, because the two cannot be active simultaneously due
to HW constraints. Conversely, if there are any active IPsec crypto
offloads on the PF, it's not allowed to enable ipsec_crypto on a VF,
until PF IPsec offloads are cleared.

Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8efd7b17 25-Aug-2023 Leon Romanovsky <leon@kernel.org>

net/mlx5: Provide an interface to block change of IPsec capabilities

mlx5 HW can't perform IPsec offload operation simultaneously both on PF
and VFs at the same time. While the previous patches added devlink knobs
to change IPsec capabilities dynamically, there is a need to add a logic
to block such IPsec capabilities for the cases when IPsec is already
configured.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230825062836.103744-7-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5c632cc3 01-Jun-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking

If called from port ops, it is not needed to perform the checks in
mlx5_devlink_eswitch_get(). The reason is devlink port would not be
registered if the checks are not true. Introduce relaxed version
mlx5_devlink_eswitch_nocheck_get() and use it in port ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2caa2a39 31-May-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Reduce number of vport lookups passing vport pointer instead of index

During devlink port init/cleanup and register/unregister calls, there
are many lookups of vport. Instead of passing vport_num as argument to
functions, pass the vport struct pointer directly and avoid repeated
lookups.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2c5f33f6 31-May-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Embed struct devlink_port into driver structure

Struct devlink_port is usually embedded in a driver-specific struct
which allows to carry driver context to devlink port ops.

Introduce a container struct to include devlink_port struct
in preparation to also include driver context for devlink port ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 13f878a2 01-Jun-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops

Currently each PF/VF/SF devlink port op called into mlx5 code calls
is_port_function_supported() to check if the port is either
PF, VF or SF. So make sure that the ops are registered with devlink
port only for those and avoid the is_port_function_supported() checks
in ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e855afd7 26-May-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code

Similar to the PF/VF helpers, introduce a set of load/unload helpers
for SF vports. From there, call mlx5_eswitch_load/unload_vport() which
are common for PFs/VFs and newly introduced SF helpers.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d9833bcf 25-May-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister()

In order to prepare for
mlx5_esw_offloads_devlink_port_register/unregister() to be used
for SFs as well, push out the PF/VF specific init/cleanup calls outside.
Introduce mlx5_eswitch_load/unload_pf_vf_vport() and call them from
there. Use these new helpers of PF/VF loading and make
mlx5_eswitch_local/unload_vport() reusable for SFs.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9eca8bb8 25-May-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix

As esw_offloads_load/unload_rep() are used outside eswitch.c it is nicer
for them to have "mlx5_" prefix. Add it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 329980d0 25-May-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Make mlx5_eswitch_load/unload_vport() static

mlx5_eswitch_load/unload_vport()() functions are not used
outside of eswitch.c. Make them static.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6496357a 20-Jun-2023 Maher Sanalla <msanalla@nvidia.com>

net/mlx5: Query hca_cap_2 only when supported

On vport enable, where fw's hca caps are queried, the driver queries
hca_caps_2 without checking if fw truly supports them, causing a false
failure of vfs vport load and blocking SRIOV enablement on old devices
such as CX4 where hca_caps_2 support is missing.

Thus, add a check for the said caps support before accessing them.

Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 29e4c95f 01-Jun-2023 Jiri Pirko <jiri@resnulli.us>

net/mlx5: Remove pointless vport lookup from mlx5_esw_check_port_type()

As xa_get_mark() returns false in case the entry is not present,
no need to redundantly check if vport is present. Remove the lookup.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# da744fd1 14-Jun-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Fix UAF in mlx5_eswitch_cleanup()

mlx5_eswitch_cleanup() is using esw right after freeing it for
releasing devlink_param.
Fix it by releasing the devlink_param before freeing the esw, and
adjust the create function accordingly.

Fixes: 3f90840305e2 ("net/mlx5: Move esw multiport devlink param to eswitch code")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Automatic Verification <verifier@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f405787a 01-Jun-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Create eswitch debugfs root directory

Following patch in series uses the new directory for bridge FDB debugfs.
The new directory is intended for all future eswitch-specific debugfs
files.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3f908403 17-May-2023 Shay Drory <shayd@nvidia.com>

net/mlx5: Move esw multiport devlink param to eswitch code

Move the param registration and handling code into the eswitch
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a7719b29 07-Mar-2023 Daniel Jurgens <danielj@nvidia.com>

net/mlx5: Add management of EC VF vports

Add init, load, unload, and cleanup of the EC VF vports. This includes
changes in how eswitch SRIOV is managed. Previous on an embedded CPU
platform the number of VFs provided when enabling the eswitch was always
0, host VFs vports are handled in the eswitch functions change event
handler. Now track the number of EC VFs as well, so they can be handled
properly in the enable/disable flows.

There are only 3 marks available for use in xarrays, all 3 were already
in use for this use case. EC VF vports are in a known range so we can
access them by index instead of marks.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# eb8e9fae 06-Jun-2023 Bodong Wang <bodong@nvidia.com>

mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager

Eswitch vport is needed for eswitch manager when creating LAG,
to create egress rules. However, this was not handled when ECPF is
an eswitch manager.

Signed-off-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f5d87b47 23-Apr-2023 Roi Dayan <roid@nvidia.com>

net/mlx5e: E-Switch, Initialize E-Switch for eswitch manager

Initialize eswitch instance for a function which is eswitch manager
but not a vport group manager.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3d7c5f78 19-Apr-2023 Roi Dayan <roid@nvidia.com>

net/mlx5e: E-Switch, Add a check that log_max_l2_table is valid

If log_max_l2_table is 0 there is no really room for one L2 address.
and should be treated as not supported.
Do the check in MPFS init and for vport context events which
both used to update L2 address.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 292243d1 23-Apr-2023 Roi Dayan <roid@nvidia.com>

net/mlx5e: E-Switch: move debug print of adding mac to correct place

Move the debug print inside the if clause that actually does the change.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 99db5669 02-Apr-2023 Roi Dayan <roid@nvidia.com>

net/mlx5e: E-Switch, Allow get vport api if esw exists

We could have an esw manager device which is not a vport group manager.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c97c9fe4 02-Apr-2023 Roi Dayan <roid@nvidia.com>

net/mlx5e: E-Switch, Update when to set other vport context

Other vport context should be set if vport number is not 0.
In case of ECPF, vport 0 represents the host PF representor so also
need to set other vport context.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8ca52ada 21-Mar-2023 Roi Dayan <roid@nvidia.com>

net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc()

The passded esw->dev is redundant as esw being passed and esw->dev being
used inside.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 0a431418 20-Mar-2023 Maher Sanalla <msanalla@nvidia.com>

Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"

This reverts commit 606e6a72e29dff9e3341c4cc9b554420e4793f401 which exposes
the vnic diagnostic counters via debugfs. Instead, The upcoming series will
expose the same counters through devlink health reporter.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f52cc627 13-Apr-2023 Jakub Kicinski <kuba@kernel.org>

Revert "net/mlx5: Enable management PF initialization"

This reverts commit fe998a3c77b9f989a30a2a01fb00d3729a6d53a4.

Paul reports that it causes a regression with IB on CX4
and FW 12.18.1000. In addition I think that the concept
of "management PF" is not fully accepted and requires
a discussion.

Fixes: fe998a3c77b9 ("net/mlx5: Enable management PF initialization")
Reported-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/all/CAHC9VhQ7A4+msL38WpbOMYjAqLp0EtOjeLh4Dc6SQtD6OUvCQg@mail.gmail.com/
Link: https://lore.kernel.org/r/20230413222547.56901-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 922f56e9 31-Jan-2023 Lama Kayal <lkayal@nvidia.com>

net/mlx5: Fix steering rules cleanup

vport's mc, uc and multicast rules are not deleted in teardown path when
EEH happens. Since the vport's promisc settings(uc, mc and all) in
firmware are reset after EEH, mlx5 driver will try to delete the above
rules in the initialization path. This cause kernel crash because these
software rules are no longer valid.

Fix by nullifying these rules right after delete to avoid accessing any dangling
pointers.

Call Trace:
__list_del_entry_valid+0xcc/0x100 (unreliable)
tree_put_node+0xf4/0x1b0 [mlx5_core]
tree_remove_node+0x30/0x70 [mlx5_core]
mlx5_del_flow_rules+0x14c/0x1f0 [mlx5_core]
esw_apply_vport_rx_mode+0x10c/0x200 [mlx5_core]
esw_update_vport_rx_mode+0xb4/0x180 [mlx5_core]
esw_vport_change_handle_locked+0x1ec/0x230 [mlx5_core]
esw_enable_vport+0x130/0x260 [mlx5_core]
mlx5_eswitch_enable_sriov+0x2a0/0x2f0 [mlx5_core]
mlx5_device_enable_sriov+0x74/0x440 [mlx5_core]
mlx5_load_one+0x114c/0x1550 [mlx5_core]
mlx5_pci_resume+0x68/0xf0 [mlx5_core]
eeh_report_resume+0x1a4/0x230
eeh_pe_dev_traverse+0x98/0x170
eeh_handle_normal_event+0x3e4/0x640
eeh_handle_event+0x4c/0x370
eeh_event_handler+0x14c/0x210
kthread+0x168/0x1b0
ret_from_kernel_thread+0x5c/0x84

Fixes: a35f71f27a61 ("net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d2a651ef 26-Jan-2023 Jiri Pirko <jiri@nvidia.com>

net/mlx5: Move eswitch port metadata devlink param to flow eswitch code

Move the param registration and handling code into the eswitch offloads
code as they are related to each other. No point in having the
devlink param registration done in separate file.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 075935f0 26-Jan-2023 Jiri Pirko <jiri@nvidia.com>

devlink: protect devlink param list by instance lock

Commit 1d18bb1a4ddd ("devlink: allow registering parameters after
the instance") as the subject implies introduced possibility to register
devlink params even for already registered devlink instance. This is a
bit problematic, as the consistency or params list was originally
secured by the fact it is static during devlink lifetime. So in order to
protect the params list, take devlink instance lock during the params
operations. Introduce unlocked function variants and use them in drivers
in locked context. Put lock assertions to appropriate places.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 55b45848 16-Jan-2023 Roi Dayan <roid@nvidia.com>

net/mlx5: E-Switch, Fix typo for egress

Fix engress to egress.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1158b7d1 17-Nov-2022 Roi Dayan <roid@nvidia.com>

net/mlx5: E-switch, Remove redundant comment about meta rules

Meta rules are created/destroyed per vport and not in eswitch
init/destroy.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fe998a3c 29-Jun-2022 Shay Drory <shayd@nvidia.com>

net/mlx5: Enable management PF initialization

Enable initialization of DPU Management PF, which is a new loopback PF
designed for communication with BMC.
For now Management PF doesn't support nor require most upper layer
protocols so avoid them.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7c83d1f4 21-Dec-2022 Chris Mi <cmi@nvidia.com>

net/mlx5: E-switch, Fix switchdev mode after devlink reload

The cited commit removes eswitch mode none. So after devlink reload
in switchdev mode, eswitch mode is not changed. But actually eswitch
is disabled during devlink reload.

Fix it by setting eswitch mode to legacy when disabling eswitch
which is called by reload_down.

Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1f0ae22a 12-Dec-2022 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: E-Switch, properly handle ingress tagged packets on VST

Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.

In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.

Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c331 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e5b9642a 06-Dec-2022 Shay Drory <shayd@nvidia.com>

net/mlx5: E-Switch, Implement devlink port function cmds to control migratable

Implement devlink port function commands to enable / disable migratable.
This is used to control the migratable capability of the device.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7db98396 06-Dec-2022 Yishai Hadas <yishaih@nvidia.com>

net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE

Implement devlink port function commands to enable / disable RoCE.
This is used to control the RoCE device capabilities.

This patch implement infrastructure which will be used by downstream
patches that will add additional capabilities.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2318b8bb 17-Nov-2022 Chris Mi <cmi@nvidia.com>

net/mlx5: E-switch, Destroy legacy fdb table when needed

The cited commit removes eswitch mode none. But when disabling
sriov in legacy mode or changing from switchdev to legacy mode
without sriov enabled, the legacy fdb table is not destroyed.

It is not the right behavior. Destroy legacy fdb table in above
two caes.

Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e12de39c 03-Nov-2022 Chris Mi <cmi@nvidia.com>

net/mlx5: E-switch, Set to legacy mode if failed to change switchdev mode

No need to rollback to the other mode because probably will fail
again. Just set to legacy mode and clear fdb table created flag.
So that fdb table will not be cleared again.

Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 430e2d5e 18-Jul-2022 Roi Dayan <roid@nvidia.com>

net/mlx5: E-Switch, Move send to vport meta rule creation

Move the creation of the rules from offloads fdb table init to
per rep vport init.
This way the driver will creating the send to vport meta rule
on any representor, e.g. SF representors.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 84a433a4 28-Jul-2022 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: Lock mlx5 devlink reload callbacks

Change devlink instance locks in mlx5 driver to have devlink reload
callbacks locked, while keeping all driver paths which lead to devl_ API
functions called by the driver locked.

Add mlx5_load_one_devl_locked() and mlx5_unload_one_devl_locked() which
are used by the paths which are already locked such as devlink reload
callbacks.

This patch makes the driver use devl_ API also for traps register as
these functions are called from the driver paths parallel to reload that
requires locking now.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 606e6a72 18-May-2022 Michael Guralnik <michaelgur@nvidia.com>

net/mlx5: Expose vnic diagnostic counters for eswitch managed vports

Expose on vport group managers debug counters for their managed vports.

Counters are exposed through debugfs, the directory will be present only
for functions that are eswitch managers and only counters that are
supported on their specific HW/FW will be exposed.

Example:
$ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/
pf sf_8 vf_0 vf_1

$ ls -l /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/
cq_overrun
quota_exceeded_command
total_q_under_processor_handle
invalid_command
send_queue_priority_update_flow

List of all counter added:
total_q_under_processor_handle - number of queues in error state due to an
async error or errored command.
send_queue_priority_update_flow - number of QP/SQ priority/SL update
events.
cq_overrun - number of times CQ entered an error state due to an
overflow.
async_eq_overrun -number of time an EQ mapped to async events was
overrun.
comp_eq_overrun - number of time an EQ mapped to completion events was
overrun.
quota_exceeded_command - number of commands issued and failed due to quota
exceeded.
invalid_command - number of commands issued and failed dues to any reason
other than quota exceeded.

Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f1bc646c 11-Jul-2022 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register

The function mlx5_esw_offloads_devlink_port_register() calls
devlink_port_register() and devlink_rate_leaf_create(). Use devl_ API to
call devl_port_register() and devl_rate_leaf_create() accordingly and
add devlink instance lock in driver paths to this function.

Similarly, use devl_ API to call devl_port_unregister() and
devl_rate_leaf_destroy() in mlx5_esw_offloads_devlink_port_unregister()
and ensure locking devlink instance lock on the paths to this function
too.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 03f9c47d 11-Jul-2022 Moshe Shemesh <moshe@nvidia.com>

net/mlx5: Use devl_ API for rate nodes destroy

Use devl_rate_nodes_destroy() instead of devlink_rate_nodes_destroy().
Add devlink instance lock in the driver paths to this function to have
it locked while calling devl_ API function.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink lock held.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# b6f2846a 29-May-2022 Chris Mi <cmi@nvidia.com>

net/mlx5: E-switch: Change eswitch mode only via devlink command

Enable or disable switchdev according to the eswitch mode set by
devlink command. So it is not changed by other functions anymore.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# f019679e 29-May-2022 Chris Mi <cmi@nvidia.com>

net/mlx5: E-switch, Remove dependency between sriov and eswitch mode

Currently, there are three eswitch modes, none, legacy and switchdev.
None is the default mode. Remove redundant none mode as eswitch mode
should always be either legacy mode or switchdev mode.

With this patch, there are two behavior changes:

1. Legacy becomes the default mode. When querying eswitch mode using
devlink, a valid mode is always returned.
2. When disabling sriov, the eswitch mode will not change, only vfs
are unloaded.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fbd43b72 05-May-2022 Chris Mi <cmi@nvidia.com>

net/mlx5: E-switch, Introduce flag to indicate if fdb table is created

Introduce flag to indicate if fdb table is created as a pre-step
to prepare for removing dependency between sriov and eswitch mode
in the downstream patches.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ea5872dd 10-Feb-2022 Chris Mi <cmi@nvidia.com>

net/mlx5: E-switch, Introduce flag to indicate if vport acl namespace is created

Eswitch vport acl namespace is needed when loading vfs. There is
no need to free and reallocate it when switching eswitch mode.
Introduce flag to indicate if it is created or not. When needed,
create it. Only free it when the driver is unloaded or in bare
metal mode.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8e755f7a 30-May-2022 Dan Carpenter <dan.carpenter@oracle.com>

net/mlx5: delete dead code in mlx5_esw_unlock()

Smatch complains about this function:

drivers/net/ethernet/mellanox/mlx5/core/eswitch.c:2000 mlx5_esw_unlock()
warn: inconsistent returns '&esw->mode_lock'.

Before commit ec2fa47d7b98 ("net/mlx5: Lag, use lag lock") there
used to be a matching mlx5_esw_lock() function and the lock and
unlock functions were symmetric. But now we take the lock
unconditionally and must unlock unconditionally as well.

As near as I can tell this is dead code and can just be deleted.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ec2fa47d 14-Dec-2021 Mark Bloch <mbloch@nvidia.com>

net/mlx5: Lag, use lag lock

Use a lag specific lock instead of depending on external locks to
synchronise the lag creation/destruction.

With this, taking E-Switch mode lock is no longer needed for syncing
lag logic.

Cleanup any dead code that is left over and don't export functions that
aren't used outside the E-Switch core code.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 4202ea95 01-Mar-2022 Mark Bloch <mbloch@nvidia.com>

net/mlx5: Lag, move E-Switch prerequisite check into lag code

There is no need to expose E-Switch function for something that can be
checked with already present API inside lag code.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b5235a99 02-Nov-2021 Shay Drory <shayd@nvidia.com>

net/mlx5: Delete redundant default assignment of runtime devlink params

Runtime devlink params always read their values from the get() callbacks.
Also, it is an error to set driverinit_value for params which don't
support driverinit cmode. Delete such assignments.

In addition, move the set of default matching mode inside eswitch code.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 85c5f7c9 21-Sep-2021 Dmytro Linkin <dlinkin@nvidia.com>

net/mlx5: E-switch, Create QoS on demand

Don't create eswitch QoS (root TSAR) on switch mode change. Create it on
first child TSAR object creation - vport or rate group. Keep track
root TSAR references and release root TSAR with last object deletion.
No need to check for QoS is enabled when installing tc matchall filter.
Remove related helper function due to no users of it.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d7df09f5 21-Sep-2021 Dmytro Linkin <dlinkin@nvidia.com>

net/mlx5: E-switch, Enable vport QoS on demand

Vports' QoS is not commonly used but consume SW/HW resources, which
becomes an issue on BlueField SoC systems.
Don't enable QoS on vports by default on eswitch mode change and enable
when it's going to be used by one of the top level users:
- configuring TC matchall filter with police action;
- setting rate with legacy NDO API;
- calling devlink ops->rate_leaf_*() callbacks.

Disable vport QoS on vport cleanup.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e9d491a6 21-Oct-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-switch, move offloads mode callbacks to offloads file

eswitch.c is mainly for common code between legacy and offloads mode.
MAC address get and set via devlink is applicable only in offloads mode.

Hence, move it to eswitch_offloads.c file.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b22fd438 21-Oct-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-switch, Reuse mlx5_eswitch_set_vport_mac

mlx5_eswitch_set_vport_mac() routine already does necessary checks which
are duplicated in implementation of
mlx5_devlink_port_function_hw_addr_set().

Hence, reuse mlx5_eswitch_set_vport_mac() and cut down the code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# fcf8ec54 19-Oct-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-switch, Remove vport enabled check

An eswitch vport of the devlink port is always enabled before a
devlink port is registered. And a eswitch vport is always disabled
after a devlink port is unregistered.
Hence avoid the vport enabled check in the devlink callback routine.
Such check is only applicable in the legacy SR-IOV callbacks.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Sunil Sudhakar Rani <sunrani@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2eb0cb31 10-Nov-2021 Mark Bloch <mbloch@nvidia.com>

net/mlx5: E-Switch, rebuild lag only when needed

A user can enable VFs without changing E-Switch mode, this can happen
when a user moves straight to switchdev mode and only once in switchdev
VFs are enabled via the sysfs interface.

The cited commit assumed this isn't possible and exposed a single
API function where the E-switch calls into the lag code, breaks the lag
and prevents any other lag operations to take place until the
E-switch update has ended.

Breaking the hardware lag when it isn't needed can make it such that
hardware lag can't be enabled again.

In the sysfs call path check if the current E-Switch mode is NONE,
in the context of the function it can only mean the E-Switch is moving
out of NONE mode and the hardware lag should be disabled and enabled
once the mode change has ended. If the mode isn't NONE it means
VFs are about to be enabled and such operation doesn't require
toggling the hardware lag.

Fixes: cac1eb2cf2e3 ("net/mlx5: Lag, properly lock eswitch if needed")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d7751d64 20-May-2021 Paul Blakey <paulb@nvidia.com>

net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdev

E-Switch encap mode is relevant only when in switchdev mode.
The RDMA driver can query the encap configuration via
mlx5_eswitch_get_encap_mode(). Make sure it returns the currently
used mode and not the set one.

This reverts the cited commit which reset the encap mode
on entering switchdev and fixes the original issue properly.

Fixes: 9a64144d683a ("net/mlx5: E-Switch, Fix default encap mode")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2d116e3e 28-May-2021 Dmytro Linkin <dlinkin@nvidia.com>

net/mlx5: E-switch, Move QoS related code to dedicated file

Move eswitch QoS related code into dedicated file. Provide eswitch API
to access this code meaning it is isolated and restricted to be used
only by eswitch.c. Exception is legacy NDO vf set rate, which moved to
esw/legacy.c.

Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>


# 39c538d6 29-Jul-2021 Cai Huoqing <caihuoqing@baidu.com>

net/mlx5: Fix typo in comments

Fix typo:
*vectores ==> vectors
*realeased ==> released
*erros ==> errors
*namepsace ==> namespace
*trafic ==> traffic
*proccessed ==> processed
*retore ==> restore
*Currenlty ==> Currently
*crated ==> created
*chane ==> change
*cannnot ==> cannot
*usuallly ==> usually
*failes ==> fails
*importent ==> important
*reenabled ==> re-enabled
*alocation ==> allocation
*recived ==> received
*tanslation ==> translation

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 82564f6c 08-Aug-2021 Leon Romanovsky <leon@kernel.org>

devlink: Simplify devlink port API calls

Devlink port already has pointer to the devlink instance and all API
calls that forward these devlink ports to the drivers perform same
"devlink_port->devlink" assignment before actual call.

This patch removes useless parameter and allows us in the future
to create specific devlink_port_ops to manage user space access with
reliable ops assignment.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cac1eb2c 03-Aug-2021 Mark Bloch <mbloch@nvidia.com>

net/mlx5: Lag, properly lock eswitch if needed

Currently when doing hardware lag we check the eswitch mode
but as this isn't done under a lock the check isn't valid.

As the code needs to sync between two different devices an extra
care is needed.

- When going to change eswitch mode, if hardware lag is active destroy it.
- While changing eswitch modes block any hardware bond creation.
- Delay handling bonding events until there are no mode changes in
progress.
- When attaching a new mdev to lag, block until there is no mode change
in progress. In order for the mode change to finish the interface lock
will have to be taken. Release the lock and sleep for 100ms to
allow forward progress. As this is a very rare condition (can happen if
the user unbinds and binds a PCI function while also changing eswitch
mode of the other PCI function) it has no real world impact.

As taking multiple eswitch mode locks is now required lockdep will
complain about a possible deadlock. Register a key per eswitch to make
lockdep happy.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 97a8a8c1 03-Aug-2021 Mark Bloch <mbloch@nvidia.com>

net/mlx5: Return mdev from eswitch

Export a function so users can retrieve the mellanox device that manages
the eswitch from the eswitch device.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# bbc8222d 08-Jun-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Read PF mac address

External controller PF's MAC address is not read from the device during
vport setup. Fail to read this results in showing all zeros to user
while the factory programmed MAC is a valid value.

$ devlink port show eth1 -jp
{
"port": {
"pci/0000:03:00.0/196608": {
"type": "eth",
"netdev": "eth1",
"flavour": "pcipf",
"controller": 1,
"pfnum": 0,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00"
}
}
}
}

Hence, read it when enabling a vport.

After the fix,

$ devlink port show eth1 -jp
{
"port": {
"pci/0000:03:00.0/196608": {
"type": "eth",
"netdev": "eth1",
"flavour": "pcipf",
"controller": 1,
"pfnum": 0,
"splittable": false,
"function": {
"hw_addr": "98:03:9b:a0:60:11"
}
}
}
}

Fixes: f099fde16db3 ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7c9f131f 22-Apr-2021 Eli Cohen <elic@nvidia.com>

{net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table

net/mlx5: Expose MPFS configuration API

MPFS is the multi physical function switch that bridges traffic between
the physical port and any physical functions associated with it. The
driver is required to add or remove MAC entries to properly forward
incoming traffic to the correct physical function.

We export the API to control MPFS so that other drivers, such as
mlx5_vdpa are able to add MAC addresses of their network interfaces.

The MAC address of the vdpa interface must be configured into the MPFS L2
address. Failing to do so could cause, in some NIC configurations, failure
to forward packets to the vdpa network device instance.

Fix this by adding calls to update the MPFS table.

CC: <mst@redhat.com>
CC: <jasowang@redhat.com>
CC: <virtualization@lists.linux-foundation.org>
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 87bd418e 02-Mar-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Consider SF ports of host PF

Query SF vports count and base id of host PF from the firmware.

Account these ports in the total port calculation whenever it is non
zero.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 47dd7e60 18-Mar-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Use xarray for vport number to vport and rep mapping

Currently vport number to vport and its representor are mapped using an
array and an index.

Vport numbers of different types of functions are not contiguous. Adding
new such discontiguous range using index and number mapping is increasingly
complex and hard to maintain.

Hence, maintain an xarray of vport and rep whose lookup is done based on
the vport number.
Each VF and SF entry is marked with a xarray mark to identify the function
type. Additionally PF and VF needs special handling for legacy inline
mode. They are additionally marked as host function using additional
HOST_FN mark.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9f8c7100 02-Mar-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Prepare to return total vports from eswitch struct

Total vports are already stored during eswitch initialization. Instead
of calculating everytime, read directly from eswitch.

Additionally, host PF's SF vport information is available using
QUERY_HCA_CAP command. It is not available through HCA_CAP of the
eswitch manager PF.
Hence, this patch prepares the return total eswitch vport count from the
existing eswitch struct.

This further helps to keep eswitch port counting macros and logic within
eswitch.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 06ec5acc 02-Mar-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Return eswitch max ports when eswitch is supported

mlx5_eswitch_get_total_vports() doesn't honor MLX5_ESWICH Kconfig flag.

When MLX5_ESWITCH is disabled, FS layer continues to initialize eswitch
specific ACL namespaces.
Instead, start honoring MLX5_ESWITCH flag and perform vport specific
initialization only when vport count is non zero.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 57b92bdd 01-Mar-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Initialize eswitch acls ns when eswitch is enabled

Currently eswitch flow steering (FS) namespace of vport's ingress and
egress ACL are enabled when FS layer is initialized. This is done even
when eswitch is diabled. This demands that total eswitch ports to be
known to FS layer without eswitch in use.

Given the FS core is not dependent on eswitch, make namespace init and
cleanup routines as helper routines to be invoked only when eswitch is
needed.

With this change, ingress and egress ACL namespaces are created only
when eswitch legacy/offloads mode is enabled.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b55b3538 26-Feb-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Move legacy code to a individual file

Currently eswitch offers two modes. Legacy and offloads.
Offloads code is already in its own file eswitch_offloads.c

However eswitch.c contains the eswitch legacy code and common
infrastructure code.

To enable future extensions and to better manage generic common eswitch
infrastructure code, move the legacy code to its own legacy.c file.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b16f2bb6 26-Feb-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Convert a macro to a helper routine

Convert ESW_ALLOWED macro to a helper routine so that it can be used in
other eswitch files.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 13795553 02-Mar-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch Make cleanup sequence mirror of init

Make cleanup sequence mirror of init sequence for cleaning up reps
and freeing vports.

Also when reps initialization fails, there is no need to perform reps
cleanup.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7d5ae478 08-Mar-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, Skip querying SF enabled bits

With vhca events, SF state is queried through the VHCA events. Device no
longer expects SF bitmap in the query eswitch functions command.

Hence, remove it to simplify the code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# e591605f 03-Feb-2021 Parav Pandit <parav@nvidia.com>

net/mlx5: E-Switch, move QoS specific fields to existing qos struct

Function QoS related fields are already defined in qos related struct.
min and max rate are left out to mlx5_vport_info struct.

Move them to existing qos struct.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7dc84de9 16-Sep-2020 Roi Dayan <roid@nvidia.com>

net/mlx5: E-Switch, Protect changing mode while adding rules

We re-use the native NIC port net device instance for the Uplink
representor, a driver currently cannot unbind TC setup callback
actively, hence protect changing E-Switch mode while adding rules.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c55479d0 16-Sep-2020 Roi Dayan <roid@nvidia.com>

net/mlx5: E-Switch, Change mode lock from mutex to rw semaphore

E-Switch mode change routine will take the write lock to prevent any
consumer to access the E-Switch resources while E-Switch is going
through a mode change.

In the next patch
E-Switch consumers (e.g vport representors) will take read_lock prior to
accessing E-Switch resources to prevent E-Switch mode changing in the
middle of the operation.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1aa48ca6 16-Sep-2020 Roi Dayan <roid@nvidia.com>

net/mlx5e: Allow legacy vf ndos only if in legacy mode

We will re-use the native NIC port net device instance for the Uplink
representor. Several VF ndo ops are not relevant in switchdev mode.
Disallow them when eswitch mode is not legacy as a preparation.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7bef147a 03-Dec-2020 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Don't skip vport check

Users of mlx5_eswitch_get_vport() are required to check return value
prior to passing mlx5_vport further. Fix all the places to do not skip
that check.

Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8914add2 25-Jan-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Handle FIB events to update tunnel endpoint device

Process FIB route update events to dynamically update the stack device
rules when tunnel routing changes. Use rtnl lock to prevent FIB event
handler from running concurrently with neigh update and neigh stats
workqueue tasks. Use encap_tbl_lock mutex to synchronize with TC rule
update path that doesn't use rtnl lock.

FIB event workflow for encap flows:

- Unoffload all flows attached to route encaps from slow or fast path
depending on encap destination endpoint neigh state.

- Update encap IP header according to new route dev.

- Update flows mod_hdr action that is responsible for overwriting reg_c0
source port bits to source port of new underlying VF of new route dev. This
step requires changing flow create/delete code to save flow parse attribute
mod_hdr_acts structure for whole flow lifetime instead of deallocating it
after flow creation. Refactor mod_hdr code to allow saving id of individual
mod_hdr actions and updating them with dedicated helper.

- Offload all flows to either slow or fast path depending on encap
destination endpoint neigh state.

FIB event workflow for decap flows:

- Unoffload all route flows from hardware. When last route flow is deleted
all indirect table rules for the route dev will also be deleted.

- Update flow attr decap_vport and destination MAC according to underlying
VF of new rote dev.

- Offload all route flows back to hardware creating new indirect table
rules according to updated flow attribute data.

Extract some neigh update code to helper functions to be used by both neigh
update and route update infrastructure.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 777bb800 21-Sep-2020 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: Create route entry infrastructure

Implement dedicated route entry infrastructure to be used in following
patch by route update event. Both encap (indirectly through their
corresponding encap entries) and decap (directly) flows are attached to
routing entry. Since route update also requires updating encap (route
device MAC address is a source MAC address of tunnel encapsulation), same
encap_tbl_lock mutex is used for synchronization.

The new infrastructure looks similar to existing infrastructures for shared
encap, mod_hdr and hairpin entries:

- Per-eswitch hash table is used for quick entry lookup.

- Flows are attached to per-entry linked list and hold reference to entry
during their lifetime.

- Atomic reference counting and rcu mechanisms are used as synchronization
primitives for concurrent access.

The infrastructure also enables connection tracking on stacked devices
topology by attaching CT chain 0 flow on tunneling dev to decap route
entry.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 84ae9c1f 23-Sep-2020 Vlad Buslov <vladbu@nvidia.com>

net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping

Following patches in the series need to be able to map VF netdev to vport.
Since it is trivial to obtain vhca_id from netdev, maintain mapping from
vhca_id to vport_num inside eswitch offloads using xarray. Provide function
mlx5_eswitch_vhca_id_to_vport() to be used by TC code in following patches
to obtain the mapping.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 8f010541 11-Dec-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: SF, Add port add delete functionality

To handle SF port management outside of the eswitch as independent
software layer, introduce eswitch notifier APIs so that mlx5 upper
layer who wish to support sf port management in switchdev mode can
perform its task whenever eswitch mode is set to switchdev or before
eswitch is disabled.

Initialize sf port table on such eswitch event.

Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when eswitch is disabled.
Expose SF port add and delete to user via devlink commands.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached

or by its unique port index:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88 -jp
{
"port": {
"pci/0000:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00",
"state": "inactive",
"opstate": "detached"
}
}
}
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d970812b 11-Dec-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: E-switch, Add eswitch helpers for SF vport

Add helpers to enable/disable eswitch port, register its devlink port and
load its representor.

Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d7f33a45 11-Dec-2020 Vu Pham <vuhuong@nvidia.com>

net/mlx5: E-switch, Prepare eswitch to handle SF vport

Prepare eswitch to handle SF vport during
(a) querying eswitch functions
(b) egress ACL creation
(c) account for SF vports in total vports calculation

Assign a dedicated placeholder for SFs vports and their representors.
They are placed after VFs vports and before ECPF vports as below:
[PF,VF0,...,VFn,SF0,...SFm,ECPF,UPLINK].

Change functions to map SF's vport numbers to indices when
accessing the vports or representors arrays, and vice versa.

Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 21bad8da 12-Jan-2021 Eli Cohen <elic@nvidia.com>

net/mlx5e: Simplify condition on esw_vport_enable_qos()

esw->qos.enabled will only be true if both MLX5_CAP_GEN(dev, qos) and
MLX5_CAP_QOS(dev, esw_scheduling) are true. Therefore, remove them from
the condition in and rely only on esw->qos.enabled.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e8711402 10-Oct-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Simplify eswitch mode check

Provide mlx5_core device instead of "priv" pointer while checking
eswith mode.

Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 93f82444 03-Oct-2020 Leon Romanovsky <leon@kernel.org>

RDMA/mlx5: Convert mlx5_ib to use auxiliary bus

The conversion to auxiliary bus solves long standing issue with
existing mlx5_ib<->mlx5_core coupling. It required to have both
modules in initramfs if one of them needed for the boot.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 912cebf4 04-Oct-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5e: Connect ethernet part to auxiliary bus

Reuse auxiliary bus to perform device management of the
ethernet part of the mlx5 driver.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>


# 5bef709d 20-Nov-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: Enable host PF HCA after eswitch is initialized

Currently ECPF enables external host PF too early in the initialization
sequence for Ethernet links when ECPF is eswitch manager.

Due to this, when external host PF driver is loaded, host PF's HCA CAP has
inner_ip_version supported by NIC RX flow table.
This capability is later updated by firmware after ECPF driver enables
ENCAP/DECAP as eswitch manager.

This results into a timing race condition, where CREATE_TIR command
fails with a below syndrome on host PF.

mlx5_cmd_check:775:(pid 510): CREATE_TIR(0x900) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x562b00)

Hence, enable the external host PF after necessary eswitch and per vport
initialization is completed.
Continue to enable host PF when eswitch manager capability is off for a
ECPF.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5b8631c7 09-Nov-2020 Eli Cohen <elic@nvidia.com>

net/mlx5: E-Switch, Fail mlx5_esw_modify_vport_rate if qos disabled

Avoid calling mlx5_esw_modify_vport_rate() if qos is not enabled and
avoid unnecessary syndrome messages from firmware.

Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 470b7475 21-Oct-2020 Vladyslav Tarasiuk <vladyslavt@nvidia.com>

net/mlx5: Disable QoS when min_rates on all VFs are zero

Currently when QoS is enabled for VF and any min_rate is configured,
the driver sets bw_share value to at least 1 and doesn’t allow to set
it to 0 to make minimal rate unlimited. It means there is always a
minimal rate configured for every VF, even if user tries to remove it.

In order to make QoS disable possible, check whether all vports have
configured min_rate = 0. If this is true, set their bw_share to 0 to
disable min_rate limitations.

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 1ce5fc72 02-Nov-2020 Vladyslav Tarasiuk <vladyslavt@nvidia.com>

net/mlx5: Clear bw_share upon VF disable

Currently, if user disables VFs with some min and max rates configured,
they are cleared. But QoS data is not cleared and restored upon next VF
enable placing limits on minimal rate for given VF, when user expects
none.

To match cleared vport->info struct with QoS-related min and max rates
upon VF disable, clear vport->qos struct too.

Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ae358594 01-Nov-2020 Parav Pandit <parav@nvidia.com>

net/mlx5: E-switch, Avoid extack error log for disabled vport

When E-switch vport is disabled, querying its hardware address is
unsupported.
Avoid setting extack error log message in such case.

Fixes: f099fde16db3 ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 350a6324 15-Jul-2020 Alaa Hleihel <alaa@mellanox.com>

net/mlx5e: Fix kernel crash when setting vf VLANID on a VF dev

After the cited commit, function 'mlx5_eswitch_set_vport_vlan' started
to acquire esw->state_lock.
However, esw is not defined for VF devices, hence attempting to set vf
VLANID on a VF dev will cause a kernel panic.

Fix it by moving up the (redundant) esw validation from function
'__mlx5_eswitch_set_vport_vlan' since the rest of the callers now have
and use a valid esw.

For example with vf device eth4:
# ip link set dev eth4 vf 0 vlan 0

Trace of the panic:
[ 411.409842] BUG: unable to handle page fault for address: 00000000000011b8
[ 411.449745] #PF: supervisor read access in kernel mode
[ 411.452348] #PF: error_code(0x0000) - not-present page
[ 411.454938] PGD 80000004189c9067 P4D 80000004189c9067 PUD 41899a067 PMD 0
[ 411.458382] Oops: 0000 [#1] SMP PTI
[ 411.460268] CPU: 4 PID: 5711 Comm: ip Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
[ 411.462447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 411.464158] RIP: 0010:__mutex_lock+0x4e/0x940
[ 411.464928] Code: fd 41 54 49 89 f4 41 52 53 89 d3 48 83 ec 70 44 8b 1d ee 03 b0 01 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 45 85 db 75 0a <48> 3b 7f 60 0f 85 7e 05 00 00 49 8d 45 68 41 56 41 b8 01 00 00 00
[ 411.467678] RSP: 0018:ffff88841fcd74b0 EFLAGS: 00010246
[ 411.468562] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 411.469715] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000001158
[ 411.470812] RBP: ffff88841fcd7550 R08: ffffffffa00fa1ce R09: 0000000000000000
[ 411.471835] R10: ffff88841fcd7570 R11: 0000000000000000 R12: 0000000000000002
[ 411.472862] R13: 0000000000001158 R14: ffffffffa00fa1ce R15: 0000000000000000
[ 411.474004] FS: 00007faee7ca6b80(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
[ 411.475237] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 411.476129] CR2: 00000000000011b8 CR3: 000000041909c006 CR4: 0000000000360ea0
[ 411.477260] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 411.478340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 411.479332] Call Trace:
[ 411.479760] ? __nla_validate_parse.part.6+0x57/0x8f0
[ 411.482825] ? mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
[ 411.483804] mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
[ 411.484733] mlx5e_set_vf_vlan+0x41/0x50 [mlx5_core]
[ 411.485545] do_setlink+0x613/0x1000
[ 411.486165] __rtnl_newlink+0x53d/0x8c0
[ 411.486791] ? mark_held_locks+0x49/0x70
[ 411.487429] ? __lock_acquire+0x8fe/0x1eb0
[ 411.488085] ? rcu_read_lock_sched_held+0x52/0x60
[ 411.488998] ? kmem_cache_alloc_trace+0x16d/0x2d0
[ 411.489759] rtnl_newlink+0x47/0x70
[ 411.490357] rtnetlink_rcv_msg+0x24e/0x450
[ 411.490978] ? netlink_deliver_tap+0x92/0x3d0
[ 411.491631] ? validate_linkmsg+0x330/0x330
[ 411.492262] netlink_rcv_skb+0x47/0x110
[ 411.492852] netlink_unicast+0x1ac/0x270
[ 411.493551] netlink_sendmsg+0x336/0x450
[ 411.494209] sock_sendmsg+0x30/0x40
[ 411.494779] ____sys_sendmsg+0x1dd/0x1f0
[ 411.495378] ? copy_msghdr_from_user+0x5c/0x90
[ 411.496082] ___sys_sendmsg+0x87/0xd0
[ 411.496683] ? lock_acquire+0xb9/0x3a0
[ 411.497322] ? lru_cache_add+0x5/0x170
[ 411.497944] ? find_held_lock+0x2d/0x90
[ 411.498568] ? handle_mm_fault+0xe46/0x18c0
[ 411.499205] ? __sys_sendmsg+0x51/0x90
[ 411.499784] __sys_sendmsg+0x51/0x90
[ 411.500341] do_syscall_64+0x59/0x2e0
[ 411.500938] ? asm_exc_page_fault+0x8/0x30
[ 411.501609] ? rcu_read_lock_sched_held+0x52/0x60
[ 411.502350] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 411.503093] RIP: 0033:0x7faee73b85a7
[ 411.503654] Code: Bad RIP value.

Fixes: 0e18134f4f9f ("net/mlx5e: Eswitch, use state_lock to synchronize vlan change")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7d0314b1 05-Apr-2020 Ron Diskin <rondi@mellanox.com>

net/mlx5e: Modify uplink state on interface up/down

When setting the PF interface up/down, notify the firmware to update
uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.

This behavior will prevent sending traffic out on uplink port when PF is
down, such as sending traffic from a VF interface which is still up.
Currently when calling mlx5e_open/close(), the driver only sends PAOS
command to notify the firmware to set the physical port state to
up/down, however, it is not sufficient. When VF is in "auto" state, it
follows the uplink state, which was not updated on mlx5e_open/close()
before this patch.

When switchdev mode is enabled and uplink representor is first enabled,
set the uplink port state value back to its FW default "AUTO".

Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down")
Signed-off-by: Ron Diskin <rondi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0c2600c6 27-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Destroy TSAR after reload interface

When eswitch offloads is enabled, TSAR is created before reloading
the interfaces.
However when eswitch offloads mode is disabled, TSAR is disabled before
reloading the interfaces.

To keep the eswitch enable/disable sequence as mirror, destroy TSAR
after reloading the interfaces.

Fixes: 1bd27b11c1df ("net/mlx5: Introduce E-switch QoS management")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2b8e9c7c 27-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Destroy TSAR when fail to enable the mode

When either esw_legacy_enable() or esw_offloads_enable() fails,
code missed to destroy the created TSAR.

Hence, add the missing call to destroy the TSAR.

Fixes: 610090ebce92 ("net/mlx5: E-switch, Initialize TSAR Qos hardware block before its user vports")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ea2128fd 26-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Reduce dependency on num_vfs during mode set

Currently only ECPF allows enabling eswitch when SR-IOV is disabled.

Enable PF also to enable eswitch when SR-IOV is disabled.
Load VF vports when eswitch is already enabled.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b2fdf3d0 17-Mar-2020 Paul Blakey <paulb@mellanox.com>

net/mlx5e: Export sharing of mod headers to a new file

Refactor sharing of mod headers to new file and while there,
remove spin lock and flows list, as this is only used for warn on.

Use the generic API in the next patch to re-use tuple modify headers
for identical modify actions,

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e9716afd 24-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, When eswitch is unsupported, return -EOPNOTSUPP

When eswitch is unsupported, currently -EPERM error code is returned
instead of -EOPNOTSUPP.

Due to this VF device's devlink virtual port is not enumerated because
port_function_get() callback returned -EPERM instead of -EOPNOTSUPP.

Hence, return the error code -EOPNOTSUPP when eswitch is unsupported.

Fixes: bd93975353d5 ("net/mlx5: E-switch, Introduce and use eswitch support check helper")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 330077d1 18-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Supporting setting devlink port function mac address

Enable user to set mac address of the PCI PF and VF port function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1094795c 18-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: Split mac address setting function for using state_lock

Refactor mac address setting function to let caller hold the necessary
state_lock mutex, so that subsequent patch and use this helper routine.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f099fde1 18-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Support querying port function mac address

Support querying mac address of the eswitch devlink port function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bd939753 18-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Introduce and use eswitch support check helper

Introduce an helper routine to get esw from a devlink device and use it
at eswitch callbacks and in subsequent patch.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fa997825 18-Jun-2020 Parav Pandit <parav@mellanox.com>

net/mlx5: Constify mac address pointer

Since none of the functions need to modify the input mac address,
constify them.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 133dcfc5 28-Feb-2020 Vu Pham <vuhuong@mellanox.com>

net/mlx5: E-Switch, Alloc and free unique metadata for match

Introduce infrastructure to create unique metadata for match
for vport without depending on vport_num. Vport uses its
default metadata for match in standalone configuration but
will share a different unique "bond_metadata" for match with
other vports in bond configuration.

Using ida to generate unique metadata for match for vports
in default and bond configurations.

Introduce APIs to generate, free metadata for match.
Introduce APIs to set vport's bond_metadata and replace its
ingress acl rules with bond_metatada.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 07bab950 28-Mar-2020 Vu Pham <vuhuong@mellanox.com>

net/mlx5: E-Switch, Refactor eswitch ingress acl codes

Restructure the eswitch ingress acl codes into eswitch directory
and different files:
. Acl ingress helper functions to acl_helper.c/h
. Acl ingress functions used in offloads mode to acl_ingress_ofld.c
. Acl ingress functions used in legacy mode to acl_ingress_lgy.c

This patch does not change any functionality.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>


# ea651a86 06-Nov-2019 Vu Pham <vuhuong@mellanox.com>

net/mlx5: E-Switch, Refactor eswitch egress acl codes

Refactor the egress acl codes so that offloads and legacy modes
can configure specifically their own needs of egress acl table,
groups and rules. While at it, restructure the eswitch egress
acl codes into eswitch directory and different files:
. Acl egress helper functions to acl_helper.c/h
. Acl egress functions used in offloads mode to acl_egress_ofld.c
. Acl egress functions used in legacy mode to acl_egress_lgy.c

This patch does not change any functionality.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 14e6b038 03-Feb-2020 Eli Cohen <eli@mellanox.com>

net/mlx5e: Add support for hw decapsulation of MPLS over UDP

MPLS over UDP is supported in hardware by using a packet reformat object
with reformat type equal L3_TUNNEL_TO_L2 which both decapsulates the
outer L3, L4 and MPLS headers, and allows for setting the L2 headers of
the resulting decapsulated packet. For the hardware to operate
correctly, the configuration of the firmware must have
FLEX_PARSER_PROFILE_ENABLE = 1.

Example tc rule:
tc filter add dev bareudp0 protocol all prio 1 root flower enc_dst_port \
6635 enc_src_ip 8.8.8.23 action mpls pop protocol ip pipe \
action pedit ex munge eth dst set 00:11:22:33:44:21 pipe action \
mirred egress redirect dev enp59s0f0_0

We use pedit to set the correct destination MAC.

For MPLS over UDP decapsulation to take place, the driver logic requires
the following:

1. flower filter added on bareudp device.
2. action mpls pop
3. zero or more pedit munge actions
4. one redirect action

Current implementation supports only IPv4 and no VLAN.

tc filter show output looks like this:
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
enc_src_ip 8.8.8.24
enc_dst_port 6635
in_hw in_hw_count 1
action order 1: mpls pop protocol ip pipe
index 2 ref 1 bind 1

action order 2: pedit action pipe keys 2
index 1 ref 1 bind 1
key #0 at eth+0: val 00112233 mask 00000000
key #1 at eth+4: val 44210000 mask 0000ffff

action order 3: mirred (Egress Redirect to device enp59s0f0_0) stolen
index 2 ref 1 bind 1

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e08a6832 08-Apr-2020 Leon Romanovsky <leon@kernel.org>

net/mlx5: Update eswitch to new cmd interface

Do mass update of eswitch to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 8e0aa4bc 18-Dec-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Protect eswitch mode changes

Currently eswitch mode change is occurring from 2 different execution
contexts as below.
1. sriov sysfs enable/disable
2. devlink eswitch set commands

Both of them need to access eswitch related data structures in
synchronized manner.
Without any synchronization below race condition exist.

SR-IOV enable/disable with devlink eswitch mode change:

cpu-0 cpu-1
----- -----
mlx5_device_disable_sriov() mlx5_devlink_eswitch_mode_set()
mlx5_eswitch_disable() esw_offloads_stop()
esw_offloads_disable() mlx5_eswitch_disable()
esw_offloads_disable()

Hence, they are synchronized using a new mode_lock.
eswitch's state_lock is not used as it can lead to a deadlock scenario
below and state_lock is only for vport and fdb exclusive access.

ip link set vf <param>
netlink rcv_msg() - Lock A
rtnl_lock
vfinfo()
esw->state_lock() - Lock B
devlink eswitch_set
devlink_mutex
esw->state_lock() - Lock B
attach_netdev()
register_netdev()
rtnl_lock - Lock A

Alternatives considered:
1. Acquiring rtnl lock before taking esw->state_lock to follow similar
locking sequence as ip link flow during eswitch mode set.
rtnl lock is not good idea for two reasons.
(a) Holding rtnl lock for several hundred device commands is not good
idea.
(b) It leads to below and more similar deadlocks.

devlink eswitch_set
devlink_mutex
rtnl_lock - Lock A
esw->state_lock() - Lock B
eswitch_disable()
reload()
ib_register_device()
ib_cache_setup_one()
rtnl_lock()

2. Exporting devlink lock may lead to undesired use of it in vendor
driver(s) in future.

3. Unloading representors outside of the mode_lock requires
serialization with other process trying to enable the eswitch.

4. Differing the representors life cycle to a different workqueue
requires synchronization with func_change_handler workqueue.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ebf77bb8 18-Dec-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Extend eswitch enable to handle num_vfs change

Subsequent patch protects eswitch mode changes across sriov and devlink
interfaces. It is desirable for eswitch to provide thread safe eswitch
enable and disable APIs.
Hence, extend eswitch enable API to optionally update num_vfs when
requested.

In subsequent patch, eswitch num_vfs are updated after all the eswitch
users eswitch drops its reference count.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d6c8022d 17-Dec-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Annotate esw state_lock mutex destroy

Invoke mutex_destroy() to catch any esw state_lock errors.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5c2aa8ae 17-Jan-2020 Mark Bloch <markb@mellanox.com>

net/mlx5: Accept flow rules without match

Allow passing NULL spec when creating a flow rule. Such rules will act
as "catch all" flow rules.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 23bb50cf 12-Nov-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Update VF vports config when num of VFs changed

Currently, ECPF eswitch manager does one-time only configuration for
VF vports when device switches to offloads mode. However, when num of
VFs changed from host side, driver doesn't update VF vports
configurations.

Hence, perform VFs vport configuration update whenever num_vfs change
event occurs.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c2d7712c 11-Nov-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Introduce per vport configuration for eswitch modes

Both legacy and offload modes require vport setup, only offload mode
requires rep setup. Before this patch, vport and rep operations are
separated applied to all relevant vports in different stages.

Change to use per vport configuration, so that vport and rep operations
are modularized per vport.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d7c92cb5 16-Oct-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-switch, Make vport setup/cleanup sequence symmetric

Vport enable and disable sequence is incorrect. It should be:
enable()
esw_vport_setup_acl,
esw_vport_setup,
esw_vport_enable_qos.

disable()
esw_vport_disable_qos,
esw_vport_cleanup,
esw_vport_cleanup_acl.

Instead of having two setup functions for port and acl, merge
acl setup to port setup function.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 878a7331 30-Aug-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Prepare for vport enable/disable refactor

Rename esw_apply_vport_config() to esw_vport_setup(), and add new
helper function esw_vport_cleanup() to make them symmetric.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a9814d7f 17-Oct-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Remove redundant warning when QoS enable failed

esw_vport_enable_qos can return error in cases below:
1. QoS is already enabled. Warnning is useless in this case.
2. Create scheduling element cmd failed. There is already a warning.

Remove the redundant warnning if esw_vport_enable_qos returns err.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 14c844cb 13-Sep-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Hold mutex when querying drop counter in legacy mode

Consider scenario below, CPU 1 is at risk to query already destroyed
drop counters. Need to apply the same state mutex when disabling vport.

+-------------------------------+-------------------------------------+
| CPU 0 | CPU 1 |
+-------------------------------+-------------------------------------+
| mlx5_device_disable_sriov | mlx5e_get_vf_stats |
| mlx5_eswitch_disable | mlx5_eswitch_get_vport_stats |
| esw_disable_vport | mlx5_eswitch_query_vport_drop_stats |
| mlx5_fc_destroy(drop_counter) | mlx5_fc_query(drop_counter) |
+-------------------------------+-------------------------------------+

Fixes: b8a0dbe3a90b ("net/mlx5e: E-switch, Add steering drop counters")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 86f9453c 02-Jan-2020 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Remove redundant check of eswitch manager cap

esw_vport_create_legacy_acl_tables bails out immediately for eswitch
manager, hence remove all the check of esw manager cap after.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 87dac697 26-Dec-2019 Jianbo Liu <jianbol@mellanox.com>

net/mlx5e: Add devlink fdb_large_groups parameter

Add a devlink parameter to control the number of large groups in a
autogrouped flow table. The default value is 15, and the range is between 1
and 1024.

The size of each large group can be calculated according to the following
formula: size = 4M / (fdb_large_groups + 1).

Examples:
- Set the number of large groups to 20.
$ devlink dev param set pci/0000:82:00.0 name fdb_large_groups \
cmode driverinit value 20

Then run devlink reload command to apply the new value.
$ devlink dev reload pci/0000:82:00.0

- Read the number of large groups in flow table.
$ devlink dev param show pci/0000:82:00.0 name fdb_large_groups
pci/0000:82:00.0:
name fdb_large_groups type driver-specific
values:
cmode driverinit value 20

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 383de108 12-Feb-2020 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Don't clear the whole vf config when switching modes

There is no need to reset all vf config (except link state) between
legacy and switchdev modes changes.
Also, set link state to AUTO, when legacy enabled.

Fixes: 3b83b6c2e024 ("net/mlx5e: Clear VF config when switching modes")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3d9c5e02 03-Feb-2020 Huy Nguyen <huyn@mellanox.com>

net/mlx5: Fix sleep while atomic in mlx5_eswitch_get_vepa

rtnl_bridge_getlink is protected by rcu lock, so mlx5_eswitch_get_vepa
cannot take mutex lock. Two possible issues can happen:
1. User at the same time change vepa mode via RTM_SETLINK command.
2. User at the same time change the switchdev mode via devlink netlink
interface.

Case 1 cannot happen because rtnl executes one message in order.
Case 2 can happen but we do not expect user to change the switchdev mode
when changing vepa. Even if a user does it, so he will read a value
which is no longer valid.

Fixes: 8da202b24913 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3b83b6c2 13-Jan-2020 Dmytro Linkin <dmitrolin@mellanox.com>

net/mlx5e: Clear VF config when switching modes

Currently VF in LEGACY mode are not able to go up. Also in OFFLOADS
mode, when switching to it first time, VF can go up independently to
his representor, which is not expected.
Perform clearing of VF config when switching modes and set link state
to AUTO as default value. Also, when switching to OFFLOADS mode set
link state to DOWN, which allow VF link state to be controlled by its
REP.

Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore")
Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 61dc7b01 14-Nov-2019 Paul Blakey <paulb@mellanox.com>

net/mlx5: Refactor mlx5_create_auto_grouped_flow_table

Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 75102121 13-Nov-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Fix set vf link state error flow

Before this commit the ndo always returned success.
Fix that.

Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8b3f2eb0 05-Nov-2019 Colin Ian King <colin.king@canonical.com>

net/mlx5: fix kvfree of uninitialized pointer spec

Currently when a call to esw_vport_create_legacy_ingress_acl_group
fails the error exit path to label 'out' will cause a kvfree on the
uninitialized pointer spec. Fix this by ensuring pointer spec is
initialized to NULL to avoid this issue.

Addresses-Coverity: ("Uninitialized pointer read")
Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 556b9d16 03-Sep-2019 Aya Levin <ayal@mellanox.com>

net/mlx5: Clear VF's configuration on disabling SRIOV

When setting number of VFs to 0 (disable SRIOV), clear VF's
configuration.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 238302fa 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Enable metadata on own vport

Currently on ECPF, metadata is enabled on the ECPF vport = 0xfffe
(manager vport).
Metadata when supported, must be enabled on own vport which is
used to pass metadata to vport of NIC Rx Flow Table.

Due to this error, traffic tagged by ingress ACL is not processed
correctly at NIC rx flow table level which is supposed to work
on metadata tag.

Hence, instead of working on eswitch manager vport, always working on
eswitch own vport regardless of PF or ECPF.

Given that mlx5_eswitch_query/modify_esw_vport_context() is used to
access other vport in legacy mode and own vport settings in switchdev mode,
extend low level API to explicitly specify other_vport.

Fixes: c1286050cf47 ("net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 10652f39 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Refactor ingress acl configuration

Drop, untagged, spoof check and untagged spoof check flow groups are
limited to legacy mode only.

Therefore, following refactoring is done to
(a) improve code readability
(b) have better code split between legacy and offloads mode

1. Move legacy flow groups under legacy structure
2. Add validity check for group deletion
3. Restrict scope of esw_vport_disable_ingress_acl to legacy mode
4. Rename esw_vport_enable_ingress_acl() to
esw_vport_create_ingress_acl_table() and limit its scope to
table creation
5. Introduce legacy flow groups creation helper
esw_legacy_create_ingress_acl_groups() and keep its scope to legacy mode
6. Reduce offloads ingress groups from 4 to just 1 metadata group
per vport
7. Removed redundant IS_ERR_OR_NULL as entries are marked NULL on free.
8. Shortern error message to remove redundant 'E-switch'

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a962d7a6 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Restrict metadata disablement to offloads mode

Now that there is clear separation for acl setup/cleanup between legacy
and offloads mode, limit metdata disablement to offloads mode.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 748da30b 28-Oct-2019 Vu Pham <vuhuong@mellanox.com>

net/mlx5: E-switch, Offloads shift ACL programming during enable/disable vport

Currently legacy mode enables ACL while enabling vport, while offloads
mode enable ACL when moving to offloads mode.

Bring consistency to both modes by enabling/disabling ACL when
enabling/disabling a vport.

It also eliminates creating ingress ACL table on unused ECPF vport in
offloads mode.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b7752f83 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Move ACL drop counters life cycle close to ACL lifecycle

It is better to create/destroy ACL related drop counters where the actual
drop rule ACLs are created/destroyed, so that ACL configuration is self
contained for ingress and egress.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f5d0c01d 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Legacy introduce and use per vport acl tables APIs

Introduce and use per vport ACL tables creation and destroy APIs, so that
subsequently patch can use them during enabling/disabling a vport in
unified way for legacy vs offloads mode.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 925a6acc 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Prepare code to handle vport enable error

In subsequent patch, esw_enable_vport() could fail and return error.
Prepare code to handle such error.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 77b09430 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Tide up state_lock and vport enabled flag usage

When eswitch is disabled, vport event handler is unregistered.
This unregistration already synchronizes with running EQ event handler
in below code flow.

mlx5_eswitch_disable()
mlx5_eswitch_event_handlers_unregister()
mlx5_eq_notifier_unregister()
atomic_notifier_chain_unregister()
synchronize_rcu()

notifier_callchain
eswitch_vport_event()
queue_work()

Additionally vport->enabled flag is set under state_lock during
esw_enable_vport() but is not read under state_lock in
(a) esw_disable_vport() and (b) under atomic context
eswitch_vport_event().

It is also necessary to synchronize with already scheduled vport event.
This is already achieved using below sequence.

mlx5_eswitch_event_handlers_unregister()
[..]
flush_workqueue()

Hence,
(a) Remove vport->enabled check in eswitch_vport_event() which
doesn't make any sense.
(b) Remove redundant flush_workqueue() on every vport disable.
(c) Keep esw_disable_vport() symmetric with esw_enable_vport() for
state_lock.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 853b5352 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Move legacy drop counter and rule under legacy structure

To improve code readability, move legacy drop counters and droup rule
under legacy structure.

While at it,
(a) prefix drop flow counters helper with legacy_.
(b) nullify the rule pointers only if they were valid.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ea2300e0 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Introduce and use mlx5_esw_is_manager_vport()

Currently esw_enable_vport() does vport check for zero to enable drop
counters regardless of execution on ECPF/PF.
While esw_disable_vport() considers such scenario.

To keep consistency across code for checking for manager_vport,
introduce and use mlx5_esw_is_manager_vport() to check if a specified
vport is eswitch manager vport or not.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fdde49e0 28-Oct-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Introduce and use vlan rule config helper

Between legacy mode and switchdev mode, only two fields are changed,
vlan_tag and flow action.
Hence to avoid duplicte code between two modes, introduce and and use
helper function to configure allowed VLAN rule.

While at it, get rid of duplicate debug message.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e019cb53 28-Oct-2019 Qing Huang <qing.huang@oracle.com>

net/mlx5: Fixed a typo in a comment in esw_del_uc_addr()

Changed "managerss" to "managers".

Fixes: a1b3839ac4a4 ("net/mlx5: E-Switch, Properly refer to the esw manager vport")
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 61086f39 02-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect encap hash table with mutex

To remove dependency on rtnl lock, protect encap hash table from concurrent
modifications with new "encap_tbl_lock" mutex. Use the mutex to protect
internal encap entry state from concurrent modification. This is necessary
because a flow can be attached to multiple encap entries simultaneously,
which significantly complicates using finer grained per-entry lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# d2faae25 09-Aug-2019 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Protect mod_hdr hash table with mutex

To remove dependency on rtnl lock, protect mod_hdr hash table from
concurrent modifications with new mutex.

Implement helper function to get flow namespace to prevent code
duplication.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# dd58edc3 01-Jun-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Extend mod header entry with reference counter

List of flows attached to mod header entry is used as implicit reference
counter (mod header entry is deallocated when list becomes free) and as a
mechanism to obtain mod header entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to mod header entry is possible. Proper atomic reference counter
is required to support concurrent access.

As a preparation for extending mod header with reference counting, extract
code that lookups and deletes mod header entry into standalone put/get
helpers. In order to remove this dependency on external locking, extend mod
header entry with reference counter to manage its lifetime and extend flow
structure with direct pointer to mod header entry that flow is attached to.

To remove code duplication between legacy and switchdev mode
implementations that both support mod_hdr functionality, store mod_hdr
table in dedicated structure used by both fdb and kernel namespaces. New
table structure is extended with table lock by one of the following patches
in this series. Implement helper function to get correct mod_hdr table
depending on flow namespace.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 694a2960 02-Aug-2019 Colin Ian King <colin.king@canonical.com>

net/mlx5: remove self-assignment on esw->dev

There is a self assignment of esw->dev to itself, clean this up by
removing it. Also make dev a const pointer.

Addresses-Coverity: ("Self assignment")
Fixes: 6cedde451399 ("net/mlx5: E-Switch, Verify support QoS element type")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# fcb64c0f 08-May-2019 Eli Cohen <eli@mellanox.com>

net/mlx5: E-Switch, add ingress rate support

Use the scheduling elements to implement ingress rate limiter on an
eswitch ports ingress traffic. Since the ingress of eswitch port is the
egress of VF port, we control eswitch ingress by controlling VF egress.

Configuration is done using the ports' representor net devices.

Please note that burst size configuration is not supported by devices
ConnectX-5 and earlier generations.

Configuration examples:
tc:
tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k

ovs:
ovs-vsctl set interface eth0 ingress_policing_rate=1000

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5896b972 29-Jul-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Tide up eswitch config sequence

Currently for PF and ECPF vports, representors are created before
their eswitch hardware ports are initialized in below flow.

mlx5_eswitch_enable()
esw_offloads_init()
esw_offloads_load_all_reps()
[..]
esw_enable_vport()

However for VFs, vports are initialized before creating their
respective netdev represnetors in event handling context.

Similarly while disabling eswitch, first hardware vports are disabled,
followed by destroying their representors.
Here while underlying vports gets destroyed but its respective user
facing netdevice can still exist on which user can continue to perform
more offload operations.

Instead, its more accurate to do
enable_eswitch switchdev mode:
1. perform FDB tables initialization
2. initialize hw vport
3. create and publish representor for this vport

disable_eswitch switchdev mode:
1. destroy user facing representor for the vport
2. disable hw vport
3. perform FDB tables cleanup

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 131ce701 29-Jul-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-Switch, Remove redundant mc_promisc NULL check

mc_promisc pointer points to an instance of struct esw_mc_addr allocated
as part of the esw structure.
Hence it cannot be NULL.
Removed such redundant check and assign where it is actually used.

While at it, add comment around legacy mode fields and move mc_promisc
close to other legacy mode structures to improve code redability.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9ddb830a 29-Jul-2019 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, remove redundant error handling

We don't need to handle error flow of esw_create_legacy_table() in the
same branch, it is already being handled directly after the if statement,
for both legacy and switchdev modes in one place.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5019833d 29-Jul-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Introduce helper function to enable/disable vports

vports needs to be enabled in switchdev and legacy mode.

In switchdev mode, vports should be enabled after initializing
the FDB tables and before creating their represntors so that
representor works on an initialized vport object.

Prepare a helper function which can be called when enabling either of
the eswitch modes.

Similarly, have disable vports helper function.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 610090eb 29-Jul-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-switch, Initialize TSAR Qos hardware block before its user vports

First enable TSAR Qos hardware block in device before enabling its
user vports.

This refactor is needed so that vports can be enabled before their
representor netdevice can be created.

While at it, esw_create_tsar() returns error code which was used only to
print error. However esw_create_tsar() already prints warning if it hits
an error.
Hence, remove the redundant warning.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6cedde45 29-Jul-2019 Eli Cohen <eli@mellanox.com>

net/mlx5: E-Switch, Verify support QoS element type

Check if firmware supports the requested element type before
attempting to create the element type.
In addition, explicitly specify the request element type and tsar type.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0e18134f 11-Sep-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Eswitch, use state_lock to synchronize vlan change

esw->state_lock is already used to protect vlan vport configuration change.
However, all preparation and correctness checks, and code that sets vport
data are not protected by this lock and assume external synchronization by
rtnl lock. In order to remove dependency on rtnl lock, extend
esw->state_lock protection to whole eswitch vlan add/del functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 525e84be 18-Nov-2018 Vlad Buslov <vladbu@mellanox.com>

net/mlx5e: Eswitch, change offloads num_flows type to atomic64

Eswitch implements its own locking by means of state_lock mutex and
multiple fine-grained lock in containing data structures, and is supposed
to not rely on rtnl lock. However, eswitch offloads num_flows type is a
regular long long integer and cannot be modified concurrently. This is an
implicit assumptions that mlx5 tc is serialized (by rtnl lock or any other
means). In order to remove implicit dependency on rtnl lock, change
num_flows type to atomic64 to allow concurrent modifications.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 3a5ee3b3 14-Jul-2019 Fuqian Huang <huangfq.daxian@gmail.com>

ethernet: remove redundant memset

kvzalloc already zeroes the memory during the allocation.
pci_alloc_consistent calls dma_alloc_coherent directly.
In commit 518a2f1925c3
("dma-mapping: zero memory returned from dma_alloc_*"),
dma_alloc_coherent has already zeroed the memory.
So the memset after these function is not needed.

Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9a64144d 17-Jun-2019 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: E-Switch, Fix default encap mode

Encap mode is related to switchdev mode only. Move the init of
the encap mode to eswitch_offloads. Before this change, we reported
that eswitch supports encap, even tough the device was in non
SRIOV mode.

Fixes: 7768d1971de67 ('net/mlx5: E-Switch, Add control for encapsulation')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# dd28087c 07-Jun-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Refactor mlx5_esw_query_functions for modularity

Functions change event output data size changes when functions other
than VFs will be enabled in HCA CAP.
With current API, multiple callers needs to align, calculate accurate
size of the output data depending on number on non VF functions enabled
in the device.
Instead of duplicating such math at multiple places, refactor
mlx5_esw_query_functions() to return raw output allocated by itself.

Caller must free the allocated memory using kvfree() as described in the
function comment section.
This hides calcuation within mlx5_esw_query_functions() and provides
simpler API.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 2752b823 14-May-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()

Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.

Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4a3929b2 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Handle UC address change in switchdev mode

When NVME device emulation mode is enabled, more than one PFs use the
same physical port. In this case, MPFS is required to program L2
addresses.

It used to rely on netdev set_rx_mode in switchdev mode, but driver
later changed to not create netdev for eswitch manager once in
switchdev mode. So, UC address event should be handled.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 411ec9e0 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop

When ECPF is the eswitch manager, host PF is treated like other VFs.
Driver should do the same for inline mode and vlan pop.

Add new iterators to include host PF if ECPF is the eswitch manager.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 16fff98a 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Reg/unreg function changed event at correct stage

When driver is doing eswitch mode change, it's critical to keep number
of enabled VFs unchanged. However, it can be changed on the fly once
function changed event is registered.

To remove this uncertainty, function changed event should not be
registered before all setups, and first be unregistered before all
cleanups. Wrap this functionality together with vport event handler.

Fixes: 61fc880839e6 ("net/mlx5: E-Switch, Handle representors creation in handler context")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 062f4bf4 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Consolidate eswitch function number of VFs

Enabled number of VFs is key for eswich manager to do flow steering
initialization and vport configurations. However, the number of
enabled VFs may come from two sources as below.

PF: num of VFs is provided by enabled SR-IOV of itself.
ECPF: num of VFs is provided by enabled SR-IOV from its peer PF. And
SR-IOV can't be enabled from ECPF itself.

Current driver handles the two cases in different stages and passing
the number of enabled VFs among a large scope of internal functions.
It is usually hard to find out where is the real number of VFs from
due to layers of argument pass-in.

This patch consolidated that number from the entry point of doing
eswitch setup, and maintained a copy so that eswitch functions can
refer to it directly.

Eswitch driver shall always use this number when referring to enabled
number of VFs, don't use other numbers such as from SR-IOV.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f6455de0 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Refactor eswitch SR-IOV interface

Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF
can be at offload mode when SR-IOV is not enabled.

Rename the interface and eswitch mode names to decouple from SR-IOV,
and cleanup eswitch messages accordingly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e1d974d0 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: Handle host PF vport mac/guid for ECPF

When ECPF is eswitch manager, it has the privilege to query and
configure the mac and node guid of host PF.

While vport number of host PF is 0, the vport command should be
issued with other_vport set in this case as the cmd is issued by
ECPF vport(0xfffe).

Add a specific function to query own vport mac. Low level functions
are used by vport manager to query/modify any vport mac and node guid.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5f5d2536 28-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Use correct flags when configuring vlan

Before the offending commit, vlan will be configured if either vlan
or qos is set. After the change with new set flags, function callers
should provide flags accordingly.

Fixes: e33dfe316cf3 ("net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 57843868 25-Jun-2019 Jianbo Liu <jianbol@mellanox.com>

net/mlx5: E-Switch, Add query and modify esw vport context functions

Add esw vport query and modify functions, and exposing them is needed for
enabling or disabling registers passed as metatdata to vport NIC_RX table
in slow path.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7445cfb1 25-Jun-2019 Jianbo Liu <jianbol@mellanox.com>

net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs

When a dual-port VHCA sends a RoCE packet on its non-native port, and the
packet arrives to its affiliated vport FDB, a mismatch might occur on the
rules that match the packet source vport as it is not represented by single
VHCA only in this case. So we change to match on metadata instead of source
vport.
To do that, a rule is created in all vports and uplink ingress ACLs, to
save the source vport number and vhca id in the packet's metadata in order
to match on it later.
The metadata register used is the first of the 32-bit type C registers. It
can be used for matching and header modify operations. The higher 16 bits
of this register are for vhca id, and the lower 16 ones is for vport
number.
This change is not for dual-port RoCE only. If HW and FW allow, the vport
metadata matching is enabled by default.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# f53297d6 25-Jun-2019 Jianbo Liu <jianbol@mellanox.com>

net/mlx5: Get vport ACL namespace by vport index

The ingress and egress ACL root namespaces are created per vport and
stored into arrays. However, the vport number is not the same as the
index. Passing the array index, instead of vport number, to get the
correct ingress and egress acl namespace.

Fixes: 9b93ab981e3b ("net/mlx5: Separate ingress/egress namespaces for each vport")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 82b11f07 12-Jun-2019 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: Expose eswitch encap mode

Add API to get the current Eswitch encap mode.
It will be used in downstream patches to check if
flow table can be created with encap support or not.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>


# 10ee82ce 10-Jun-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Return raw output for query esw functions

Current function only returns host num of VFs, later patch requires
other params such as host maximum num of VFs.

Return the raw output so that caller can extract info as needed.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ac35dcd6 10-Jun-2019 Vu Pham <vuhuong@mellanox.com>

net/mlx5: E-Switch, Handle representors creation in handler context

Unified representors creation in esw_functions_changed context
handler. Emulate the esw_function_changed event for FW/HW that
does not support this event.

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# cd56f929 29-May-2019 Vu Pham <vuhuong@mellanox.com>

net/mlx5: E-Switch, Replace host_params event with functions_changed event

To support sriov on a E-Switch manager, num_vfs are queried
to the firmware whenever E-Switch manager is notified by
esw_functions_changed event.

Replace host_params event with esw_functions_changed event that reflects
more appropriate naming.

While at it, also correct num_vfs type from int to u16 as expected by
the function mlx5_esw_query_functions().

Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 02f3afd9 05-Apr-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index

To avoid any ambiguity between vport index and vport number,
rename functions that had vport, to vport_num or vport_index appropriately.

vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return
type to u16.

vport_index is an int in vport array. Hence change input type of vport
index in mlx5_eswitch_index_to_vport_num() to int.

Correct multiple eswitch representor interfaces use type u16 of
rep->vport as type int vport_index.

Send vport FW commands with correct eswitch u16 vport_num instead
host int vport_index.

Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5d9986a3 15-Apr-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Fix the check of legal vport

The check of legal vport is to ensure the vport number falls between
0 and total number of vports. Along with the introduction of uplink
rep, enabled vports are not consecutive any more.
Therefore, rely on the eswitch vport getter function to check if it's
a valid vport.

As the getter function relies on eswitch, add the check of vport
group manager and validation the presence of eswitch structure.
Remove the redundant check in the function calls.

Since the vport array will be allocated once eswitch is initialized
and will be kept alive if eswitch presents, no need to protect it with
the state lock.

Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4314ebaa 15-Apr-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Use getter to access all vport array

Some functions issue vport commands and access vport array using
vport_index/vport_num interchangeably which is OK for VFs vports.
However, this creates potential bug if those vports are not VFs
(E.g, uplink, sf) where their vport_index don't equal to vport_num.

Prepare code to access mlx5_vport structure using a getter function.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# ee813f31 21-Apr-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Use available mlx5_vport struct

Several functions need to access mlx5_vport and vport_num.
When these functions are called, caller already has mlx5_vport*
available.
Hence pass such mlx5_vport pointer.

This is preparation patch to add error checks to
mlx5_eswitch_get_vport() and to return error status.
By doing so, reduce places where error check of mlx5_eswitch_get_vport()
can be avoided.

While doing such change, mlx5_eswitch_query_vport_drop_stats() gets
corrected to work on vport, instead of vport_idx.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 786ef904 20-Apr-2019 Parav Pandit <parav@mellanox.com>

net/mlx5: Reuse mlx5_esw_for_each_vf_vport macro in two files

Currently mlx5_esw_for_each_vf_vport iterates over mlx5_vport entries in
eswitch.c
Same macro in eswitch_offloads.c iterates over vport number in
eswitch_offloads.c

Instead of duplicate macro names, to avoid confusion and to reuse the
same macro in both files, move it to eswitch.h.

To iterate over vport numbers where there is no need to iterate over
mlx5_vport, but only a vport number is needed, rename those macros in
eswitch_offloads.c to mlx5_esw_for_each_vf_num_vport*.

While at it, keep all vport and vport rep iterators together.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 18486737 03-Mar-2019 Eli Britstein <elibr@mellanox.com>

net/mlx5e: ACLs for priority tag mode

Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN
push on RX path. As a workaround, untagged packets are tagged with
VID 0x000 allowing pop/push actions to be exchanged with VLAN rewrite
actions.
Use the ingress ACL table, preceding the FDB, to push VLAN 0x000 ID tag
for untagged packets and the egress ACL table, succeeding the FDB, to
pop VLAN 0x000 ID tag.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 36acf63a 22-Mar-2019 Huy Nguyen <huyn@mellanox.com>

net/mlx5: E-Switch, fix syndrome (0x678139) when turn on vepa

Make sure the struct mlx5_flow_destination is zero before
filling in the field.

Fixes: 8da202b24913 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# eca4a928 24-Feb-2019 Omri Kahalon <omrik@mellanox.com>

net/mlx5: E-Switch, Fix esw manager vport indication for more vport commands

Traditionally, the PF (Physical Function) which resides on vport 0 was
the E-switch manager. Since the ECPF (Embedded CPU Physical Function),
which resides on vport 0xfffe, was introduced as the E-Switch manager,
the assumption that the E-switch manager is on vport 0 is incorrect.

Since the eswitch code already uses the actual vport value, all we
need is to always set other_vport=1.

Signed-off-by: Omri Kahalon <omrik@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8a91ad93 07-Mar-2019 Roi Dayan <roid@mellanox.com>

net/mlx5: E-Switch, Fix access to invalid memory when toggling esw modes

The esw fdb table has a union of legacy and offloads members.
So if we were in a certain esw mode we could set some memebers and not
set null which is fine as on destroy path and don't care.
But then moving from legacy to switchdev a second time, the cleanup flow
of legacy mode checks if a struct member was in use if it's not null so
we need to make sure to reset the code to null when we init legacy mode.

Fixes: 8da202b24913 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 24319258 04-Mar-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5: Avoid panic when setting vport rate

If we try to set VFs rate on a VF (not PF) net device, the kernel
will be crash. The commands are show as below:

$ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
$ ip link set $MLX_VF0 vf 0 max_tx_rate 2 min_tx_rate 1

If not applied the first patch ("net/mlx5: Avoid panic when setting
vport mac, getting vport config"), the command:

$ ip link set $MLX_VF0 vf 0 rate 100

can also crash the kernel.

[ 1650.006388] RIP: 0010:mlx5_eswitch_set_vport_rate+0x1f/0x260 [mlx5_core]
[ 1650.007092] do_setlink+0x982/0xd20
[ 1650.007129] __rtnl_newlink+0x528/0x7d0
[ 1650.007374] rtnl_newlink+0x43/0x60
[ 1650.007407] rtnetlink_rcv_msg+0x2a2/0x320
[ 1650.007484] netlink_rcv_skb+0xcb/0x100
[ 1650.007519] netlink_unicast+0x17f/0x230
[ 1650.007554] netlink_sendmsg+0x2d2/0x3d0
[ 1650.007592] sock_sendmsg+0x36/0x50
[ 1650.007625] ___sys_sendmsg+0x280/0x2a0
[ 1650.007963] __sys_sendmsg+0x58/0xa0
[ 1650.007998] do_syscall_64+0x5b/0x180
[ 1650.009438] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Cc: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6e77c413 04-Mar-2019 Tonghao Zhang <xiangxia.m.yue@gmail.com>

net/mlx5: Avoid panic when setting vport mac, getting vport config

If we try to set VFs mac address on a VF (not PF) net device,
the kernel will be crash. The commands are show as below:

$ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
$ ip link set $MLX_VF0 vf 0 mac 00:11:22:33:44:00

[exception RIP: mlx5_eswitch_set_vport_mac+41]
[ffffb8b7079e3688] do_setlink at ffffffff8f67f85b
[ffffb8b7079e37a8] __rtnl_newlink at ffffffff8f683778
[ffffb8b7079e3b68] rtnl_newlink at ffffffff8f683a63
[ffffb8b7079e3b90] rtnetlink_rcv_msg at ffffffff8f67d812
[ffffb8b7079e3c10] netlink_rcv_skb at ffffffff8f6b88ab
[ffffb8b7079e3c60] netlink_unicast at ffffffff8f6b808f
[ffffb8b7079e3ca0] netlink_sendmsg at ffffffff8f6b8412
[ffffb8b7079e3d18] sock_sendmsg at ffffffff8f6452f6
[ffffb8b7079e3d30] ___sys_sendmsg at ffffffff8f645860
[ffffb8b7079e3eb0] __sys_sendmsg at ffffffff8f647a38
[ffffb8b7079e3f38] do_syscall_64 at ffffffff8f00401b
[ffffb8b7079e3f50] entry_SYSCALL_64_after_hwframe at ffffffff8f80008c

and

[exception RIP: mlx5_eswitch_get_vport_config+12]
[ffffa70607e57678] mlx5e_get_vf_config at ffffffffc03c7f8f [mlx5_core]
[ffffa70607e57688] do_setlink at ffffffffbc67fa59
[ffffa70607e577a8] __rtnl_newlink at ffffffffbc683778
[ffffa70607e57b68] rtnl_newlink at ffffffffbc683a63
[ffffa70607e57b90] rtnetlink_rcv_msg at ffffffffbc67d812
[ffffa70607e57c10] netlink_rcv_skb at ffffffffbc6b88ab
[ffffa70607e57c60] netlink_unicast at ffffffffbc6b808f
[ffffa70607e57ca0] netlink_sendmsg at ffffffffbc6b8412
[ffffa70607e57d18] sock_sendmsg at ffffffffbc6452f6
[ffffa70607e57d30] ___sys_sendmsg at ffffffffbc645860
[ffffa70607e57eb0] __sys_sendmsg at ffffffffbc647a38
[ffffa70607e57f38] do_syscall_64 at ffffffffbc00401b
[ffffa70607e57f50] entry_SYSCALL_64_after_hwframe at ffffffffbc80008c

Fixes: a8d70a054a718 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if not being esw manager")
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 544fe7c2 17-Feb-2019 Roi Dayan <roid@mellanox.com>

net/mlx5e: Activate HW multipath and handle port affinity based on FIB events

To support multipath offload we are going to track SW multipath route
and related nexthops. To do that we register to FIB notifier and handle
the route and next-hops events and reflect that as port affinity to HW.

When there is a new multipath route entry that all next-hops are the
ports of an HCA we will activate LAG in HW.

Egress wise, we use HW LAG as the means to emulate multipath on current
HW which doesn't support port selection based on xmit hash. In the
presence of multiple VFs which use multiple SQs (send queues) this
yields fairly good distribution.

HA wise, HW LAG buys us the ability for a given RQ (receive queue) to
receive traffic from both ports and for SQs to migrate xmitting over
the active port if their base port fails.

When the route entry is being updated to single path we will update
the HW port affinity to use that port only.

If a next-hop becomes dead we update the HW port affinity to the living
port.

When all next-hops are alive again we reset the affinity to default.

Due to FW/HW limitations, when a route is deleted we are not disabling
the HW LAG since doing so will not allow us to enable it again while
VFs are bounded. Typically this is just a temporary state when a
routing daemon removes dead routes and later adds them back as needed.

This patch only handles events for AF_INET.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8da202b2 21-Jan-2019 Huy Nguyen <huyn@mellanox.com>

net/mlx5: E-Switch, Add support for VEPA in legacy mode.

In Virtual Ethernet Port Aggregator (VEPA) mode, the packet skips
the system internal virtual switch and forwards to external network
switch. In Mellanox HCA case, the virtual switch is the HCA's Eswitch.

To support this, an new FDB flow table are created with level 0 and
linked to the existing FDB flow table in legacy mode. By default,
VEPA is turned off and this FDB flow table is empty. When VEPA is
turned on, two rules are created. One rule to forward on uplink vport
traffic to the legacy FDB. The other rule forward all other traffic
to uplink vport.

Other design alternatives were not chosen as explained below:
1. Create a forward rule in ACL flow table (most efficient design).
This approach is the not chosen because firmware does not support
forward rule to uplink vport (0xffff) for ACL flow table.
2. Add additional source port criteria in all the FDB rules to make the
FDB rules to be received rules only. This approach is not chosen because
it is not efficient as there can many rules in the FDB and VEPA mode
cannot be controlled per vport.
3. Add a highest prioirty flow group in the existing legacy FDB Flow
Table instead of a new flow table. This approoach does not work because the
new flow group has the same match criteria as the promiscuous flow group
and mlx5_add_flow_rules does not allow specifying flow group.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1c50d369 17-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Disable esw manager vport correctly

When disabling vport, relevant vport configurations will be cleaned
up. These cleanups should be done to the vports which had these configs
applied at vport enablement. As esw manager vport didn't have such
vport config applied, cleanup should not touch it.

Fixes: de9e6a8136c5 ("net/mlx5: E-Switch, Properly refer to host PF vport as other vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# acad7073 17-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Fix the warning on vport index out of range

When eswitch gets vport data structure, the index should not be out
of the range of the vport array. Driver mistakenly used vport number
to check the range.

Fixes: 22b8ddc86bf4 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a3888f33 29-Jan-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Load/unload VF reps according to event from host PF

When host PF changes the number of VFs, the ECPF esw driver will get
a FW event. It should query the number of VFs enabled by host PF and
update the VF reps accordingly. Note that host PF can't change the
number of VFs dynamically, it has to reset the number of VFs to 0
before changing to a new positive number.

The host event is registered when driver is moving to switchdev mode,
and it's the last step to do in esw_offloads_init. It's unregistered
and the work queue is flushed when driver quits from switchdev mode.
In this way, the host event and devlink command are serialized.

When driver is enabling switchdev mode, pay attention to the following
two facts:
1. Host PF must not have VF initialized as the flow table in ECPF has
ENCAP enabled as default. Such flow table can't be created with
existing initialized VFs.
2. ECPF doesn't know how many VFs the host PF will enable, ECPF
offloads flow steering shall create the flow table/groups based on
the max number of VFs possibly supported by host PF.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 81cd229c 10-Dec-2018 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Consider ECPF vport depends on eswitch ownership

ECPF connects to the eswitch through vport 0xfffe. ECPF may or may
not be the eswitch manager depending on firmware configuration.

1. If ECPF is eswitch manager: ECPF will take over the eswitch manager
responsibility. A rep of the host PF shall be created at the ECPF
side for the eswitch manager to control.

2. If ECPF is not eswitch manager: host PF will be the eswitch manager,
ECPF acts similar as a VF to the host PF. Host PF will be aware
of the ECPF vport presence and control it's rep.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 5ae51620 14-Dec-2018 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Assign a different position for uplink rep and vport

In offloads mode, the current implementation puts the uplink
representor at index zero of the vport reps array. It is not "natural"
to place it at index 0 since we want to put the representor for vport
0 at index 0 with the introduction of SmartNIC. A separate patch will
handle the case whether a rep is needed for vport 0 (PF vport).

So, we want to have a different placeholder for uplink vport and
representor. It was placed at the end of vport and rep array. Since
vport number can no longer act as an index into the vport or
representors arrays, use functions to map vport numbers to indices
when accessing the vports or representors arrays, and vice versa.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 879c8f84 28-Jan-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Use getter and iterator to access vport/rep

With only PF and VF, it is sufficient to have the vport/rep array
index as the vport number. This is because PF and VF vports numbers
are consecutive serial numbers. In downstream patches with
introducing of ECPF and UPLINK vports, it's not consecutive any more.

Use getter to get specific vport/rep, and use iterator to traversal
a list of vport/rep. This hides the translation between array index
and vport number, and provides flexibility of using different
translation mechanism in the future.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c9b99abc 31-Jan-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Split VF and special vports for offloads mode

When driver is entering offloads mode, there are two major tasks to
do: initialize flow steering and create representors. Flow steering
should make sure enough flow table/group spaces are reserved for all
reps. Representors will be created in a group, all or none.

With the introduction of ECPF, flow steering should still reserve the
same spaces. But, the representors are not always loaded/unloaded in a
single piece. Once ECPF is in offloads mode, it will get the number
of VF changing event from host PF. In such scenario, only the VF reps
should be loaded/unloaded, not the reps for special vports (such as
the uplink vport).

Thus, when entering offloads mode, driver should specify the total
number of reps, and the number of VF reps separately. When leaving
offloads mode, the cleanup should use the information self-contained
in eswitch such as number of VFs.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# cbc44e76 01-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Properly refer to host PF vport as other vport

Commands referring to vports use the following scheme:

1. When referring to my own vport, put 0 in vport and 0 in other_vport.
2. When referring to another vport, put the vport number of the
referred vport and put 1 in other_vport. It was assumed that driver
is accessing other vport when vport number is greater than 0.

With the above scheme, the case that ECPF eswitch manager is trying
to access host PF vport will fall over with scheme 1 as the vport
number is 0. This is apparently wrong as driver is trying to refer
other vport.

As such usage can only happen in the eswitch context, change relevant
functions to provide other vport input properly.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# a1b3839a 08-Nov-2018 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Properly refer to the esw manager vport

In SmartNIC mode, the eswitch manager is not necessarily the PF
(vport 0). Use a helper function to get the correct eswitch manager
vport number and cache on the eswitch instance for fast reference.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# cd7e4186 12-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode

When dealing with the offloads mode initialization, driver refers to
the number of VFs and add magic number one (1) to take account of the
uplink. This is not clear and will make the code less readable after
adding other vports (e.g. host PF). As these are special vports
compared to VF vports, add a helper macro to denote such special
vports and eliminate the use of magic number.

Moreover, when creating offloads flow table and groups, the driver
reserves two more slots for UC and MC miss rules. Replace this magic
number with a helper macro as well.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b05af6aa 12-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: E-Switch, Normalize the name of uplink vport number

Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to
comply with the same naming convention along with the introduction of
other vports. Use MLX5_VPORT as the prefix for such vports and
relocate the uplink vport definition to public header file for the
benefits of both net and IB drivers.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7e4c4330 12-Feb-2019 Bodong Wang <bodong@mellanox.com>

net/mlx5: Use consistent vport num argument type

Use u16 for vport number, which matches how hardware refers to this
argument throughout commands.

This patch doesn't change any functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9d2cbdc5 24-Dec-2018 Aya Levin <ayal@mellanox.com>

net/mlx5e: Allow MAC invalidation while spoofchk is ON

Prior to this patch the driver prohibited spoof checking on invalid MAC.
Now the user can set this configuration if it wishes to.

This is required since libvirt might invalidate the VF Mac by setting it
to zero, while spoofcheck is ON.

Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4e046de0 13-Jan-2019 Bodong Wang <bodong@mellanox.com>

Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"

This reverts commit 5f5991f36dce1e69dd8bd7495763eec2e28f08e7.

With the original commit, eswitch instance will not be initialized for
a function which is vport group manager but not eswitch manager such as
host PF on SmartNIC (BlueField) card. This will result in a kernel crash
when such a vport group manager is trying to access vports in its group.
E.g, PF vport manager (not eswitch manager) tries to configure the MAC
of its VF vport, a kernel trace will happen similar as bellow:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:mlx5_eswitch_get_vport_config+0xc/0x180 [mlx5_core]
...

Fixes: 5f5991f36dce ("net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# aec002f6 07-Nov-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode

Now, when we have a dedicated uplink representor, the netdev instance
set over the esw manager vport (PF) is of no-use. As such, remove it
once we're on switchdev mode and get it back to life when off switchdev.

This is done by reloading the Ethernet interface as well (we already
do that for the IB interface) from the eswitch code while going in/out
of switchdev mode.

The Eth add/remove entries are modified to act differently when called in
switchdev mode. In this case we only deal with registration of the eth
vport representors. The rep netdevices are created from the eswitch call
to load the registered eth representors.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8aaca197 23-May-2018 Rabie Loulou <rabiel@mellanox.com>

net/mlx5: Allow co-enablement of uplink LAG and SRIOV

Enable setting uplink LAG if sriov is enabled on both ports in switchdev
mode.

Once the sriov mode is changed from switchdev for any of the ports, the
LAG instance is disabled.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# eff849b2 06-Jun-2018 Rabie Loulou <rabiel@mellanox.com>

net/mlx5: Allow/disallow LAG according to pre-req only

Remove the lag forbid/allow functions, change the lag prereq check to
run in the do-bond logic, so every change in the prereq state will
cause LAG to be disabled/enabled accordingly after the next do-bond run.

Add lag update function, so every component which changes the prereq
state and want the LAG to re-calc the conditions can call the update
function.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 186599f8 12-Dec-2018 YueHaibing <yuehaibing@huawei.com>

net/mlx5: Remove duplicated include from eswitch.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 6933a937 20-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Use async events chain

Remove the explicit call to mlx5_eswitch_vport_event on
MLX5_EVENT_TYPE_NIC_VPORT_CHANGE and let the eswitch register its own
handler when its ready.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 16d76083 19-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ, Different EQ types

In mlx5 we have three types of usages for EQs,
1. Asynchronous EQs, used internally by mlx5 core for
a. FW command completions
b. FW page requests
c. one EQ for all other Asynchronous events

2. Completion EQs, used for CQ completion (we create one per core)

3. *Special type of EQ (page fault) used for RDMA on demand paging
(ODP).

*The 3rd type shouldn't be special at least in mlx5 core, it is yet
another async events EQ with specific use case, it will be removed in
the next two patches, and will completely move its logic to mlx5_ib,
as it is rdma specific.

In this patch we remove use case (eq type) specific fields from
struct mlx5_eq into a new eq type specific structures.

struct mlx5_eq_async;
truct mlx5_eq_comp;
struct mlx5_eq_pagefault;

Separate between their type specific flows.

In the future we will allow users to create there own generic EQs.
for now we will allow only one for ODP in next patches.

We will introduce event listeners registration API for those who
want to receive mlx5 async events.
After that mlx5 eq handling will be clean from feature/user specific
handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# f2f3df55 19-Nov-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: EQ, Privatize eq_table and friends

Move unnecessary EQ table structures and declaration from the
public include/linux/mlx5/driver.h into the private area of mlx5_core
and into eq.c/eq.h.

Introduce new mlx5 EQ APIs:

mlx5_comp_vectors_count(dev);
mlx5_comp_irq_get_affinity_mask(dev, vector);

And use them from mlx5_ib or mlx5e netdevice instead of direct access to
mlx5_core internal structures.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 328edb49 03-Jul-2018 Paul Blakey <paulb@mellanox.com>

net/mlx5: Split FDB fast path prio to multiple namespaces

Towards supporting multi-chains and priorities, split the FDB fast path
to multiple namespaces (sub namespaces), each with multiple priorities.

This patch adds a new flow steering type, FS_TYPE_PRIO_CHAINS, which is
like current FS_TYPE_PRIO, but may contain only namespaces, and those
will be in parallel to one another in terms of managing of the flow
tables connections inside them. Meaning, while searching for the next
or previous flow table to connect for a new table inside such namespace
we skip the parallel namespaces in the same level under the
FS_TYPE_PRIO_CHAINS prio we originated from.

We use this new type for splitting the fast path prio into multiple
parallel namespaces, each containing normal prios.
The prios inside them (and their tables) will be connected to one
another, but not from one parallel namespace to another, instead the
last prio in each namespace will be connected to the next prio in
the containing FDB namespace, which is the slow path prio.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 171c7625b 02-Oct-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: Use flow counter IDs and not the wrapping cache object

Currently, when a flow rule is created using the FS core layer, the caller
has to pass the entire flow counter object and not just the counter HW
handle (ID). This requires both the FS core and the caller to have
knowledge about the inner implementation of the FS layer flow counters
cache and limits the possible users.

Move to use the counter ID across the place when dealing with flows.

Doing this decoupling, now can we privatize the inner implementation
of the flow counters.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 11aa5800 16-Sep-2018 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5: E-Switch, Fix out of bound access when setting vport rate

The code that deals with eswitch vport bw guarantee was going beyond the
eswitch vport array limit, fix that. This was pointed out by the kernel
address sanitizer (KASAN).

The error from KASAN log:
[2018-09-15 15:04:45] BUG: KASAN: slab-out-of-bounds in
mlx5_eswitch_set_vport_rate+0x8c1/0xae0 [mlx5_core]

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 60786f09 28-Aug-2018 Mark Bloch <markb@mellanox.com>

{net, RDMA}/mlx5: Rename encap to reformat packet

Renames all encap mlx5_{core,ib} code to use the new naming of packet
reformat. This change doesn't introduce any function change and is
needed to properly reflect the operation being done by this action.
For example not only can we encapsulate a packet, but also decapsulate it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# 8e3debc0 08-Aug-2018 Eli Cohen <eli@mellanox.com>

net/mlx5: E-Switch, Remove unused argument when creating legacy FDB

Remove unused nvports argument.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc9c82a8 08-Aug-2018 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5: Rename modify/query_vport state related enums

Modify and query vport state commands share the same admin_state and
op_mod values, rename the enums to fit them both.

In addition, remove the esw prefix from the admin state enum as this
also applied for vnic.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5f5991f3 16-Jul-2018 Eli Cohen <eli@mellanox.com>

net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager

Execute mlx5_eswitch_init() only if we have MLX5_ESWITCH_MANAGER
capabilities.
Do the same for mlx5_eswitch_cleanup().

Fixes: a9f7705ffd66 ("net/mlx5: Unify vport manager capability check")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 443a8581 09-Jul-2018 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode

With debug kernel UBSAN detects the following issue, which might happen
when eswitch instance is not created, fix this by testing the eswitch
pointer before returning the eswitch mode, if not set return mode =
SRIOV_NONE.

[ 32.528951] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx5/core/eswitch.c:2219:12
[ 32.528951] member access within null pointer of type 'struct mlx5_eswitch'
[ 32.528951] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-dirty #181
[ 32.528951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 32.528951] Call Trace:
[ 32.528951] dump_stack+0xc7/0x13b
[ 32.528951] ? show_regs_print_info+0x5/0x5
[ 32.528951] ? __pm_runtime_use_autosuspend+0x140/0x140
[ 32.528951] ubsan_epilogue+0x9/0x49
[ 32.528951] ubsan_type_mismatch_common+0x1f9/0x2c0
[ 32.528951] ? ucs2_as_utf8+0x310/0x310
[ 32.528951] ? device_initialize+0x229/0x2e0
[ 32.528951] __ubsan_handle_type_mismatch+0x9f/0xc9
[ 32.528951] ? __ubsan_handle_divrem_overflow+0x19b/0x19b
[ 32.578008] ? ib_device_get_by_index+0xf0/0xf0
[ 32.578008] mlx5_eswitch_mode+0x30/0x40
[ 32.578008] mlx5_ib_add+0x1e0/0x4a0

Fixes: 57cbd893c4c5 ("net/mlx5: E-Switch, Move representors definition to a global scope")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>


# a8d70a05 30-May-2018 Eli Cohen <eli@mellanox.com>

net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if not being esw manager

In smartnic env, if the host (PF) driver is not an e-switch manager, we
are not allowed to apply eswitch ports setups such as vlan (VST),
spoof-checks, min/max rate or state.

Make sure we are eswitch manager when coming to issue these callbacks
and err otherwise.

Also fix the definition of ESW_ALLOWED to rely on eswitch_manager
capability and on the vport_group_manger.

Operations on the VF nic vport context, such as setting a mac or reading
the vport counters are allowed to the PF in this scheme.

The modify nic vport guid code was modified to omit checking the
nic_vport_node_guid_modify eswitch capability.
The reason for doing so is that modifying node guid requires vport group
manager capability, and there's no need to check further capabilities.

1. set_vf_vlan - disallowed
2. set_vf_spoofchk - disallowed
3. set_vf_mac - allowed
4. get_vf_config - allowed
5. set_vf_trust - disallowed
6. set_vf_rate - disallowed
7. get_vf_stat - allowed
8. set_vf_link_state - disallowed

Fixes: f942380c1239 ('net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Or Gerlitz <ogerlitz@mellanox.com>


# 0efc8562 31-May-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager

In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the FW will err on driver attempts to deal with
setting/unsetting the eswitch and as a result the overall setup
of sriov will fail.

Fix that by avoiding the operation if e-switch management is not
allowed for this driver instance. While here, move to use the
correct name for the esw manager capability name.

Fixes: 81848731ff40 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Guy Kushnir <guyk@mellanox.com>
Reviewed-by: Eli Cohen <eli@melloanox.com>
Tested-by: Eli Cohen <eli@melloanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 930821e3 31-May-2018 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Use flow counter pointer as input to the query function

This allows to un-expose the details of struct mlx5_fc and keep it
internal to the core driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 52fff327 16-May-2018 Chris Mi <chrism@mellanox.com>

net/mlx5: E-Switch, Reorganize and rename fdb flow tables

We have several fdb flow tables for each of the legacy and switchdev
modes. In the switchdev mode, there are fast path and slow path flow
tables. Towards adding more flow tables in upcoming patches, reorganize
and rename the various existing ones to reflect their functionality.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b17f7fc1 21-Mar-2018 Shahar Klein <shahark@mellanox.com>

net/mlx5: Add destination e-switch owner

The destination e-switch owner allows a rule in namespace of one e-switch
owner to point to a vport that is natively associated with another
e-switch owner.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 88d725bb 25-Apr-2018 Adi Nissim <adin@mellanox.com>

net/mlx5: E-Switch, Include VF RDMA stats in vport statistics

The host side reporting of VF vport statistics didn't include the VF
RDMA traffic.

Fixes: 3b751a2a418a ("net/mlx5: E-Switch, Introduce get vf statistics")
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reported-by: Ariel Almog <ariela@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# aaabd078 13-Jan-2018 Moshe Shemesh <moshe@mellanox.com>

net/mlx5: Add packet dropped while vport down statistics

Added the following packets dropped while vport down statistics:

Rx dropped while vport down - counts packets which were steered by
e-switch to a vport, but dropped since the vport was down. This counter
will be shown on ip link tool as part of the vport rx_dropped counter.

Tx dropped while vport down - counts packets which were transmitted by
a vport, but dropped due to vport logical link down. This counter
will be shown on ip link tool as part of the vport tx_dropped counter.

The counters are read from FW by command QUERY_VNIC_ENV.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c5447c70 23-Jan-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: E-Switch, Reload IB interface when switching devlink modes

Up until this point it wasn't possible to activate IB representors
when switching to switchdev mode, remove this limitation.

We trigger reload of the PF IB interface in order to make sure that
already allocated resources are invalid and new resources will be opened
correctly with all the limitations of switchdev mode applied (only raw
packet capabilities, without RoCE). We also move the remove/add to a
place where the E-Switch mode is set/unset to better control when to
trigger this action, this will allow the IB side to start in the correct
mode.

For better code reuse, create a function which reloads an interface and
export it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 57cbd893 16-Jan-2018 Mark Bloch <markb@mellanox.com>

net/mlx5: E-Switch, Move representors definition to a global scope

In preparation for IB representors, move representors structs to a global
scope, also expose functions needed for registration, unregistration,
eswitch mode and creating a flow rule to direct traffic from SQs to the
right VF.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 26a0f6e8 31-Jan-2018 Eugenia Emantayev <eugenia@mellanox.com>

net/mlx5: E-Switch, Fix drop counters use before creation

First use of drop counters happens in esw_apply_vport_conf function,
while they are allocated later in the flow. Fix that by moving
esw_vport_create_drop_counters function to be called before the first use.

Fixes: b8a0dbe3a90b ("net/mlx5e: E-switch, Add steering drop counters")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b8a0dbe3 08-Nov-2017 Eugenia Emantayev <eugenia@mellanox.com>

net/mlx5e: E-switch, Add steering drop counters

Add flow counters to count packets dropped due to drop rules
configured in eswitch egress and ingress ACLs.
These counters will count VFs violations and incoming traffic drops.
Will be presented on hypervisor via standard 'ip -s link show' command.

Example: "ip -s link show dev enp5s0f0"

6: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
0 0 0 0 0 2
TX: bytes packets errors dropped carrier collsns
1406 17 0 0 0 0
vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off
RX: bytes packets mcast bcast dropped
1666 29 14 32 0
TX: bytes packets dropped
2880 44 2412

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9b93ab98 28-Nov-2017 Gal Pressman <galp@mellanox.com>

net/mlx5: Separate ingress/egress namespaces for each vport

Each vport has its own root flow table for the ACL flow tables and root
flow table is per namespace, therefore we should create a namespace for
each vport.

Fixes: efdc810ba39d ("net/mlx5: Flow steering, Add vport ACL support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 18a89ab7 26-Oct-2017 Gal Pressman <galp@mellanox.com>

net/mlx5e: E-Switch, Use the name of static array instead of its address

Using the address of a static array is the same as using its name (in
this specific use-case), but it's confusing and makes the code less
readable.

Fixes: 1bd27b11c1df ("net/mlx5: Introduce E-switch QoS management")
Fixes: bd77bf1cb595 ("net/mlx5: Add SRIOV VF max rate configuration support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# e8d31c4d 09-Aug-2017 Mark Bloch <markb@mellanox.com>

net/mlx5: E-Switch, Refactor vport representors initialization

Refactor the init stage of vport representors registration.
vport number and hw id can be assigned by the E-Switch driver and not by
the netdevice driver. While here, make the error path of mlx5_eswitch_init()
a reverse order of the good path, also use kcalloc to allocate an array
instead of kzalloc.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 4c5009c5 18-Oct-2017 Rabie Loulou <rabiel@mellanox.com>

net/mlx5: Initialize destination_flow struct to 0

This is needed in order to enlarge it with more members that will get
value of 0 when not set.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1547f538 18-Aug-2017 Colin Ian King <colin.king@canonical.com>

mlx5: ensure 0 is returned when vport is zero

Currently, if vport is zero then then an uninialized return status
in err is returned. Since the only return status at the end of the
function esw_add_uc_addr is zero for the current set of return paths
we may as well just return 0 rather than err to fix this issue.

Detected by CoverityScan, CID#1452698 ("Uninitialized scalar variable")

Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 78249c42 13-Jul-2017 Sagi Grimberg <sagi@grimberg.me>

mlx5: convert to generic pci_alloc_irq_vectors

Now that we have a generic code to allocate an array
of irq vectors and even correctly spread their affinity,
correctly handle cpu hotplug events and more, were much
better off using it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>


# eeb66cdb 04-Jun-2017 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Separate between E-Switch and MPFS

Multi-Physical Function Switch (MPFs) is required for when multi-PF
configuration is enabled to allow passing user configured unicast MAC
addresses to the requesting PF.

Before this patch eswitch.c used to manage the HW MPFS l2 table,
E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's
contexts update on unicast mac address list changes, to populate the PF's
MPFS L2 table accordingly.

In downstream patch we would like to allow compiling the driver without
E-Switch functionalities, for that we move MPFS l2 table logic out
of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to
allow compiling out MPFS for those who don't want Multi-PF support.

NIC PF netdevice will now directly update MPFS l2 table via the new MPFS
API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain
responsible of updating its MPFS l2 table on behalf of its VFs.

Due to this change we also don't require enabling vport(0) (PF vport)
unicast mac changes events anymore, for when SRIOV is not enabled.
Which means E-Switch is now activated only on SRIOV activation, and not
required otherwise.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jes Sorensen <jsorensen@fb.com>
Cc: kernel-team@fb.com


# a9f7705f 11-Jun-2017 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Unify vport manager capability check

Expose MLX5_VPORT_MANAGER macro to check for strict vport manager
E-switch and MPFS (Multi Physical Function Switch) abilities.

VPORT manager must be a PF with an ethernet link and with FW advertised
vport group manager capability

Replace older checks with the new macro and use it where needed in
eswitch.c and mlx5e netdev eswitch related flows.

The same macro will be reused in MPFS separation downstream patch.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 079adf05 01-Jun-2017 Eran Ben Elisha <eranbe@mellanox.com>

net/mlx5: Clean SRIOV eswitch resources upon VF creation failure

Upon sriov enable, eswitch is always enabled.
Currently, if enable hca failed over all VFs, we would skip eswitch
disable as part of sriov disable, which will lead to resources leak.

Fix it by disabling eswitch if it was enabled (use indication from
eswitch mode).

Fixes: 6b6adee3dad2 ('net/mlx5: SRIOV core code refactoring')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 8963ca45 28-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Avoid blank lines before/after closing/opening braces

Fixed checkpatch complaints on that:

CHECK: Blank lines aren't necessary before a close brace '}'
CHECK: Blank lines aren't necessary after an open brace '{'

and one on missing blank line..

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 11c9c548 04-May-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add cache for HW modify header IDs

Packets belonging to flows which are different by matching may still need
to go through the same header re-write. Add a cache for header re-write IDs
keyed by the binary chain of modify header actions.

The caching is supported for both eswitch and NIC use-cases, where the
actual conversion of the code to use caching comes in next patches, one
per use-case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 1b9a07ee 10-May-2017 Leon Romanovsky <leon@kernel.org>

{net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc

Commit a7c3e901a46f ("mm: introduce kv[mz]alloc helpers") added
proper implementation of mlx5_vzalloc function to the MM core.

This made the mlx5_vzalloc function useless, so let's remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 0a0ab1d2 28-Feb-2017 Eli Cohen <eli@mellanox.com>

net/mlx5: E-Switch, Avoid redundant memory allocation

struct esw_mc_addr is a small struct that can be part of struct
mlx5_eswitch. Define it as a field and not as a pointer and save the
kzalloc call and then error flow handling.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 7768d197 25-Sep-2016 Roi Dayan <roid@mellanox.com>

net/mlx5: E-Switch, Add control for encapsulation

Implement the devlink e-switch encapsulation control set and get
callbacks. Apply the value set by the user on the switchdev offloads
mode when creating the fast FDB table where offloaded rules will be set.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# b3ba5149 12-Apr-2017 Erez Shitrit <erezsh@mellanox.com>

net/mlx5: Refactor create flow table method to accept underlay QP

IB flow tables need the underlay qp to perform flow steering.
Here we change the API of the flow tables creation to accept the
underlay QP number as a parameter in order to support IB (IPoIB) flow
steering.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eff596da 12-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Return EOPNOTSUPP when failing to get steering name-space

When we fail to retrieve a hardware steering name-space, the returned error
code should say that this operation is not supported. Align the various
places in the driver where this call is made to this convention.

Also, make sure to warn when we fail to retrieve a SW (ANCHOR) name-space.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 9eb78923 11-Jan-2017 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: Change ENOTSUPP to EOPNOTSUPP

As ENOTSUPP is specific to NFS, change the return error value to
EOPNOTSUPP in various places in the mlx5 driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Suggested-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# c9497c98 15-Dec-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Add support for setting VF min rate

Add support for SRIOV VF min rate guarantee by using the TSAR BW share
weights mechanism.

The TSAR BW share vport attribute represents the weight of that vport
among the other vports weights which means that the actual vport BW
percentage is the same vport weight percentage among the total vports
weights sum.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


# 10543365 09-Oct-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Add support to s-tag in mlx5 firmware interface

Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry
match param.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>


# ccce1700 28-Dec-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Prevent setting multicast macs for VFs

Need to check that VF mac address entered by the admin user is either
zero or unicast mac.
Multicast mac addresses are prohibited.

Fixes: 77256579c6b4 ('net/mlx5: E-Switch, Introduce Vport administration functions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bffaa916 22-Nov-2016 Roi Dayan <roid@mellanox.com>

net/mlx5: E-Switch, Add control for inline mode

Implement devlink show and set of HW inline-mode.
The supported modes: none, link, network, transport.
We currently support one mode for all vports so set is done on all vports.
When eswitch is first initialized the inline-mode is queried from the FW.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a54e20b4 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5e: Add basic TC tunnel set action for SRIOV offloads

In mlx5 HW, encapsulation is offloaded by the steering rule having
index into an encapsulation table containing the entire set of headers
to be added by the HW. The driver sets these headers in a buffer when we
are offloading the action.

The code maintains mlx5_encap_entry for each encap header it has
encountered when attempted to offload TC tunnel set action.

This entry maintains a linked list of all the flows sharing the same
encap header, when the last flow is removed from the list the encap
entry is removed.

The actual encap_header is allocated by the driver in the hardware only
if we have layer two neighbour info when the encap entry is created.
While the flow is in the driver, the driver holds a reference on the
neighbour.

When a new flow with encap action is inserted, the code first checks if
the required encap entry exists according to the tunnel set parameters.
If it does the encap is shared, otherwise a new mlx5_encap_entry is
created.

TC action parsing implementation in the driver assumes that tunnel set
action is provided in the same order set by the user, e.g before the
mirred_redirect action.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 66958ed9 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5: Support encap id when setting new steering entry

In order to support steering rules which add encapsulation headers,
encap_id parameter is needed.

Add new mlx5_flow_act struct which holds action related parameter:
action, flow_tag and encap_id. Use mlx5_flow_act struct when adding a new
steering rule.
This patch doesn't change any functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9f1b073 07-Nov-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5: Add creation flags when adding new flow table

When creating flow tables, allow the caller to specify creation flags.
Currently no flags are used and as such this patch doesn't add any new
functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 74491de9 31-Aug-2016 Mark Bloch <markb@mellanox.com>

net/mlx5: Add multi dest support

Currently when calling mlx5_add_flow_rule we accept
only one flow destination, this commit allows to pass
multiple destinations.

This change forces us to change the return structure to a more
flexible one. We introduce a flow handle (struct mlx5_flow_handle),
it holds internally the number for rules created and holds an array
where each cell points the to a flow rule.

From the consumers (of mlx5_add_flow_rule) point of view this
change is only cosmetic and requires only to change the type
of the returned value they store.

From the core point of view, we now need to use a loop when
allocating and deleting rules (e.g given to us a flow handler).

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# bd77bf1c 11-Aug-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Add SRIOV VF max rate configuration support

Implement the vf set rate ndo by modifying the TSAR vport rate limit.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 1bd27b11 11-Aug-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Introduce E-switch QoS management

Add TSAR to the eswitch which will act as the vports rate limiter.
Create/Destroy TSAR on Enable/Dsiable SRIOV.
Attach/Detach vport to eswitch TSAR on Enable/Disable vport.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# 247f139c 25-Oct-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Change the acl enable prototype to return status

The Ingress/Egress ACL enable function may fail and it should return
status to its caller to avoid NULL pointer dereference.

Fixes: f942380c1239 ('net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e33dfe31 22-Sep-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan

The HW can be programmed to push vlan, pop vlan or both.

A factorization step towards using the push/pop capabilties in the
eswitch offloads mode. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4eea37d7 18-Sep-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code

When enablement of the SRIOV e-switch in certain mode (switchdev or legacy)
fails, we must set the mode to none. Otherwise, we'll run into double free
based crashes when further attempting to deal with the e-switch (such
as when disabling sriov or unloading the driver).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 766a0e97 18-Sep-2016 Baoyou Xie <baoyou.xie@linaro.org>

net/mlx5: clean function declarations in eswitch.c up

We get 2 warnings when building kernel with W=1:
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:463:5: warning: no previous prototype for 'esw_offloads_init' [-Wmissing-prototypes]
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:521:6: warning: no previous prototype for 'esw_offloads_cleanup' [-Wmissing-prototypes]

In fact, both functions are declared in
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c,but should be
declared in a header file, thus can be recognized in other file.

So this patch moves the declarations into
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ab2068a 09-Sep-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Implement vports admin state backup/restore

Save the user configuration in the vport sturct.
Restore saved old configuration upon vport enable.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c2d6e31a 09-Sep-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Align sriov/eswitch modules with the new load/unload flow.

Init/cleanup sriov/eswitch in the core software context init/cleanup
flows.
Attach/detach sriov/eswitch in the core load/unload flows.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62a9b90a 09-Sep-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Implement eswitch attach/detach flows

Needed for lightweight and modular internal/pci error handling.
Implement eswitch attach function which allocates/starts hw related
resources.
Implement eswitch detach function which releases/stops hw related
resources.
Init/cleanup function only handle eswitch software context allocation
and destruction.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f96750f8 18-Aug-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Avoid ACLs in the offloads mode

When we are in the switchdev/offloads mode, HW matching is done as
dictated by the offloaded rules and hence we don't need to enable
the ACLs mechanism used by the legacy mode.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2974ab6e 28-Jul-2016 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Improve driver log messages

Remove duplicate pci dev name printing in mlx5_core_err.
Use mlx5_core_{warn,info,err} where possible to have the pci info in the
driver log messages.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Parvi Kaustubhi <parvik@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# c4f287c4 19-Jul-2016 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Unify and improve command interface

Now as all commands use mlx5 ifc interface, instead of doing two calls
for executing a command we embed command status checking into
mlx5_cmd_exec to simplify the interface.

Also we do here some cleanup for redundant software structures
(inbox/outbox) and functions and improved command failure output.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>


# c5bb1730 04-Jul-2016 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: Refactor mlx5_add_flow_rule

Reduce the set of arguments passed to mlx5_add_flow_rule
by introducing flow_spec structure.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 127ea380 01-Jul-2016 Hadar Hen Zion <hadarh@mellanox.com>

net/mlx5: Add Representors registration API

Introduce E-Switch registration/unregister representors functions.

Those functions are called by the mlx5e driver when the PF NIC is
created upon pci probe action regardless of the E-Switch mode (NONE,
LEGACY or OFFLOADS).

Adding basic E-Switch database that will hold the vport represntors
upon creation.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c930a3ad 01-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5e: Add devlink based SRIOV mode changes

Implement handlers for the devlink commands to get and set the SRIOV
E-Switch mode.

When turning to the switchdev/offloads mode, we disable the e-switch
and enable it again in the new mode, create the NIC offloads table
and create VF reps.

When turning to legacy mode, we remove the VF reps and the offloads
table, and re-initiate the e-switch in it's legacy mode.

The actual creation/removal of the VF reps is done in downstream patches.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 69697b6e 01-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Add support for the sriov offloads mode

Unlike the legacy mode, here, forwarding rules are not learned by the
driver per events on macs set by VFs/VMs into their vports, but rather
should be programmed by higher-level SW entities.

Saying that, still, in the offloads mode (SRIOV_OFFLOADS), two flow
groups are created by the driver for management (slow path) purposes:

The first group will be used for sending packets over e-switch vports
from the host OS where the e-switch management code runs, to be
received by VFs.

The second group will be used by a miss rule which forwards packets toward
the e-switch manager. Further logic will trap these packets such that
the receiving net-device as seen by the networking stack is the representor
of the vport that sent the packet over the e-switch data-path.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ab36e35 01-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Add operational mode to the SRIOV e-Switch

Define three modes for the SRIOV e-switch operation, none (SRIOV_NONE,
none of the VF vports are enabled), legacy (SRIOV_LEGACY, the current mode)
and sriov offloads (SRIOV_OFFLOADS). Currently, when in SRIOV, only the
legacy mode is supported, where steering rules are of the form:

destination mac --> VF vport

This patch does not change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62e3c24a 09-Jun-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, always set mc_promisc for allmulti vports

Set the mc_promisc flag also in the case of adding new mc address to
existing allmulti vport.

Fixes: a35f71f27a61 ('net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 23898c76 09-Jun-2016 Noa Osherovich <noaos@mellanox.com>

net/mlx5: E-Switch, Modify node guid on vf set MAC

In RoCE, the RDMA-CM needs the node guid to establish connection
between nodes.
Today, the node guid exposed to mlx5 Ethernet VFs is zero, therefore
RDMA-CM on the VF is broken.

Whenever the administrator sets a MAC for a VF, derive the node guid
from it and set it as well in the following way:
MAC: e4:1d:2d:b3:f4:01 -> node_guid: e4:1d:2d:ff:fe:b3:f4:01

Fixes: 77256579c6b43 ('net/mlx5: E-Switch, Introduce Vport...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25fff58c 09-Jun-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Fix vport enable flow

Reorder vport enable flow to mark the vport as enabled before calling
the vport change handler which was modified to handle the case for
when vport is not enabled.

This fixes the case for when the PF netdev is open before sriov is
enabled, once sriov is enabled at esw_enable_vport,
esw_vport_change_handle_locked didn't read the PF context since it
thought the PF vport was not enabled.

When we enable the vport, arming for events is not required anymore,
since it's done on the vport change handle

Fixes: 586cfa7f1d58 ('net/mlx5: E-Switch, Use vport event handler for vport cleanup')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f42ac66 09-Jun-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Use the correct error check on returned pointers

The mlx5 flow-steering API (mlx5_create_flow_table/group/rule) never
returns null pointer on error. Even if it was doing that, checking
for IS_ERR_OR_NULL(p) and then returning PTR_ERR(p) would have cause
bugs, since PTR_ERR(NULL) --> success, crash.

To make things more robust and protect against related future bugs,
convert all IS_ERR_OR_NULL checks on returned values to IS_ERR.

Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs')
Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3fe3d819 09-Jun-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/mlx5: E-Switch, Use the correct free() function

We must use kvfree() for something that could have been allocated with vzalloc(),
do that.

Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs')
Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1edc57e2 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Implement trust vf ndo

- Add support to configure trusted vf attribute through trust_vf_ndo.

- Upon VF trust setting change we update vport context to refresh
allmulti/promisc or any trusted vf attributes that we didn't trust the
VF for before.

- Lock the eswitch state lock on vport event in order to synchronise the
vport context updates , this will prevent contention with vport trust
setting change which will trigger vport mac list update.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a35f71f2 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling

Add promisc_change as a trigger to vport context change event.
Add set vport promisc/allmulti functions to add vport to promiscuous
flowtable rules.
Upon promisc/allmulti rx mode vf request add the vport to
the relevant promiscuous group (Allmulti/Promisc group) so the relevant
traffic will be forwarded to it.
Upon allmulti vf request add the vport to each existing multicast fdb
rule.
Upon adding/removing mcast address from a vport, update all other
allmulti vports.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 78a9199b 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Add promiscuous and allmulti FDB flowtable groups

Add promiscuous and allmulti steering groups in FDB table.
Besides the full match L2 steering rules group, we added
two more groups to catch the "miss" rules traffic:
* Allmulti group: One rule that forwards any mcast traffic coming from
either uplink or VFs/PF vports
* Promisc group: One rule that forwards all unmatched traffic coming
from uplink.

Needed for downstream privileged VF promisc and allmulti support.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 586cfa7f 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Use vport event handler for vport cleanup

Remove the usage of explicit cleanup function and use existing vport
change handler. Calling vport change handler while vport
is disabled will cleanup the vport resources.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01f51f22 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Enable/disable ACL tables on demand

Enable ingress/egress ACL tables only when we need to configure ACL
rules.
Disable ingress/egress ACL tables once all ACL rules are removed.

All VF outgoing/incoming traffic need to go through the ingress/egress ACL
tables.
Adding/Removing these tables on demand will save unnecessary hops in the
flow steering when the ACL tables are empty.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f942380c 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk

Configure ingress and egress vport ACL rules according to spoofchk
admin parameters.

Ingress ACL flow table rules:
if (!spoofchk && !vst) allow all traffic.
else :
1) one of the following rules :
* if (spoofchk && vst) allow only untagged traffic with smac=original
mac sent from the VF.
* if (spoofchk && !vst) allow only traffic with smac=original mac sent
from the VF.
* if (!spoofchk && vst) allow only untagged traffic.
2) drop all traffic that didn't hit #1.

Add support for set vf spoofchk ndo.

Add non zero mac validation in case of spoofchk to set mac ndo:
when setting new mac we need to validate that the new mac is
not zero while the spoofchk is on because it is illegal
combination.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dfcb1ed3 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode

Configure ingress and egress vport ACL rules according to
vlan and qos admin parameters.

Ingress ACL flow table rules:
1) drop any tagged packet sent from the VF
2) allow other traffic (default behavior)

Egress ACL flow table rules:
1) allow only tagged traffic with vlan_tag=vst_vid.
2) drop other traffic.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5742df0f 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs

Create egress/ingress ACLs per VF vport at vport enable.

Ingress ACL:
- one flow group to drop all tagged traffic in VST mode.

Egress ACL:
- one flow group that allows only untagged traffic with
smac that is equals to the original mac (anti-spoofing).
- one flow group that allows only untagged traffic.
- one flow group that allows only smac that is equals
to the original mac (anti-spoofing).
(note: only one of the above group has active rule)
- star rule will be used to drop all other traffic.

By default no rules are generated, unless VST is explicitly requested.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 761e205b 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Fix error flow memory leak

Fix memory leak in case query nic vport command failed.

Fixes: 81848731ff40 ('net/mlx5: E-Switch, Add SR-IOV (FDB) support')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 831cae1d 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: E-Switch, Replace vport spin lock with synchronize_irq()

Vport spin lock can be replaced with synchronize_irq() in the right
place, this will remove the need of locking inside irq context.
Locking in esw_enable_vport is not required since vport events are yet
to be enabled, and at esw_disable_vport it is sufficient to
synchronize_irq() to guarantee no further vport events handlers will be
scheduled.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# efdc810b 03-May-2016 Mohamad Haj Yahia <mohamad@mellanox.com>

net/mlx5: Flow steering, Add vport ACL support

Update the relevant flow steering device structs and commands to
support vport.
Update the flow steering core API to receive vport number.
Add ingress and egress ACL flow table name spaces.
Add ACL flow table support:
* ACL (Access Control List) flow table is a table that contains
only allow/drop steering rules.

* We have two types of ACL flow tables - ingress and egress.

* ACLs handle traffic sent from/to E-Switch FDB table, Ingress refers to
traffic sent from Vport to E-Switch and Egress refers to traffic sent
from E-Switch to vport.

* Ingress ACL flow table allow/drop rules is checked against traffic
sent from VF.

* Egress ACL flow table allow/drop rules is checked against traffic sent
to VF.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d63cd286 28-Apr-2016 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: Add user chosen levels when allocating flow tables

Currently, consumers of the flow steering infrastructure can't
choose their own flow table levels and are limited to one
flow table per level. This just waste levels.
Instead, we introduce here the possibility to use multiple
flow tables in a level. The user is free to connect these
flow tables, while following the rule (FTEs in FT of level x
could only point to FTs of level y where y > x).

In addition this patch switch the order of the create/destroy
flow tables of the NIC(vlan and main).

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86d722ad 10-Dec-2015 Maor Gottlieb <maorg@mellanox.com>

net/mlx5: Use flow steering infrastructure for mlx5_en

Expose the new flow steering API and remove the old
one.

Few changes are required:

1. The Ethernet flow steering follows the existing implementation, but uses
the new steering API. The old flow steering implementation is removed.

2. Move the E-switch FDB management to use the new API.

3. When driver is loaded call to mlx5_init_fs which initialize
the flow steering tree structure, open namespaces for NIC receive
and for E-switch FDB.

4. Call to mlx5_cleanup_fs when the driver is unloaded.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3b751a2a 01-Dec-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Introduce get vf statistics

Add support to get VF statistics using query vport
counter command.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e7ea352 01-Dec-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Introduce set vport vlan (VST mode)

Add query and modify functions to control client vlan and qos
striping or insertion, in E-Switch vports contexts.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 77256579 01-Dec-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Introduce Vport administration functions

Implement set VF mac/link state and query VF config
to be used later in nedev VF ndos or any other management API.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 81848731 01-Dec-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Add SR-IOV (FDB) support

Enabling E-Switch SRIOV for nvfs+1 vports.

Create E-Switch FDB for L2 UC/MC mac steering between VFs/PF and
external vport (Uplink).

FDB contains forwarding rules such as:
UC MAC0 -> vport0(PF).
UC MAC1 -> vport1.
UC MAC2 -> vport2.
MC MACX -> vport0, vport2, Uplink.
MC MACY -> vport1, Uplink.

For unmatched traffic FDB has the following default rules:
Unmached Traffic (src vport != Uplink) -> Uplink.
Unmached Traffic (src vport == Uplink) -> vport0(PF).

FDB rules population:
Each NIC vport (VF) will notify E-Switch manager of its UC/MC vport
context changes via modify vport context command, which will be
translated to an event that will be handled by E-Switch manager (PF)
which will update FDB table accordingly.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 073bb189 01-Dec-2015 Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Introducing E-Switch and l2 table

E-Switch is the software entity that represents and manages ConnectX4
inter-HCA ethernet l2 switching.

E-Switch has its own Virtual Ports, each Vport/vNIC/VF can be
connected to the device through a vport of an e-switch.

Each e-switch is managed by one vNIC identified by
HCA_CAP.vport_group_manager (usually it is the PF/vport[0]),
and its main responsibility is to forward each packet to the
right vport.

e-Switch needs to manage its own l2-table and FDB tables.

L2 table is a flow table that is managed by FW, it is needed for
Multi-host (Multi PF) configuration for inter HCA switching between
PFs.

FDB table is a flow table that is totally managed by e-Switch driver,
its main responsibility is to switch packets between e-Swtich internal
vports and uplink vport that belong to the same.

This patch introduces only e-Swtich l2 table management, FDB managemnt
will come later when ethernet SRIOV/VFs will be enabled.

preperation for ethernet sriov and l2 table management.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>