History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
Revision Date Author Comments
# 7a3ce807 09-Aug-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, fix peer entry ageing in LAG mode

With current implementation in single FDB LAG mode all packets are
processed by eswitch 0 rules. As such, 'peer' FDB entries receive the
packets for rules of other eswitches and are responsible for updating the
main entry by sending SWITCHDEV_FDB_ADD_TO_BRIDGE notification from their
background update wq task. However, this introduces a race condition when
non-zero eswitch instance decides to delete a FDB entry, sends
SWITCHDEV_FDB_DEL_TO_BRIDGE notification, but another eswitch's update task
refreshes the same entry concurrently while its async delete work is still
pending on the workque. In such case another SWITCHDEV_FDB_ADD_TO_BRIDGE
event may be generated and entry will remain stuck in FDB marked as
'offloaded' since no more SWITCHDEV_FDB_DEL_TO_BRIDGE notifications are
sent for deleting the peer entries.

Fix the issue by synchronously marking deleted entries with
MLX5_ESW_BRIDGE_FLAG_DELETED flag and skipping them in background update
job.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 88d162b4 03-May-2023 Roi Dayan <roid@nvidia.com>

net/mlx5: Devcom, Infrastructure changes

Update devcom infrastructure to be more generic, without
depending on max supported ports definition or a device guid,
and also more encapsulated so callers don't need to pass
the register devcom component id per event call.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 791eb782 25-May-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, expose FDB state via debugfs

For debugging purposes expose offloaded FDB state (flags, counters, etc.)
via debugfs inside 'esw' root directory. Example debugfs file output:

$ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb
DEV MAC VLAN PACKETS BYTES LASTUSE FLAGS
enp8s0f0_1 e4:0a:05:08:00:06 2 2 204 4295567112 0x0
enp8s0f0_0 e4:0a:05:08:00:03 2 3 278 4295567112 0x0

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ade19f0d 26-May-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, pass net device when linking vport to bridge

Following patch requires access to additional data in bridge net_device.
Pass the whole structure down the stack instead of adding necessary fields
as function arguments one-by-one.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 90ca127c 02-Jun-2023 Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Devcom, introduce devcom_for_each_peer_entry

Introduce generic APIs which will retrieve all peers.
This API replace mlx5_devcom_get/release_peer_data which retrieve
only a single peer.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 55f3e740 14-Feb-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, add tracepoints for multicast

Pass target struct net_device to mdb attach/detach handler in order to
expose the port name to the new tracepoints. Implemented following
tracepoints:

- Attach mdb to port.
- Detach mdb from port.

Usage example:

># cd /sys/kernel/debug/tracing
># echo mlx5:mlx5_esw_bridge_port_mdb_attach >> set_event
># cat trace
...
kworker/0:0-19071 [000] ..... 259004.253848: mlx5_esw_bridge_port_mdb_attach: net_device=enp8s0f0_0 addr=33:33:ff:00:00:01 vid=0 num_ports=1 offloaded=1

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 70f0302b 03-Feb-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, implement mdb offload

Implement support for add/del SWITCHDEV_OBJ_ID_PORT_MDB events. For mdb
destination addresses configure egress table rules to replicate to per-port
multicast tables of all ports that are member of the multicast group as
illustrated by 'MDB1' rule in the following diagram:

+--------+--+
+---------------------------------------> Port 1 | |
| +-^------+--+
| |
| |
+-----------------------------------------+ | +---------------------------+ |
| EGRESS table | | +--> PORT 1 multicast table | |
+----------------------------------+ +-----------------------------------------+ | | +---------------------------+ |
| INGRESS table | | | | | | | |
+----------------------------------+ | dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ | | FG0: | |
| | | dst_mac=P1,vlan=Y -> pop vlan, goto P1 | | | src_port=dst_port -> drop | |
| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ | | FG1: | |
| ... | | dst_mac=P2,vlan=Y -> goto P2 | | | | VLAN X -> pop, goto port | |
| | | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ | ... | |
+----------------------------------+ | | | | | VLAN Y -> pop, goto port +-------+
+-----------------------------------------+ | | | FG3: |
| | | matchall -> goto port |
| | | |
| | +---------------------------+
| |
| |
| | +--------+--+
+---------------------------------------> Port 2 | |
| +-^------+--+
| |
| |
| +---------------------------+ |
+--> PORT 2 multicast table | |
+---------------------------+ |
| | |
| FG0: | |
| src_port=dst_port -> drop | |
| FG1: | |
| VLAN X -> pop, goto port | |
| ... | |
| | |
| FG3: | |
| matchall -> goto port +-------+
| |
+---------------------------+

MDB is managed by extending mlx5 bridge to store an entry in
mlx5_esw_bridge->mdb_list linked list (used to iterate over all offloaded
MDBs) and mlx5_esw_bridge->mdb_ht hash table (used to lookup existing MDB
by MAC+VLAN). Every MDB entry can be attached to arbitrary amount of bridge
ports that are stored in mlx5_esw_bridge_mdb_entry->ports xarray in order
to allow both efficient lookup of the port and also iteration over all
ports that the entry is attached to. Every time MDB is attached/detached
to/from a port, the hardware rule is recreated with list of destinations
corresponding to all attached ports. When the entry is detached from the
last port it is removed from mdb and destroyed which means that the ports
xarray also acts as implicit reference counting mechanism.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b5e80625 22-Feb-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, support multicast VLAN pop

When VLAN with 'untagged' flag is created on port also provision the
per-port multicast table rule to pop the VLAN during packet replication.
This functionality must be in per-port table because some subset of ports
that are member of multicast group can require just a match on VLAN (trunk
mode) while other subset can be configured to remove the VLAN tag from
packets received on the ports (access mode).

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 272ecfc9 22-Feb-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, add per-port multicast replication tables

Multicast replication requires adding one more level of FDB_BR_OFFLOAD
priority flow tables. The new level is used for per-port multicast-specific
tables that have following flow groups structure (flow highest to lowest
priority):

- Flow group of size one that matches on source port metadata. This will
have a static single rule that prevent packets from being replicated to
their source port.

- Flow group of size one that matches all packets and forwards them to the
port that owns the table.

Initialize the table dynamically on all bridge ports when adding a port to
the bridge that has multicast enabled and on all existing bridge ports when
receiving multicast enable notification.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 18c2916c 21-Feb-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, snoop igmp/mld packets

Handle SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED attribute notification to
dynamically toggle bridge multicast offload. Set new
MLX5_ESW_BRIDGE_MCAST_FLAG bridge flag when multicast offload is enabled.
Put multicast-specific code into new bridge_mcast.c file.

When initializing bridge multicast pipeline create a static rule for
snooping on IGMP traffic and three rules for snooping on MLD traffic (for
query, report and done message types). Note that matching MLD traffic
requires having flexparser MLX5_FLEX_PROTO_ICMPV6 capability enabled.

By default Linux bridge is created with multicast enabled which can be
modified by 'mcast_snooping' argument:

$ ip link set name my_bridge type bridge mcast_snooping 0

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# b99c4ef2 13-Mar-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, extract code to lookup parent bridge of port

The pattern when function looks up a port by vport_num+vhca_id tuple in
order to just obtain its parent bridge is repeated multiple times in
bridge.c file. Further commits in this series use the pattern even more.
Extract the pattern to standalone mlx5_esw_bridge_from_port_lookup()
function to improve code readability.

This commits doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 6767c97d 19-Feb-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, move additional data structures to priv header

Following patches in series will require accessing flow tables and groups
sizes, table levels and struct mlx5_esw_bridge from new the new source file
dedicated to multicast code. Expose these data in bridge_priv.h to reduce
clutter in following patches that will implement the actual functionality.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9071b423 05-Jan-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, increase bridge tables sizes

Bridge ingress and egress tables got more flow groups recently for QinQ
support and will get more in following patches of this series. Increase the
sizes of the tables to allow offloading more flows in each mode.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# da0c5242 26-Jan-2023 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, fix ageing of peer FDB entries

SWITCHDEV_FDB_ADD_TO_BRIDGE event handler that updates FDB entry 'lastuse'
field is only executed for eswitch that owns the entry. However, if peer
entry processed packets at least once it will have hardware counter 'used'
value greater than entry 'lastuse' from that point on, which will cause FDB
entry not being aged out.

Process the event on all eswitch instances.

Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ea645f97 27-Oct-2022 Roi Dayan <roid@nvidia.com>

net/mlx5: Bridge, Use debug instead of warn if entry doesn't exists

There is no need for the warn if entry already removed.
Use debug print like in the update flow.
Also update the messages so user can identify if the it's
from the update flow or remove flow.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9c0ca9ba 03-Jun-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, implement QinQ support

Implement support for new 802.1ad VLAN protocol type. Create new flow
groups that handle svlan tags. Create FDB flows with svlan tag match when
bridge VLAN is set to QinQ.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c5fcac93 03-Jun-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, implement infrastructure for VLAN protocol change

Current implementation only supports 802.1Q VLAN Ethernet protocol. That
protocol type is assumed by default and
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification is ignored. To prepare
for supporting 802.1ad protocol in following patches implement the
necessary infrastructure to allow the user to dynamically change the VLAN
protocol:

- Handle SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification by flushing
FDB and re-creating VLAN modify header actions with new protocol. In this
patch the only allowed dynamic VLAN protocol value is ETH_P_8021Q.

- Save current VLAN protocol in per-bridge instance variable. Use the
dynamic variable instead of hardcoded values in mlx5 bridge code. Create
VLAN flow groups and flows based on current mlx5_esw_bridge->vlan_proto
value instead of assuming 802.1Q ethertype.

- Extract common flow group creation code into dedicated functions in order
to be reused for creating QinQ groups in following patches.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5a9db8d4 03-Jun-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, extract VLAN push/pop actions creation

Following patches in series need to re-create VLAN actions when user
changes VLAN protocol. Extract the code that creates VLAN push/pop actions
into dedicated function in order to be reused in next patch.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d4893978 03-Jun-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, rename filter fg to vlan_filter

Following patches in series introduce new qinq filtering group. To improve
readability rename the existing group in function, variable and definition
names to include "vlan" in order to make it easy to distinguish from
upcoming qinq group.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 55d3654c 26-May-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, refactor groups sizes and indices

Following patches in the series introduce additional flow groups for QinQ
support. With increased number of groups it becomes cumbersome to calculate
groups sizes as fractions of the table size. Instead, manually define sizes
of specific group types and ensure that totals are still correct by static
assertions. Having specific table size is important for firmware resource
management.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 04f8c12f 06-Jan-2022 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, take rtnl lock in init error handler

The mlx5_esw_bridge_cleanup() is expected to be called with rtnl lock
taken, which is true for mlx5e_rep_bridge_cleanup() function but not for
error handling code in mlx5e_rep_bridge_init(). Add missing rtnl
lock/unlock calls and extend both mlx5_esw_bridge_cleanup() and its dual
function mlx5_esw_bridge_init() with ASSERT_RTNL() to verify the invariant
from now on.

Fixes: 7cd6a54a8285 ("net/mlx5: Bridge, handle FDB events")
Fixes: 19e9bfa044f3 ("net/mlx5: Bridge, add offload infrastructure")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3518c83f 19-Oct-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, support replacing existing FDB entry

The SWITCHDEV_FDB_ADD_TO_DEVICE is used for both adding new and replacing
existing entry. Implement support for replacing existing FDB entries in
mlx5 offload code.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 2deda2f1 19-Oct-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, extract code to lookup and del/notify entry

Following two patterns in bridge code are used in multiple places where
similar code is duplicated:

- Lookup FDB entry from hashtable by address+vid pair.

- Notify software bridge and then delete existing FDB entry.

In order to improve code quality and prepare for following patch series
that also uses described patterns, extract the codes to dedicated helper
functions.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 17ac528d 12-Oct-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, provide flow source hints

Currently, SMFS mode doesn't support rx-loopback flows which causes bridge
egress rules to be rejected because without hint rules for both rx and tx
destinations are created by default. Provide explicit flow source hints for
compatibility with SMFS.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 575baa92 08-Sep-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, pop VLAN on egress table miss

Create lowest priority flow group in egress table with single rule that
matches on special reg_c1 value that is set on ingress VLAN push with
single action that pops VLAN. The flow destination is skip table that is
used to skip any further processing of packet in FDB bridge priority.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 5249001d 01-Sep-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, mark reg_c1 when pushing VLAN

On ingress VLAN push also assign value 0x7FE to reg_c1 tunnel id+opts
bits (tunnel id 0, which is not a valid tunnel id, and option 0x7FE which
was reserved by one of previous patches in the series). In following patch
the reg value is matched on egress miss to restore the packet to its
original state by removing the VLAN before passing it to the software data
path.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 64fc4b35 02-Sep-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, extract VLAN pop code to dedicated functions

Following patches in series need to pop VLAN when packet misses on egress.
To reuse existing bridge VLAN pop handling code, extract it to dedicated
helpers mlx5_esw_bridge_pkt_reformat_vlan_pop_supported() and
mlx5_esw_bridge_pkt_reformat_vlan_pop_create().

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a1a6e721 02-Sep-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, refactor eswitch instance usage

Several functions in bridge.c excessively obtain pointer to parent eswitch
instance by dereferencing br_offloads->esw on every usage and following
patches in this series add even more usages of eswitch. Introduce local
variable 'esw' and use it instead.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ff9b7521 17-Jul-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, support LAG

Allow adding bond net devices to mlx5 bridge with following changes:

- Modify bridge representor code to obtain uplink represetor that belongs
to eswitch that is registered for notification. Require representor to be
in shared FDB mode. If representor is the lag master, then consider its
port as local, otherwise treat it as peer.

- Use devcom to match on paired eswitch metadata in peer FDB entries. This
is necessary for shared FDB LAG to function since packets are always
received on active eswitch instance as opposed to parent eswitch of port.

- Support for deleting peer flows when receiving
SWITCHDEV_FDB_DEL_TO_BRIDGE notification was implemented in one of previous
patches in series. Now also implement support for handling
SWITCHDEV_FDB_ADD_TO_BRIDGE which can be generated on peer by bridge update
workqueue task in LAG configuration. Refresh the flow 'lastuse' timestamp
to current jiffies when receiving such notification on eswitch that manages
the local FDB entry. This allows peer entries to prevent ageing of the FDB.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c358ea17 25-Jun-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, allow merged eswitch connectivity

Allow connectivity between representors of different eswitch instances that
are attached to same bridge when merged_eswitch capability is enabled. Add
ports of peer eswitch to bridge instance and mark them with
MLX5_ESW_BRIDGE_PORT_FLAG_PEER. Mark FDBs offloaded on peer ports with
MLX5_ESW_BRIDGE_FLAG_PEER flag. Such FDBs can only be aged out on their
local eswitch instance, which then sends SWITCHDEV_FDB_DEL_TO_BRIDGE event.
Listen to the event on mlx5 bridge implementation and delete peer FDBs in
event handler.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# bf3d56d8 23-Jul-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, extract FDB delete notification to function

SWITCHDEV_FDB_DEL_TO_BRIDGE notification is generated in multiple places in
bridge code. Following patch in series changes the condition for the
notification. Extract the notification into dedicated helper function
mlx5_esw_bridge_fdb_del_notify() to only modify it in single place in the
future changes.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 3ee6233e 17-Jun-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, identify port by vport_num+esw_owner_vhca_id pair

Following patches in series allow traffic between vports of different
eswitch instances, which requires addressing bridge port by
vport_num+esw_owner_vhca_id pair since vport_num is only unique
per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with
'esw_owner_vhca_id' field and use it as part of key for
mlx5_esw_bridge->vports xarray.

With this change we can't rely on switchdev_handle_port_obj_add() helper to
get mlx5 representor from stacked device because we need specifically
representor from parent eswitch that registered the callback to obtain
correct esw_owner_vhca_id. The helper doesn't allow passing additional
parameters to predicate function and doesn't provide access to the notifier
block to obtain eswitch through br_offloads. Implement custom helpers to
obtain mlx5 representor and use them in
mlx5_esw_bridge_port_obj_{add|del|attr_set}() implementations.

Remove direct pointer to parent bridge from struct mlx5_vport as it is no
longer needed.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# a514d173 18-Jul-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, obtain core device from eswitch instead of priv

Following patches in series will pass bond device to bridge, which means
the code can't assume the device is mlx5 representor. Moreover, the core
device can be easily obtained from eswitch instance, so there is no reason
for more complex code that obtains struct mlx5_priv from net_device in
order to use its mdev. Refactor the code to use esw->dev instead of
priv->mdev.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 4de20e9a 17-Jun-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, release bridge in same function where it is taken

Refactor mlx5_esw_bridge_vport_link() to release the bridge instance if
mlx5_esw_bridge_vport_init() returned an error instead of relying on it to
release the bridge. This improves the design because object instance is
taken and released in same layer and simplifies following patches that add
more logic to mlx5_esw_bridge_vport_link().

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c35b57ce 10-Aug-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge

The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.

To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.

Fixes: 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6d8680da 04-Aug-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, fix ageing time

Ageing time is not converted from clock_t to jiffies which results
incorrect ageing timeout calculation in workqueue update task. Fix it by
applying clock_t_to_jiffies() to provided value.

Fixes: c636a0f0f3f0 ("net/mlx5: Bridge, dynamic entry ageing")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 9724fd5d 02-Apr-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, add tracepoints

Move private bridge structures to dedicated headers that is accessible to
bridge tracepoint header. Implemented following tracepoints:

- Initialize FDB entry.
- Refresh FDB entry.
- Cleanup FDB entry.
- Create VLAN.
- Cleanup VLAN.
- Attach port to bridge.
- Detach port from bridge.

Usage example:

># cd /sys/kernel/debug/tracing
># echo mlx5:mlx5_esw_bridge_fdb_entry_init >> set_event
># cat trace
...
kworker/u20:1-96 [001] .... 231.892503: mlx5_esw_bridge_fdb_entry_init: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=3 flags=0 lastuse=4294895695

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# cc2987c4 18-Mar-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, filter tagged packets that didn't match tagged fg

With support for pvid vlans in mlx5 bridge it is possible to have rules in
untagged flow group when vlan filtering is enabled. However, such rules can
also match tagged packets that didn't match anything in tagged flow group.
Filter such packets by introducing additional flow group between tagged and
untagged groups. When filtering is enabled on the bridge create additional
flow in vlan filtering flow group and matches tagged packets with specified
source MAC address and redirects them to new "skip" table. The skip table
is new lowest-level empty table that is used to skip all further processing
on packet in bridge priority.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 36e55079 02-Apr-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, support pvid and untagged vlan configurations

Implement support for pushing vlan header into untagged packet on ingress
of port that has pvid configured and support for popping vlan on egress of
port that has the matching vlan configured as untagged. To support such
configurations packet reformat contexts of {INSERT|REMOVE}_HEADER types are
created per such vlan and saved to struct mlx5_esw_bridge_vlan which allows
all FDB entries on particular vlan to share single packet reformat
instance. When initializing FDB entries with pvid or untagged vlan type set
its mlx5_flow_act->pkt_reformat action accordingly.

Flush all flows when removing vlan from port. This is necessary because
even though software bridge removes all FDB entries before removing their
vlan, mlx5 bridge implementation deletes their corresponding flow entries
from hardware in asynchronous workqueue task, which will cause firmware
error if vlan packet reformat context is deleted before all flows that
point to it.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# ffc89ee5 12-Mar-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, match FDB entry vlan tag

Add support for FDB vlan-tagged entries. Extend ingress and egress flow
tables with flow groups to match packet vlan tag. Modify the flow creation
code to include vlan tag, if vlan is configured on port and vlan
configuration is supported for offload.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# d75b9e80 30-Mar-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, implement infrastructure for vlans

Establish all the necessary infrastructure for implementing vlan matching
and vlan push/pop in following patches:

- Add new per-vport struct mlx5_esw_bridge_port that is used to store
metadata for all port vlans. Initialize and cleanup the instance of the
structure when port representor is linked/unliked to bridge. Use xarray to
allow quick vport metadata lookup by vport number.

- Add new per-port-vlan struct mlx5_esw_bridge_vlan that is used to store
vlan-specific data (vid, flags). Handle SWITCHDEV_PORT_OBJ_{ADD|DEL}
switchdev blocking event for SWITCHDEV_OBJ_ID_PORT_VLAN object by
creating/deleting the vlan structure and saving it in per-vport xarray for
quick lookup.

- Implement support for SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING object
attribute that is used to toggle vlan filtering. Remove all FDB entries
from hardware when vlan filtering state is changed.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# c636a0f0 30-Mar-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, dynamic entry ageing

Dynamic FDB entries require capability to age out unused entries. Such
entries are either aged out by kernel software bridge implementation or by
hardware switch that offloaded them (and notified the kernel to mark them
as SWITCHDEV_FDB_ADD_TO_BRIDGE). Leaving ageing to kernel bridge would
result it deleting offloaded dynamic FDB entries every ageing_time period
due to packets being processed by hardware and, consecutively, 'used'
timestamp for FDB entry not being updated. However, since hardware doesn't
support ageing, software solution inside the driver is required.

In order to emulate hardware ageing in driver, extend bridge FDB ingress
flows with counter and create delayed br_offloads->update_work task on
bridge offloads workqueue. Run the task every second, update 'used'
timestamp in software bridge dynamic entry by sending
SWITCHDEV_FDB_ADD_TO_BRIDGE for the entry, if it flow hardware counter
lastuse field was changed since last update. If lastuse wasn't changed for
ageing_time period, then delete the FDB entry and notify kernel bridge by
sending SWITCHDEV_FDB_DEL_TO_BRIDGE notification.

Register blocking switchdev notifier callback and handle attribute set
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME event to allow user to dynamically
configure bridge FDB entry ageing timeout. Save the value per-bridge in
struct mlx5_esw_bridge. Silently ignore
SWITCHDEV_ATTR_ID_PORT_{PRE_}BRIDGE_FLAGS switchdev event since mlx5 bridge
implementation relies on software bridge for implementing necessary
behavior for all of these flags.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 7cd6a54a 07-Jun-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, handle FDB events

Hardware supported by mlx5 driver doesn't provide learning and requires the
driver to emulate all switch-like behavior in software. As such, all
packets by default go through miss path, appear on representor and get to
software bridge, if it is the upper device of the representor. This causes
bridge to process packet in software, learn the MAC address to FDB and send
SWITCHDEV_FDB_ADD_TO_DEVICE event to all subscribers.

In order to offload FDB entries in mlx5, register switchdev notifier
callback and implement support for both 'added_by_user' and dynamic FDB
entry SWITCHDEV_FDB_ADD_TO_DEVICE events asynchronously using new
mlx5_esw_bridge_offloads->wq ordered workqueue. In workqueue callback
offload the ingress rule (matching FDB entry MAC as packet source MAC) and
egress table rule (matching FDB entry MAC as destination MAC). For ingress
table rule also match source vport to ensure that only traffic coming from
expected bridge port is matched by offloaded rule. Save all the relevant
FDB entry data in struct mlx5_esw_bridge_fdb_entry instance and insert the
instance in new mlx5_esw_bridge->fdb_list list (for traversing all entries
by software ageing implementation in following patch) and in new
mlx5_esw_bridge->fdb_ht hash table for fast retrieval. Notify the bridge
that FDB entry has been offloaded by sending SWITCHDEV_FDB_OFFLOADED
notification.

Delete FDB entry on reception of SWITCHDEV_FDB_DEL_TO_DEVICE event.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


# 19e9bfa0 02-Apr-2021 Vlad Buslov <vladbu@nvidia.com>

net/mlx5: Bridge, add offload infrastructure

Create new files bridge.{c|h} in en/rep directory that implement bridge
interaction with representor netdevices and handle required
events/notifications, bridge.{c|h} in esw directory that implement all
necessary eswitch offloading infrastructure and works on vport/eswitch
level. Provide new kconfig MLX5_BRIDGE which is automatically selected when
both kernel bridge and mlx5 eswitch configs are enabled.

Provide basic infrastructure for bridge offloads:

- struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure
that encapsulates generic bridge-offloads data (notifier blocks, ingress
flow table/group, etc.) that is created/deleted on enable/disable eswitch
offloads.

- struct mlx5_esw_bridge - per-bridge structure that encapsulates
per-bridge data (reference counter, FDB, egress flow table/group, etc.)
that is created when first eswitch represetor is attached to new bridge and
deleted when last representor is removed from the bridge as a result of
NETDEV_CHANGEUPPER event.

The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB
namespace. The new priority is between tc-miss and slow path priorities.
Priority consist of two levels: the ingress table that is global per
eswitch and matches incoming packets by src_mac/vid and redirects them to
next level (egress table) that is chosen according to ingress port bridge
membership and matches on dst_mac/vid in order to redirect packet to vport
according to the following diagram:

+
|
+---------v----------+
| |
| FDB_TC_OFFLOAD |
| |
+---------+----------+
|
|
+---------v----------+
| |
| FDB_FT_OFFLOAD |
| |
+---------+----------+
|
|
+---------v----------+
| |
| FDB_TC_MISS |
| |
+---------+----------+
|
+--------------------------------------+
| | |
| +------+ |
| | |
| +------v--------+ FDB_BR_OFFLOAD |
| | INGRESS_TABLE | |
| +------+---+----+ |
| | | match |
| | +---------+ |
| | | | +-------+
| | +-------v-------+ match | | |
| | | EGRESS_TABLE +------------> vport |
| | +-------+-------+ | | |
| | | | +-------+
| | miss | |
| +------+------+ |
| | |
+--------------------------------------+
|
|
+---------v----------+
| |
| FDB_SLOW_PATH |
| |
+---------+----------+
|
v

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>