History log of /linux-master/net/switchdev/switchdev.c
Revision Date Author Comments
# dc489f86 14-Feb-2024 Tobias Waldekranz <tobias@waldekranz.com>

net: bridge: switchdev: Skip MDB replays of deferred events on offload

Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.

While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.

The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.

This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.

To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.

For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:

root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
> ip link set dev x3 up master br0

And then destroy the bridge:

root@infix-06-0b-00:~$ ip link del dev br0
root@infix-06-0b-00:~$ mvls atu
ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a
DEV:0 Marvell 88E6393X
33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . .
33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . .
ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a
root@infix-06-0b-00:~$

The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.

Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:

root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
> ip link set dev x3 up master br1

All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).

Eliminate the race in two steps:

1. Grab the write-side lock of the MDB while generating the replay
list.

This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:

2. Make sure that no deferred version of a replay event is already
enqueued to the switchdev deferred queue, before adding it to the
replay list, when replaying additions.

Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f2e2857b 19-Jul-2023 Petr Machata <petrm@nvidia.com>

net: switchdev: Add a helper to replay objects on a bridge port

When a front panel joins a bridge via another netdevice (typically a LAG),
the driver needs to learn about the objects configured on the bridge port.
When the bridge port is offloaded by the driver for the first time, this
can be achieved by passing a notifier to switchdev_bridge_port_offload().
The notifier is then invoked for the individual objects (such as VLANs)
configured on the bridge, and can look for the interesting ones.

Calling switchdev_bridge_port_offload() when the second port joins the
bridge lower is unnecessary, but the replay is still needed. To that end,
add a new function, switchdev_bridge_port_replay(), which does only the
replay part of the _offload() function in exactly the same way as that
function.

Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Ivan Vecera <ivecera@redhat.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: bridge@lists.linux-foundation.org
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d62607c3 07-Jun-2022 Jakub Kicinski <kuba@kernel.org>

net: rename reference+tracking helpers

Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
dev_hold_track() -> netdev_hold()
dev_put_track() -> netdev_put()
dev_replace_track() -> netdev_ref_replace()

Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ec638740 23-Feb-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device

When the switchdev_handle_fdb_event_to_device() event replication helper
was created, my original thought was that FDB events on LAG interfaces
should most likely be special-cased, not just replicated towards all
switchdev ports beneath that LAG. So this replication helper currently
does not recurse through switchdev lower interfaces of LAG bridge ports,
but rather calls the lag_mod_cb() if that was provided.

No switchdev driver uses this helper for FDB events on LAG interfaces
yet, so that was an assumption which was yet to be tested. It is
certainly usable for that purpose, as my RFC series shows:

https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/

however this approach is slightly convoluted because:

- the switchdev driver gets a "dev" that isn't its own net device, but
rather the LAG net device. It must call switchdev_lower_dev_find(dev)
in order to get a handle of any of its own net devices (the ones that
pass check_cb).

- in order for FDB entries on LAG ports to be correctly refcounted per
the number of switchdev ports beneath that LAG, we haven't escaped the
need to iterate through the LAG's lower interfaces. Except that is now
the responsibility of the switchdev driver, because the replication
helper just stopped half-way.

So, even though yes, FDB events on LAG bridge ports must be
special-cased, in the end it's simpler to let switchdev_handle_fdb_*
just iterate through the LAG port's switchdev lowers, and let the
switchdev driver figure out that those physical ports are under a LAG.

The switchdev_handle_fdb_event_to_device() helper takes a
"foreign_dev_check" callback so it can figure out whether @dev can
autonomously forward to @foreign_dev. DSA fills this method properly:
if the LAG is offloaded by another port in the same tree as @dev, then
it isn't foreign. If it is a software LAG, it is foreign - forwarding
happens in software.

Whether an interface is foreign or not decides whether the replication
helper will go through the LAG's switchdev lowers or not. Since the
lan966x doesn't properly fill this out, FDB events on software LAG
uppers will get called. By changing lan966x_foreign_dev_check(), we can
suppress them.

Whereas DSA will now start receiving FDB events for its offloaded LAG
uppers, so we need to return -EOPNOTSUPP, since we currently don't do
the right thing for them.

Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# acd8df58 21-Feb-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: avoid infinite recursion from LAG to bridge with port object handler

The logic from switchdev_handle_port_obj_add_foreign() is directly
adapted from switchdev_handle_fdb_event_to_device(), which already
detects events on foreign interfaces and reoffloads them towards the
switchdev neighbors.

However, when we have a simple br0 <-> bond0 <-> swp0 topology and the
switchdev_handle_port_obj_add_foreign() gets called on bond0, we get
stuck into an infinite recursion:

1. bond0 does not pass check_cb(), so we attempt to find switchdev
neighbor interfaces. For that, we recursively call
__switchdev_handle_port_obj_add() for bond0's bridge, br0.

2. __switchdev_handle_port_obj_add() recurses through br0's lowers,
essentially calling __switchdev_handle_port_obj_add() for bond0

3. Go to step 1.

This happens because switchdev_handle_fdb_event_to_device() and
switchdev_handle_port_obj_add_foreign() are not exactly the same.
The FDB event helper special-cases LAG interfaces with its lag_mod_cb(),
so this is why we don't end up in an infinite loop - because it doesn't
attempt to treat LAG interfaces as potentially foreign bridge ports.

The problem is solved by looking ahead through the bridge's lowers to
see whether there is any switchdev interface that is foreign to the @dev
we are currently processing. This stops the recursion described above at
step 1: __switchdev_handle_port_obj_add(bond0) will not create another
call to __switchdev_handle_port_obj_add(br0). Going one step upper
should only happen when we're starting from a bridge port that has been
determined to be "foreign" to the switchdev driver that passes the
foreign_dev_check_cb().

Fixes: c4076cdd21f8 ("net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4076cdd 15-Feb-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces

The switchdev_handle_port_obj_add() helper is good for replicating a
port object on the lower interfaces of @dev, if that object was emitted
on a bridge, or on a bridge port that is a LAG.

However, drivers that use this helper limit themselves to a box from
which they can no longer intercept port objects notified on neighbor
ports ("foreign interfaces").

One such driver is DSA, where software bridging with foreign interfaces
such as standalone NICs or Wi-Fi APs is an important use case. There, a
VLAN installed on a neighbor bridge port roughly corresponds to a
forwarding VLAN installed on the DSA switch's CPU port.

To support this use case while also making use of the benefits of the
switchdev_handle_* replication helper for port objects, introduce a new
variant of these functions that crawls through the neighbor ports of
@dev, in search of potentially compatible switchdev ports that are
interested in the event.

The strategy is identical to switchdev_handle_fdb_event_to_device():
if @dev wasn't a switchdev interface, then go one step upper, and
recursively call this function on the bridge that this port belongs to.
At the next recursion step, __switchdev_handle_port_obj_add() will
iterate through the bridge's lower interfaces. Among those, some will be
switchdev interfaces, and one will be the original @dev that we came
from. To prevent infinite recursion, we must suppress reentry into the
original @dev, and just call the @add_cb for the switchdev_interfaces.

It looks like this:

br0
/ | \
/ | \
/ | \
swp0 swp1 eth0

1. __switchdev_handle_port_obj_add(eth0)
-> check_cb(eth0) returns false
-> eth0 has no lower interfaces
-> eth0's bridge is br0
-> switchdev_lower_dev_find(br0, check_cb, foreign_dev_check_cb))
finds br0

2. __switchdev_handle_port_obj_add(br0)
-> check_cb(br0) returns false
-> netdev_for_each_lower_dev
-> check_cb(swp0) returns true, so we don't skip this interface

3. __switchdev_handle_port_obj_add(swp0)
-> check_cb(swp0) returns true, so we call add_cb(swp0)

(back to netdev_for_each_lower_dev from 2)
-> check_cb(swp1) returns true, so we don't skip this interface

4. __switchdev_handle_port_obj_add(swp1)
-> check_cb(swp1) returns true, so we call add_cb(swp1)

(back to netdev_for_each_lower_dev from 2)
-> check_cb(eth0) returns false, so we skip this interface to
avoid infinite recursion

Note: eth0 could have been a LAG, and we don't want to suppress the
recursion through its lowers if those exist, so when check_cb() returns
false, we still call switchdev_lower_dev_find() to estimate whether
there's anything worth a recursion beneath that LAG. Using check_cb()
and foreign_dev_check_cb(), switchdev_lower_dev_find() not only figures
out whether the lowers of the LAG are switchdev, but also whether they
actively offload the LAG or not (whether the LAG is "foreign" to the
switchdev interface or not).

The port_obj_info->orig_dev is preserved across recursive calls, so
switchdev drivers still know on which device was this notification
originally emitted.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7b465f4c 15-Feb-2022 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: rename switchdev_lower_dev_find to switchdev_lower_dev_find_rcu

switchdev_lower_dev_find() assumes RCU read-side critical section
calling context, since it uses netdev_walk_all_lower_dev_rcu().

Rename it appropriately, in preparation of adding a similar iterator
that assumes writer-side rtnl_mutex protection.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d8c28581 09-Feb-2022 Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>

net/switchdev: use struct_size over open coded arithmetic

Replace zero-length array with flexible-array member and make use
of the struct_size() helper in kmalloc(). For example:

struct switchdev_deferred_item {
...
unsigned long data[];
};

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4fc003fe 06-Dec-2021 Eric Dumazet <edumazet@google.com>

net: switchdev: add net device refcount tracker

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 716a30a9 26-Oct-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device

To reduce code churn, the same patch makes multiple changes, since they
all touch the same lines:

1. The implementations for these two are identical, just with different
function pointers. Reduce duplications and name the function pointers
"mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument.

2. Drop the "const" attribute from "orig_dev". If the driver needs to
check whether orig_dev belongs to itself and then
call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it
can't, because call_switchdev_notifiers takes a non-const struct
net_device *.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 957e2235 03-Aug-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge

With the introduction of explicit offloading API in switchdev in commit
2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge
ports are offloaded"), we started having Ethernet switch drivers calling
directly into a function exported by net/bridge/br_switchdev.c, which is
a function exported by the bridge driver.

This means that drivers that did not have an explicit dependency on the
bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
possible to call a symbol exported by a driver that can be built as
module unless you are a module too.

There was an attempt to solve the dependency issue in the form of commit
b0e81817629a ("net: build all switchdev drivers as modules when the
bridge is a module"). Grygorii Strashko, however, says about it:

| In my opinion, the problem is a bit bigger here than just fixing the
| build :(
|
| In case, of ^cpsw the switchdev mode is kinda optional and in many
| cases (especially for testing purposes, NFS) the multi-mac mode is
| still preferable mode.
|
| There were no such tight dependency between switchdev drivers and
| bridge core before and switchdev serviced as independent, notification
| based layer between them, so ^cpsw still can be "Y" and bridge can be
| "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
| will need to be set as "Y", or we will have to update drivers to
| support build with BRIDGE=n and maintain separate builds for
| networking vs non-networking testing. But is this enough? Wouldn't
| it cause 'chain reaction' required to add more and more "Y" options
| (like CONFIG_VLAN_8021Q)?
|
| PS. Just to be sure we on the same page - ARM builds will be forced
| (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
| automation testing will just fail with omap2plus_defconfig.

In the light of this, it would be desirable for some configurations to
avoid dependencies between switchdev drivers and the bridge, and have
the switchdev mode as completely optional within the driver.

Arnd Bergmann also tried to write a patch which better expressed the
build time dependency for Ethernet switch drivers where the switchdev
support is optional, like cpsw/am65-cpsw, and this made the drivers
follow the bridge (compile as module if the bridge is a module) only if
the optional switchdev support in the driver was enabled in the first
place:
https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/

but this still did not solve the fact that cpsw and am65-cpsw now must
be built as modules when the bridge is a module - it just expressed
correctly that optional dependency. But the new behavior is an apparent
regression from Grygorii's perspective.

So to support the use case where the Ethernet driver is built-in,
NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
need a framework that can handle the possible absence of the bridge from
the running system, i.e. runtime bloatware as opposed to build-time
bloatware.

Luckily we already have this framework, since switchdev has been using
it extensively. Events from the bridge side are transmitted to the
driver side using notifier chains - this was originally done so that
unrelated drivers could snoop for events emitted by the bridge towards
ports that are implemented by other drivers (think of a switch driver
with LAG offload that listens for switchdev events on a bonding/team
interface that it offloads).

There are also events which are transmitted from the driver side to the
bridge side, which again are modeled using notifiers.
SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
notifying the bridge that a MAC address has been dynamically learned.
So there is a precedent we can use for modeling the new framework.

The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
that the bridge needs to do when a port becomes offloaded is blocking in
its nature: replay VLANs, MDBs etc. The calling context is indeed
blocking (we are under rtnl_mutex), but the existing switchdev
notification chain that the bridge is subscribed to is only the atomic
one. So we need to subscribe the bridge to the blocking switchdev
notification chain too.

This patch:
- keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
unchanged
- moves the implementation of switchdev_bridge_port_{,un}offload from
the bridge module into the switchdev module.
- makes everybody that is subscribed to the switchdev blocking notifier
chain "hear" offload & unoffload events
- makes the bridge driver subscribe and handle those events
- moves the bridge driver's handling of those events into 2 new
functions called br_switchdev_port_{,un}offload. These functions
contain in fact the core of the logic that was previously in
switchdev_bridge_port_{,un}offload, just that now we go through an
extra indirection layer to reach them.

Unlike all the other switchdev notification structures, the structure
used to carry the bridge port information, struct
switchdev_notifier_brport_info, does not contain a "bool handled".
This is because in the current usage pattern, we always know that a
switchdev bridge port offloading event will be handled by the bridge,
because the switchdev_bridge_port_offload() call was initiated by a
NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
couldn't have happened.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b0a5688 21-Jul-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: fix FDB entries towards foreign ports not getting propagated to us

The newly introduced switchdev_handle_fdb_{add,del}_to_device helpers
solved a problem but introduced another one. They have a severe design
bug: they do not propagate FDB events on foreign interfaces to us, i.e.
this use case:

br0
/ \
/ \
/ \
/ \
swp0 eno0
(switchdev) (foreign)

when an address is learned on eno0, what is supposed to happen is that
this event should also be propagated towards swp0. Somehow I managed to
convince myself that this did work correctly, but obviously it does not.

The trouble with foreign interfaces is that we must reach a switchdev
net_device pointer through a foreign net_device that has no direct
upper/lower relationship with it. So we need to do exploratory searching
through the lower interfaces of the foreign net_device's bridge upper
(to reach swp0 from eno0, we must check its upper, br0, for lower
interfaces that pass the check_cb and foreign_dev_check_cb). This is
something that the previous code did not do, it just assumed that "dev"
will become a switchdev interface at some point, somehow, probably by
magic.

With this patch, assisted address learning on the CPU port works again
in DSA:

ip link add br0 type bridge
ip link set swp0 master br0
ip link set eno0 master br0
ip link set br0 up

[ 46.708929] mscc_felix 0000:00:00.5 swp0: Adding FDB entry towards eno0, addr 00:04:9f:05:f4:ab vid 0 as host address

Fixes: 8ca07176ab00 ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE")
Reported-by: Eric Woudstra <ericwouds@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 71f4f89a 20-Jul-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: recurse into __switchdev_handle_fdb_del_to_device

The difference between __switchdev_handle_fdb_del_to_device and
switchdev_handle_del_to_device is that the former takes an extra
orig_dev argument, while the latter starts with dev == orig_dev.

We should recurse into the variant that does not lose the orig_dev along
the way. This is relevant when deleting FDB entries pointing towards a
bridge (dev changes to the lower interfaces, but orig_dev shouldn't).

The addition helper already recurses properly, just the deletion one
doesn't.

Fixes: 8ca07176ab00 ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ca07176 19-Jul-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE

Currently DSA has an issue with FDB entries pointing towards the bridge
in the presence of br_fdb_replay() being called at port join and leave
time.

In particular, each bridge port will ask for a replay for the FDB
entries pointing towards the bridge when it joins, and for another
replay when it leaves.

This means that for example, a bridge with 4 switch ports will notify
DSA 4 times of the bridge MAC address.

But if the MAC address of the bridge changes during the normal runtime
of the system, the bridge notifies switchdev [ once ] of the deletion of
the old MAC address as a local FDB towards the bridge, and of the
insertion [ again once ] of the new MAC address as a local FDB.

This is a problem, because DSA keeps the old MAC address as a host FDB
entry with refcount 4 (4 ports asked for it using br_fdb_replay). So the
old MAC address will not be deleted. Additionally, the new MAC address
will only be installed with refcount 1, and when the first switch port
leaves the bridge (leaving 3 others as still members), it will delete
with it the new MAC address of the bridge from the local FDB entries
kept by DSA (because the br_fdb_replay call on deletion will bring the
entry's refcount from 1 to 0).

So the problem, really, is that the number of br_fdb_replay() calls is
not matched with the refcount that a host FDB is offloaded to DSA during
normal runtime.

An elegant way to solve the problem would be to make the switchdev
notification emitted by br_fdb_change_mac_address() result in a host FDB
kept by DSA which has a refcount exactly equal to the number of ports
under that bridge. Then, no matter how many DSA ports join or leave that
bridge, the host FDB entry will always be deleted when there are exactly
zero remaining DSA switch ports members of the bridge.

To implement the proposed solution, we remember that the switchdev
objects and port attributes have some helpers provided by switchdev,
which can be optionally called by drivers:
switchdev_handle_port_obj_{add,del} and switchdev_handle_port_attr_set.
These helpers:
- fan out a switchdev object/attribute emitted for the bridge towards
all the lower interfaces that pass the check_cb().
- fan out a switchdev object/attribute emitted for a bridge port that is
a LAG towards all the lower interfaces that pass the check_cb().

In other words, this is the model we need for the FDB events too:
something that will keep an FDB entry emitted towards a physical port as
it is, but translate an FDB entry emitted towards the bridge into N FDB
entries, one per physical port.

Of course, there are many differences between fanning out a switchdev
object (VLAN) on 3 lower interfaces of a LAG and fanning out an FDB
entry on 3 lower interfaces of a LAG. Intuitively, an FDB entry towards
a LAG should be treated specially, because FDB entries are unicast, we
can't just install the same address towards 3 destinations. It is
imaginable that drivers might want to treat this case specifically, so
create some methods for this case and do not recurse into the LAG lower
ports, just the bridge ports.

DSA also listens for FDB entries on "foreign" interfaces, aka interfaces
bridged with us which are not part of our hardware domain: think an
Ethernet switch bridged with a Wi-Fi AP. For those addresses, DSA
installs host FDB entries. However, there we have the same problem
(those host FDB entries are installed with a refcount of only 1) and an
even bigger one which we did not have with FDB entries towards the
bridge:

br_fdb_replay() is currently not called for FDB entries on foreign
interfaces, just for the physical port and for the bridge itself.

So when DSA sniffs an address learned by the software bridge towards a
foreign interface like an e1000 port, and then that e1000 leaves the
bridge, DSA remains with the dangling host FDB address. That will be
fixed separately by replaying all FDB entries and not just the ones
towards the port and the bridge.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 69bfac96 27-Jun-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: add a context void pointer to struct switchdev_notifier_info

In the case where the driver asks for a replay of a certain type of
event (port object or attribute) for a bridge port that is a LAG, it may
do so because this port has just joined the LAG.

But there might already be other switchdev ports in that LAG, and it is
preferable that those preexisting switchdev ports do not act upon the
replayed event.

The solution is to add a context to switchdev events, which is NULL most
of the time (when the bridge layer initiates the call) but which can be
set to a value controlled by the switchdev driver when a replay is
requested. The driver can then check the context to figure out if all
ports within the LAG should act upon the switchdev event, or just the
ones that match the context.

We have to modify all switchdev_handle_* helper functions as well as the
prototypes in the drivers that use these helpers too, because these
helpers hide the underlying struct switchdev_notifier_info from us and
there is no way to retrieve the context otherwise.

The context structure will be populated and used in later patches.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dcbdf135 13-Feb-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: bridge: propagate extack through switchdev_port_attr_set

The benefit is the ability to propagate errors from switchdev drivers
for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c08c586 12-Feb-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: propagate extack to port attributes

When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bae33f2b 08-Jan-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: remove the transaction structure from port attributes

Since the introduction of the switchdev API, port attributes were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
attribute notifier structures, and converts drivers to not look at this
member.

In part, this patch contains a revert of my previous commit 2e554a7a5d8a
("net: dsa: propagate switchdev vlan_filtering prepare phase to
drivers").

For the most part, the conversion was trivial except for:
- Rocker's world implementation based on Broadcom OF-DPA had an odd
implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
done mechanically, by pasting the implementation twice, then only
keeping the code that would get executed during prepare phase on top,
then only keeping the code that gets executed during the commit phase
on bottom, then simplifying the resulting code until this was obtained.
- DSA's offloading of STP state, bridge flags, VLAN filtering and
multicast router could be converted right away. But the ageing time
could not, so a shim was introduced and this was left for a further
commit.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cf6def51 08-Jan-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: delete switchdev_port_obj_add_now

After the removal of the transactional model inside
switchdev_port_obj_add_now, it has no added value and we can just call
switchdev_port_obj_notify directly, bypassing this function. Let's
delete it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ffb68fc5 08-Jan-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: remove the transaction structure from port object notifiers

Since the introduction of the switchdev API, port objects were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
object notifier structures, and converts drivers to not look at this
member.

Where driver conversion is trivial (like in the case of the Marvell
Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
done in this patch.

Where driver conversion needs more attention (DSA, Mellanox Spectrum),
the conversion is left for subsequent patches and here we only fake the
prepare/commit phases at a lower level, just not in the switchdev
notifier itself.

Where the code has a natural structure that is best left alone as a
preparation and a commit phase (as in the case of the Ocelot switch),
that structure is left in place, just made to not depend upon the
switchdev transactional model.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 20776b46 25-Jan-2021 Rasmus Villemoes <rasmus.villemoes@prevas.dk>

net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP

It's not true that switchdev_port_obj_notify() only inspects the
->handled field of "struct switchdev_notifier_port_obj_info" if
call_switchdev_blocking_notifiers() returns 0 - there's a WARN_ON()
triggering for a non-zero return combined with ->handled not being
true. But the real problem here is that -EOPNOTSUPP is not being
properly handled.

The wrapper functions switchdev_handle_port_obj_add() et al change a
return value of -EOPNOTSUPP to 0, and the treatment of ->handled in
switchdev_port_obj_notify() seems to be designed to change that back
to -EOPNOTSUPP in case nobody actually acted on the notifier (i.e.,
everybody returned -EOPNOTSUPP).

Currently, as soon as some device down the stack passes the check_cb()
check, ->handled gets set to true, which means that
switchdev_port_obj_notify() cannot actually ever return -EOPNOTSUPP.

This, for example, means that the detection of hardware offload
support in the MRP code is broken: switchdev_port_obj_add() used by
br_mrp_switchdev_send_ring_test() always returns 0, so since the MRP
code thinks the generation of MRP test frames has been offloaded, no
such frames are actually put on the wire. Similarly,
br_mrp_switchdev_set_ring_role() also always returns 0, causing
mrp->ring_role_offloaded to be set to 1.

To fix this, continue to set ->handled true if any callback returns
success or any error distinct from -EOPNOTSUPP. But if all the
callbacks return -EOPNOTSUPP, make sure that ->handled stays false, so
the logic in switchdev_port_obj_notify() can propagate that
information.

Fixes: 9a9f26e8f7ea ("bridge: mrp: Connect MRP API with the switchdev API")
Fixes: f30f0601eb93 ("switchdev: Add helpers to aid traversal through lower devices")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20210125124116.102928-1-rasmus.villemoes@prevas.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ea6754ae 22-Sep-2020 Tian Tao <tiantao6@hisilicon.com>

net: switchdev: Fixed kerneldoc warning

Update kernel-doc line comments to fix warnings reported by make W=1.
net/switchdev/switchdev.c:413: warning: Function parameter or
member 'extack' not described in 'call_switchdev_notifiers'

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c8af73f0 12-Jul-2020 Andrew Lunn <andrew@lunn.ch>

net: switchdev: kerneldoc fixes

Simple fixes which require no deep knowledge of the code.

Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 07c6f980 26-Feb-2020 Russell King <rmk+kernel@armlinux.org.uk>

net: switchdev: do not propagate bridge updates across bridges

When configuring a tree of independent bridges, propagating changes
from the upper bridge across a bridge master to the lower bridge
ports brings surprises.

For example, a lower bridge may have vlan filtering enabled. It
may have a vlan interface attached to the bridge master, which may
then be incorporated into another bridge. As soon as the lower
bridge vlan interface is attached to the upper bridge, the lower
bridge has vlan filtering disabled.

This occurs because switchdev recursively applies its changes to
all lower devices no matter what.

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fbfc8502 17-Feb-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

net: switchdev: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2874c5fd 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 91cf8ece 27-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

switchdev: Remove unused transaction item queue

There are no more in tree users of the
switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure
in the kernel since commit 00fc0c51e35b ("rocker: Change world_ops API
and implementation to be switchdev independant").

Remove this unused code and update the documentation accordingly since.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d45224d6 27-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

net: switchdev: Replace port attr set SDO with a notification

Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field
from all clients, which were migrated to use switchdev notification in
the previous patches.

Add a new function switchdev_port_attr_notify() that sends the switchdev
notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process)
notifier chain.

We have one odd case within net/bridge/br_switchdev.c with the
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that
requires executing from atomic context, we deal with that one
specifically.

Drop __switchdev_port_attr_set() and update switchdev_port_attr_set()
likewise.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1cb33af1 27-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

switchdev: Add SWITCHDEV_PORT_ATTR_SET

In preparation for allowing switchdev enabled drivers to veto specific
attribute settings from within the context of the caller, introduce a
new switchdev notifier type for port attributes.

Suggested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 72636db5 24-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

switchdev: Complete removal of switchdev_port_attr_get()

We have no more in tree users of switchdev_port_attr_get() after
d0e698d57a94 ("Merge branch 'net-Get-rid-of-switchdev_port_attr_get'")
so completely remove the function signature and body.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bccb3025 06-Feb-2019 Florian Fainelli <f.fainelli@gmail.com>

net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID

Now that we have a dedicated NDO for getting a port's parent ID, get rid
of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the
NDO exclusively. This is a preliminary change to getting rid of
switchdev_ops eventually.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6685987c 16-Jan-2019 Petr Machata <petrm@mellanox.com>

switchdev: Add extack argument to call_switchdev_notifiers()

A follow-up patch will enable vetoing of FDB entries. Make it possible
to communicate details of why an FDB entry is not acceptable back to the
user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 69213513 12-Dec-2018 Petr Machata <petrm@mellanox.com>

net: switchdev: Add extack to switchdev_handle_port_obj_add() callback

Drivers use switchdev_handle_port_obj_add() to handle recursive descent
through lower devices. Change this function prototype to take add_cb
that itself takes an extack argument. Decode extack from
switchdev_notifier_port_obj_info and pass it to add_cb.

Update mlxsw and ocelot drivers which use this helper.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 479c86dc 12-Dec-2018 Petr Machata <petrm@mellanox.com>

net: switchdev: Add extack to struct switchdev_notifier_info

In order to pass extack to the drivers that need it, add an extack field
to struct switchdev_notifier_info, and an extack argument to the
function call_switchdev_blocking_notifiers(). Also add a helper function
switchdev_notifier_info_to_extack().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 69b7320e 12-Dec-2018 Petr Machata <petrm@mellanox.com>

net: switchdev: Add extack argument to switchdev_port_obj_add()

After the previous patch, bridge driver has extack argument available to
pass to switchdev. Therefore extend switchdev_port_obj_add() with this
argument, updating all callers, and passing the argument through to
switchdev_port_obj_notify().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d17d9f5e 22-Nov-2018 Petr Machata <petrm@mellanox.com>

switchdev: Replace port obj add/del SDO with a notification

Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
this field from all clients, which were migrated to use switchdev
notification in the previous patches.

Add a new function switchdev_port_obj_notify() that sends the switchdev
notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.

Update switchdev_port_obj_del_now() to dispatch to this new function.
Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
likewise.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f30f0601 22-Nov-2018 Petr Machata <petrm@mellanox.com>

switchdev: Add helpers to aid traversal through lower devices

After the transition from switchdev operations to notifier chain (which
will take place in following patches), the onus is on the driver to find
its own devices below possible layer of LAG or other uppers.

The logic to do so is fairly repetitive: each driver is looking for its
own devices among the lowers of the notified device. For those that it
finds, it calls a handler. To indicate that the event was handled,
struct switchdev_notifier_port_obj_info.handled is set. The differences
lie only in what constitutes an "own" device and what handler to call.

Therefore abstract this logic into two helpers,
switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If
a driver only supports physical ports under a bridge device, it will
simply avoid this layer of indirection.

One area where this helper diverges from the current switchdev behavior
is the case of mixed lowers, some of which are switchdev ports and some
of which are not. Previously, such scenario would fail with -EOPNOTSUPP.
The helper could do that for lowers for which the passed-in predicate
doesn't hold. That would however break the case that switchdev ports
from several different drivers are stashed under one master, a scenario
that switchdev currently happily supports. Therefore tolerate any and
all unknown netdevices, whether they are backed by a switchdev driver
or not.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a93e3b17 22-Nov-2018 Petr Machata <petrm@mellanox.com>

switchdev: Add a blocking notifier chain

In general one can't assume that a switchdev notifier is called in a
non-atomic context, and correspondingly, the switchdev notifier chain is
an atomic one.

However, port object addition and deletion messages are delivered from a
process context. Even the MDB addition messages, whose delivery is
scheduled from atomic context, are queued and the delivery itself takes
place in blocking context. For VLAN messages in particular, keeping the
blocking nature is important for error reporting.

Therefore introduce a blocking notifier chain and related service
functions to distribute the notifications for which a blocking context
can be assumed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 47d5b6db 09-Nov-2017 Andrew Lunn <andrew@lunn.ch>

net: bridge: Add/del switchdev object on host join/leave

When the host joins or leaves a multicast group, use switchdev to add
an object to the hardware to forward traffic for the group to the
host.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29ab586c 06-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

net: switchdev: Remove bridge bypass support from switchdev

Currently the bridge port flags, vlans, FDBs and MDBs can be offloaded
through the bridge code, making the switchdev's SELF bridge bypass
implementation to be redundant. This implies several changes:
- No need for dump infra in switchdev, DSA's special case is handled
privately.
- Remove obj_dump from switchdev_ops.
- FDBs are removed from obj_add/del routines, due to the fact that they
are offloaded through the bridge notification chain.
- The switchdev_port_bridge_xx() and switchdev_port_fdb_xx() functions
can be removed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2bedde1a 06-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

net: dsa: Move FDB dump implementation inside DSA

>From all switchdev devices only DSA requires special FDB dump. This is due
to lack of ability for syncing the hardware learned FDBs with the bridge.
Due to this it is removed from switchdev and moved inside DSA.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff5cf100 08-Jun-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

net: switchdev: Change notifier chain to be atomic

In order to use the switchdev notifier chain for FDB sync with the
device it has to be changed to atomic. The is done because the bridge
can learn new FDBs in atomic context.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fceb6435 12-Apr-2017 Johannes Berg <johannes.berg@intel.com>

netlink: pass extended ACK struct to parsing functions

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c778453b 25-Oct-2016 Ido Schimmel <idosch@mellanox.com>

switchdev: Remove redundant variable

Instead of storing return value in 'err' and returning, just return
directly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97c24290 18-Oct-2016 Ido Schimmel <idosch@mellanox.com>

switchdev: Execute bridge ndos only for bridge ports

We recently got the following warning after setting up a vlan device on
top of an offloaded bridge and executing 'bridge link':

WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
[...]
CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1
Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3
0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b
0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990
Call Trace:
[<ffffffff8135eaa3>] dump_stack+0x63/0x90
[<ffffffff8108c43b>] __warn+0xcb/0xf0
[<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20
[<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
[<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum]
[<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140
[<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
[<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
[<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0
[<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90
[<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190
[<ffffffff8161a1b2>] netlink_dump+0x122/0x290
[<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190
[<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
[<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220
[<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0
[<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
[<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890
[<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0
[<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30
[<ffffffff8161c92c>] netlink_unicast+0x18c/0x240
[<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0
[<ffffffff815c5a48>] sock_sendmsg+0x38/0x50
[<ffffffff815c6031>] SYSC_sendto+0x101/0x190
[<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90
[<ffffffff815c6b6e>] SyS_sendto+0xe/0x10
[<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4

The problem is that the 8021q module propagates the call to
ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
recognize the netdev, as it's not offloaded.

While we can ignore calls being made to non-bridge ports inside the
driver, a better fix would be to push this check up to the switchdev
layer.

Note that these ndos can be called for non-bridged netdev, but this only
happens in certain PF drivers which don't call the corresponding
switchdev functions anyway.

Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Tested-by: Tamir Winetroub <tamirw@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 347e3b28 25-Sep-2016 Jiri Pirko <jiri@mellanox.com>

switchdev: remove FIB offload infrastructure

Since this is now taken care of by FIB notifier, remove the code, with
all unused dependencies.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9850187 25-Sep-2016 Jiri Pirko <jiri@mellanox.com>

fib: introduce FIB info offload flag helpers

These helpers are to be used in case someone offloads the FIB entry. The
result is that if the entry is offloaded to at least one device, the
offload flag is set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d297653d 30-Aug-2016 Roopa Prabhu <roopa@cumulusnetworks.com>

rtnetlink: fdb dump: optimize by saving last interface markers

fdb dumps spanning multiple skb's currently restart from the first
interface again for every skb. This results in unnecessary
iterations on the already visited interfaces and their fdb
entries. In large scale setups, we have seen this to slow
down fdb dumps considerably. On a system with 30k macs we
see fdb dumps spanning across more than 300 skbs.

To fix the problem, this patch replaces the existing single fdb
marker with three markers: netdev hash entries, netdevs and fdb
index to continue where we left off instead of restarting from the
first netdev. This is consistent with link dumps.

In the process of fixing the performance issue, this patch also
re-implements fix done by
commit 472681d57a5d ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
(with an internal fix from Wilson Kok) in the following ways:
- change ndo_fdb_dump handlers to return error code instead
of the last fdb index
- use cb->args strictly for dump frag markers and not error codes.
This is consistent with other dump functions.

Below results were taken on a system with 1000 netdevs
and 35085 fdb entries:
before patch:
$time bridge fdb show | wc -l
15065

real 1m11.791s
user 0m0.070s
sys 1m8.395s

(existing code does not return all macs)

after patch:
$time bridge fdb show | wc -l
35085

real 0m2.017s
user 0m0.113s
sys 0m1.942s

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6bc506b4 25-Aug-2016 Ido Schimmel <idosch@mellanox.com>

bridge: switchdev: Add forward mark support for stacked devices

switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.

It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.

This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.

The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.

Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).

Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c326ab4 25-Aug-2016 Ido Schimmel <idosch@mellanox.com>

switchdev: Support parent ID comparison for stacked devices

switchdev_port_same_parent_id() currently expects port netdevs, but we
need it to support stacked devices in the next patch, so drop the
NO_RECURSE flag.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2eb03e6c 15-Aug-2016 Or Gerlitz <ogerlitz@mellanox.com>

switchdev: Put export declaration in the right place

Move exporting of switchdev_port_same_parent_id to be right
below it and not elsewhere.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8438884d 14-Jul-2016 Or Gerlitz <ogerlitz@mellanox.com>

net/switchdev: Export the same parent ID service function

This helper serves to know if two switchdev port netdevices belong to the
same HW ASIC, e.g to figure out if forwarding offload is possible between them.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# da4ed551 17-May-2016 Jiri Pirko <jiri@mellanox.com>

switchdev: pass pointer to fib_info instead of copy

The problem is that fib_info->nh is [0] so the struct fib_info
allocation size depends on number of nexthops. If we just copy fib_info,
we do not copy the nexthops info and driver accesses memory which is not
ours.

Given the fact that fib4 does not defer operations and therefore it does
not need copy, just pass the pointer down to drivers as it was done
before.

Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ceb2afb 20-Apr-2016 Elad Raz <eladr@mellanox.com>

switchdev: Adding complete operation to deferred switchdev ops

When using switchdev deferred operation (SWITCHDEV_F_DEFER), the operation
is executed in different context and the application doesn't have any way
to get the operation real status.

Adding a completion callback fixes that. This patch adds fields to
switchdev_attr and switchdev_obj "complete_priv" field which is used by
the "complete" callback.

Application can set a complete function which will be called once the
operation executed.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3e347660 24-Mar-2016 Nicolas Dichtel <nicolas.dichtel@6wind.com>

switchdev: fix typo in comments/doc

Two minor typo.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 472681d5 24-Feb-2016 MINOURA Makoto / 箕浦 真 <minoura@valinux.co.jp>

net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump.

When the send skbuff reaches the end, nlmsg_put and friends returns
-EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called
within a for_each_netdev loop and the first fdb entry of a following
netdev could fit in the remaining skbuff. This breaks the mechanism
of cb->args[0] and idx to keep track of the entries that are already
dumped, which results missing entries in bridge fdb show command.

Signed-off-by: Minoura Makoto <minoura@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f2c6ae5 27-Jan-2016 Ido Schimmel <idosch@mellanox.com>

switchdev: Require RTNL mutex to be held when sending FDB notifications

When switchdev drivers process FDB notifications from the underlying
device they resolve the netdev to which the entry points to and notify
the bridge using the switchdev notifier.

However, since the RTNL mutex is not held there is nothing preventing
the netdev from disappearing in the middle, which will cause
br_switchdev_event() to dereference a non-existing netdev.

Make switchdev drivers hold the lock at the beginning of the
notification processing session and release it once it ends, after
notifying the bridge.

Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
when RTNL mutex is held.

Fixes: 03bf0c281234 ("switchdev: introduce switchdev notifier")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4d41e125 10-Jan-2016 Elad Raz <eladr@mellanox.com>

switchdev: Adding MDB entry offload

Define HW multicast entry: MAC and VID.
Using a MAC address simplifies support for both IPV4 and IPv6.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ff64f6f 15-Dec-2015 Ido Schimmel <idosch@mellanox.com>

switchdev: Pass original device to port netdev driver

switchdev drivers need to know the netdev on which the switchdev op was
invoked. For example, the STP state of a VLAN interface configured on top
of a port can change while being member in a bridge. In this case, the
underlying driver should only change the STP state of that particular
VLAN and not of all the VLANs configured on the port.

However, current switchdev infrastructure only passes the port netdev down
to the driver. Solve that by passing the original device down to the
driver as part of the required switchdev object / attribute.

This doesn't entail any change in current switchdev drivers. It simply
enables those supporting stacked devices to know the originating device
and act accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c63d80c 03-Nov-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion

Caller passing down the SKIP_EOPNOTSUPP switchdev flag expects that
-EOPNOTSUPP cannot be returned. But in case of direct op call without
recurtion, this may happen. So fix this by checking it always on the
end of __switchdev_port_attr_set function.

Fixes: 464314ea6c11 ("switchdev: skip over ports returning -EOPNOTSUPP when recursing ports")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e258d919 29-Oct-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: fix: pass correct obj size when deferring obj add

Fixes: 4d429c5dd ("switchdev: introduce possibility to defer obj_add/del")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3a7bde55 29-Oct-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: fix: erasing too much of vlan obj when handling multiple vlan specs

When adding vlans with multiple IFLA_BRIDGE_VLAN_INFO attrs set in AFSPEC,
we would wipe the vlan obj struct after the first IFLA_BRIDGE_VLAN_INFO.
Fix this by only clearing what's necessary on each IFLA_BRIDGE_VLAN_INFO
iteration.

Fixes: 9e8f4a54 ("switchdev: push object ID back to object structure")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 741af005 28-Oct-2015 Ido Schimmel <idosch@mellanox.com>

switchdev: Add support for flood control

Allow devices supporting this feature to control the flooding of unknown
unicast traffic, by making switchdev infrastructure propagate this setting
to the switch driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 771acac2 14-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: assert rtnl mutex when going over lower netdevs

netdev_for_each_lower_dev has to be called with rtnl mutex held. So
better enforce it in switchdev functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4d429c5d 14-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: introduce possibility to defer obj_add/del

Similar to the attr usecase, the caller knows if he is holding RTNL and is
in atomic section. So let the called to decide the correct call variant.

This allows drivers to sleep inside their ops and wait for hw to get the
operation status. Then the status is propagated into switchdev core.
This avoids silent errors in drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 850d0cbc 14-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: remove pointers from switchdev objects

When object is used in deferred work, we cannot use pointers in
switchdev object structures because the memory they point at may be already
used by someone else. So rather do local copy of the value.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0bc05d58 14-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: allow caller to explicitly request attr_set as deferred

Caller should know if he can call attr_set directly (when holding RTNL)
or if he has to defer the att_set processing for later.

This also allows drivers to sleep inside attr_set and report operation
status back to switchdev core. Switchdev core then warns if status is
not ok, instead of silent errors happening in drivers.

Benefit from newly introduced switchdev deferred ops infrastructure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f7fadf30 14-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: make struct switchdev_attr parameter const for attr_set calls

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 793f4014 14-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: introduce switchdev deferred ops infrastructure

Introduce infrastructure which will be used internally to defer ops.
Note that the deferred ops are queued up and either are processed by
scheduled work or explicitly by user calling deferred_process function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87aaf2ca 12-Oct-2015 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

switchdev: check if the vlan id is in the proper vlan range

VLANs 0 and 4095 are reserved and shouldn't be used, add checks to
switchdev similar to the bridge. Also make sure ids above 4095 cannot
be passed either.

Fixes: 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc02aa8e 12-Oct-2015 Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

switchdev: enforce no pvid flag in vlan ranges

We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Elad Raz <eladr@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 464314ea 08-Oct-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: skip over ports returning -EOPNOTSUPP when recursing ports

This allows us to recurse over all the ports, skipping over unsupporting
ports. Without the change, the recursion would stop at first unsupported
port.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e8f4a54 01-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: push object ID back to object structure

Suggested-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 648b4a99 01-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: bring back switchdev_obj and use it as a generic object param

Replace "void *obj" with a generic structure. Introduce couple of
helpers along that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 52ba57cf 01-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: rename switchdev_obj_fdb to switchdev_obj_port_fdb

Make the struct name in sync with object id name.

Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f24f309 01-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: rename switchdev_obj_vlan to switchdev_obj_port_vlan

Make the struct name in sync with object id name.

Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f868398 01-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_*

To be aligned with obj.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57d80838 01-Oct-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_*

Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab069002 28-Sep-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: abstract object in add/del ops

Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev add and del operations to:

int switchdev_port_obj_add/del(struct net_device *dev,
enum switchdev_obj_id id, void *obj);

This allows the caller to pass a specific switchdev_obj_* structure
instead of the generic switchdev_obj one.

Drivers implementation of these operations and switchdev have been
changed accordingly.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25f07adc 28-Sep-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: pass callback to dump operation

Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev dump operation to:

int switchdev_port_obj_dump(struct net_device *dev,
enum switchdev_obj_id id, void *obj,
int (*cb)(void *obj));

This allows the caller to pass and expect back a specific
switchdev_obj_* structure instead of the generic switchdev_obj one.

Drivers implementation of dump operation can now expect this specific
structure and call the callback with it. Drivers have been changed
accordingly.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03d5fb18 28-Sep-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: remove dev from switchdev_obj cb

The net_device associated to a dump operation does not have to be passed
to the callback. switchdev stores it in a superset struct, if needed.

Also some drivers (such as DSA drivers) may not have easy access to it.

This will simplify pushing the callback function down to the drivers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e02a06b2 28-Sep-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: move dev in switchdev_fdb_dump

The FDB dump callback requires the related net_device so move it to the
struct switchdev_fdb_dump superset instead of using a callback param.

With this done, it'll be simpler to change the dump function signature.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e23b002b 28-Sep-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: remove dev in port_vlan_dump_put

The static switchdev_port_vlan_dump_put function does not need the
net_device parameter, so remove it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f623ab7f 24-Sep-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: reduce transaction phase enum down to a boolean

Now, since we have only 2 values for transaction phase, just use bool.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9f6467cf 24-Sep-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: remove "ABORT" transaction phase

No longer used by drivers, as transaction queue with item destructors
takes care of abort phase internally in switchdev code. So kill it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f8db8348 24-Sep-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: move transaction phase enum under transaction structure

Before it disappears completely, move transaction phase enum under
transaction structure and make attr/obj structures a bit cleaner.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ea6eb3f 24-Sep-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: introduce transaction item queue for attr_set and obj_add

Now, the memory allocation in prepare/commit state is done separatelly
in each driver (rocker). Introduce the similar mechanism in generic
switchdev code, in form of queue. That can be used not only for memory
allocations, but also for different items. Abort item destruction
is handled as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 69f5df49 24-Sep-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: rename "trans" to "trans_ph".

This is temporary, name "trans" will be used for something else and
"trans_ph" will eventually disappear.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0890cf6c 03-Sep-2015 Jiri Pirko <jiri@mellanox.com>

switchdev: fix return value of switchdev_port_fdb_dump in case of error

switchdev_port_fdb_dump is used as .ndo_fdb_dump. Its return value is
idx, so we cannot return errval.

Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: Scott Feldman<sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ce80e7bc 10-Aug-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: support static FDB addresses

This patch adds an ndm_state member to the switchdev_obj_fdb structure,
in order to support static FDB addresses.

Set Rocker ndm_state to NUD_REACHABLE.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cdf09697 11-Aug-2015 David S. Miller <davem@davemloft.net>

Revert "Merge branch 'mv88e6xxx-switchdev-fdb'"

This reverts commit f1d5ca434413b20cd3f8c18ff2b634b7782149a5, reversing
changes made to 4933d85c5173832ebd261756522095837583c458.

I applied v2 instead of v3.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 89024826 05-Aug-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: support static FDB addresses

This patch adds a is_static boolean to the switchdev_obj_fdb structure,
in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1525c386 05-Aug-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: change fdb addr for a byte array

The address in the switchdev_obj_fdb structure is currently represented
as a pointer. Replacing it for a 6-byte array allows switchdev to carry
addresses directly read from hardware registers, not stored by the
switch chip driver (as in Rocker).

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a3b2ec9 18-Jul-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: add offload_fwd_mark generator helper

skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
unique for device and may even be unique for a sub-set of ports within
device, so add switchdev helper function to generate unique marks based on
port's switch ID and group_ifindex. group_ifindex would typically be the
container dev's ifindex, such as the bridge's ifindex.

The generator uses a global hash table to store offload_fwd_marks hashed by
{switch ID, group_ifindex} key.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d754f98b 18-Jul-2015 Scott Feldman <sfeldma@gmail.com>

net: add phys ID compare helper to test if two IDs are the same

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ee94014 10-Jul-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: don't abort unsupported operations

There is no need to abort attribute setting or object addition, if the
prepare phase returned operation not supported.

Thus, abort these two transactions only if the error is not -EOPNOTSUPP.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c8079d0 23-Jun-2015 Vivien Didelot <vivien.didelot@gmail.com>

net: switchdev: ignore unsupported bridge flags

switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
attributes, but a driver doesn't need to implement this in order to get
bridge link information.

So error out only on errors different than -EOPNOTSUPP.

(This is a follow-up patch for 7d4f8d8.)

Fixes: 8793d0a664a8 ("switchdev: add new switchdev_port_bridge_getlink")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9fdaec0 11-Jun-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: change BUG_ON to WARN for attr set failure case

This particular BUG_ON condition was checking for attr set err in the
COMMIT phase, which isn't expected (it's a driver bug if PREPARE phase is
OK but COMMIT fails). But BUG_ON() is too strong for this case, so change
to WARN(). BUG_ON() would be warranted if the system was corrupted beyond
repair, but this is not the case here.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d4f8d87 22-Jun-2015 Scott Feldman <sfeldma@gmail.com>

switchdev; add VLAN support for port's bridge_getlink

One more missing piece of the puzzle. Add vlan dump support to switchdev
port's bridge_getlink. iproute2 "bridge vlan show" cmd already knows how
to show the vlans installed on the bridge and the device , but (until now)
no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
msg. Before this patch, "bridge vlan show":

$ bridge -c vlan show
port vlan ids
sw1p1 30-34 << bridge side vlans
57

sw1p1 << device side vlans (missing)

sw1p2 57

sw1p2

sw1p3

sw1p4

br0 None

(When the port is bridged, the output repeats the vlan list for the vlans
on the bridge side of the port and the vlans on the device side of the
port. The listing above show no vlans for the device side even though they
are installed).

After this patch:

$ bridge -c vlan show
port vlan ids
sw1p1 30-34 << bridge side vlan
57

sw1p1 30-34 << device side vlans
57
3840 PVID

sw1p2 57

sw1p2 57
3840 PVID

sw1p3 3842 PVID

sw1p4 3843 PVID

br0 None

I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
switchdev support adds an obj dump for VLAN objects, using the same
call-back scheme as FDB dump. Support included for both compressed and
un-compressed vlan dumps.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3e3a78b4 22-Jun-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: rename vlan vid_start to vid_begin

Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 10ea5165 17-Jun-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: fdb filter_dev is always NULL for self (device), so remove check

Remove the filter_dev check when dumping fdb entries, otherwise dump
returns empty list. filter_dev is always passed as NULL when dumping fdbs
on SELF. We want the fdbs installed on the device to be listed in the
dump.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Fixes: 45d4122c ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops")
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57225e77 11-Jun-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: fix BUG when port driver doesn't support set attr op

Fix a BUG_ON() where CONFIG_NET_SWITCHDEV is set but the driver for a
bridged port does not support switchdev_port_attr_set op. Don't BUG_ON()
if -EOPNOTSUPP is returned.

Also change BUG_ON() to netdev_err since this is a normal error path and
does not warrant the use of BUG_ON(), which is reserved for unrecoverable
errs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# af201f72 10-Jun-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: fix handling for drivers not supporting IPv4 fib add/del ops

If CONFIG_NET_SWITCHDEV is enabled, but port driver does not implement
support for IPv4 FIB add/del ops, don't fail route add/del offload
operations. Route adds will not be marked as OFFLOAD. Routes will be
installed in the kernel FIB, as usual.

This was report/fixed by Florian when testing DSA driver with net-next on
devices with L2 offload support but no L3 offload support. What he reported
was an initial route installed from DHCP client would fail (route not
installed to kernel FIB). This was triggering the setting of
ipv4.fib_offload_disabled, which would disable route offloading after the
first failure. So subsequent attempts to install the route would succeed.

There is follow-on work/discussion to address the handling of route install
failures, but for now, let's differentiate between no support and failed
support.

Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7616dcbb 03-Jun-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: documentation: use switchdev_port_obj_xxx for IPv4 FIB add/modify/delete ops

Clarify in documentation and code that IPV4 FIB add operation is used for
both adding a new FIB entry to the device and for modifying an existing FIB
entry on the device.

Also, remove left-over references to ipv4_fib ops and replace with details
on SWITCHDEV_PORT_IPV4_FIB object.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 45d4122c 13-May-2015 Samudrala, Sridhar <sridhar.samudrala@intel.com>

switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.

- introduce port fdb obj and generic switchdev_port_fdb_add/del/dump()
- use switchdev_port_fdb_add/del/dump in rocker/team/bonding ndo ops.
- add support for fdb obj in switchdev_port_obj_add/del/dump()
- switch rocker to implement fdb ops via switchdev_ops

v3: updated to sync with named union changes.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eea39946 13-May-2015 Roopa Prabhu <roopa@cumulusnetworks.com>

rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD

RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output.

This patch renames the flag to be consistent with what the user sees.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42275bd8 13-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: don't use anonymous union on switchdev attr/obj structs

Older gcc versions (e.g. gcc version 4.4.6) don't like anonymous unions
which was causing build issues on the newly added switchdev attr/obj
structs. Fix this by using named union on structs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7a7ee531 13-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: sparse warning: pass ipv4 fib dst as network-byte order

And let driver convert it to host-byte order as needed.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22c1f67e 13-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: sparse warning: make __switchdev_port_obj_add static

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58c2cb16 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: convert fib_ipv4_add/del over to switchdev_port_obj_add/del

The IPv4 FIB ops convert nicely to the switchdev objs and we're left with
only four switchdev ops: port get/set and port add/del. Other objs will
follow, such as FDB. So go ahead and convert IPv4 FIB over to switchdev
obj for consistency, anticipating more objs to come.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8793d0a6 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: add new switchdev_port_bridge_getlink

Like bridge_setlink, add switchdev wrapper to handle bridge_getlink and
call into port driver to get port attrs. For now, only BR_LEARNING and
BR_LEARNING_SYNC are returned. To add more, we'll probably want to break
away from ndo_dflt_bridge_getlink() and build the netlink skb directly in
the switchdev code.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87a5dae5 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: remove unused switchdev_port_bridge_dellink

Now we can remove old wrappers for dellink.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c34e022 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: add new switchdev_port_bridge_dellink

Same change as setlink. Provide the wrapper op for SELF ndo_bridge_dellink
and call into the switchdev driver to delete afspec VLANs.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e71f220b 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: remove old switchdev_port_bridge_setlink

New attr-based bridge_setlink can recurse lower devs and recover on err, so
remove old wrapper (including ndo_dflt_switchdev_port_bridge_setlink).

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 47f8328b 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: add new switchdev bridge setlink

Add new switchdev_port_bridge_setlink that can be used by drivers
implementing .ndo_bridge_setlink to set switchdev bridge attributes.
Basically turn the raw rtnl_bridge_setlink netlink into switchdev attr
sets. Proper netlink attr policy checking is done on the protinfo part of
the netlink msg.

Currently, for protinfo, only bridge port attrs BR_LEARNING and
BR_LEARNING_SYNC are parsed and passed to port driver.

For afspec, VLAN objs are passed so switchdev driver can set VLANs assigned
to SELF. To illustrate with iproute2 cmd, we have:

bridge vlan add vid 10 dev sw1p1 self master

To add VLAN 10 to port sw1p1 for both the bridge (master) and the device
(self).

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 491d0f15 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: introduce switchdev add/del obj ops

Like switchdev attr get/set, add new switchdev obj add/del. switchdev objs
will be things like VLANs or FIB entries, so add/del fits better for
objects than get/set used for attributes.

Use same two-phase prepare-commit transaction model as in attr set.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 35636062 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: convert STP update to switchdev attr set

STP update is just a settable port attribute, so convert
switchdev_port_stp_update to an attr set.

For DSA, the prepare phase is skipped and STP updates are only done in the
commit phase. This is because currently the DSA drivers don't need to
allocate any memory for STP updates and the STP update will not fail to HW
(unless something horrible goes wrong on the MDIO bus, in which case the
prepare phase wouldn't have been able to predict anyway).

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f8e20a9f 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: convert parent_id_get to switchdev attr get

Switch ID is just a gettable port attribute. Convert switchdev op
switchdev_parent_id_get to a switchdev attr.

Note: for sysfs and netlink interfaces, SWITCHDEV_ATTR_PORT_PARENT_ID is
called with SWITCHDEV_F_NO_RECUSE to limit switch ID user-visiblity to only
port netdevs. So when a port is stacked under bond/bridge, the user can
only query switch id via the switch ports, but not via the upper devices

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3094333d 10-May-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: introduce get/set attrs ops

Add two new swdev ops for get/set switch port attributes. Most swdev
interactions on a port are gets or sets on port attributes, so rather than
adding ops for each attribute, let's define clean get/set ops for all
attributes, and then we can have clear, consistent rules on how attributes
propagate on stacked devs.

Add the basic algorithms for get/set attr ops. Use the same recusive algo
to walk lower devs we've used for STP updates, for example. For get,
compare attr value for each lower dev and only return success if attr
values match across all lower devs. For sets, set the same attr value for
all lower devs. We'll use a two-phase prepare-commit transaction model for
sets. In the first phase, the driver(s) are asked if attr set is OK. If
all OK, the commit attr set in second phase. A driver would NACK the
prepare phase if it can't set the attr due to lack of resources or support,
within it's control. RTNL lock must be held across both phases because
we'll recurse all lower devs first in prepare phase, and then recurse all
lower devs again in commit phase. If any lower dev fails the prepare
phase, we need to abort the transaction for all lower devs.

If lower dev recusion isn't desired, allow a flag SWITCHDEV_F_NO_RECURSE to
indicate get/set only work on port (lowest) device.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d47c0a2 10-May-2015 Jiri Pirko <jiri@resnulli.us>

switchdev: s/swdev_/switchdev_/

Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ebb9a03a 10-May-2015 Jiri Pirko <jiri@resnulli.us>

switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/

Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 558d51fa 21-Mar-2015 Roopa Prabhu <roopa@cumulusnetworks.com>

switchdev: fix stp update API to work with layered netdevices

make it same as the netdev_switch_port_bridge_setlink/dellink
api (ie traverse lowerdevs to get to the switch port).

removes "WARN_ON(!ops->ndo_switch_parent_id_get)" because
direct bridge ports can be stacked netdevices (like bonds
and team of switch ports) which may not implement this ndo.

v2 to v3:
- remove changes to bond and team. Bring back the
transparently following lowerdevs like i initially
had for setlink/getlink
(http://www.spinics.net/lists/netdev/msg313436.html)
dave and scott feldman also seem to prefer it be that
way and move to non-transparent way of doing things
if we see a problem down the lane.

v3 to v4:
- fix ret initialization

v4 to v5:
- return err on first failure (scott feldman)

v5 to v6:
- change variable name (err) and initialize to
-EOPNOTSUPP (scott feldman).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 98237d43 15-Mar-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: use new swdev ops

Move swdev wrappers over to new swdev ops (from previous ndo ops). No
functional changes to the implementation.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>

rocker: move to new swdev ops

Signed-off-by: Scott Feldman <sfeldma@gmail.com>

dsa: move to new swdev ops

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac70c05b 11-Mar-2015 Simon Horman <simon.horman@netronome.com>

switchdev: correct spelling of notifier in comments

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f8f21471 09-Mar-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: add netlink flags to IPv4 FIB add op

Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op
to allow driver to 1) optimize hardware updates, 2) handle ip route prepend
and append commands correctly.

Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f4427bc3 09-Mar-2015 Jiri Pirko <jiri@resnulli.us>

switchdev: use gpl variant of symbol export

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e1315db1 06-Mar-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: fix CONFIG_IP_MULTIPLE_TABLES compile issue

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8e05fd71 05-Mar-2015 Scott Feldman <sfeldma@gmail.com>

fib: hook IPv4 fib for hardware offload

Call into the switchdev driver any time an IPv4 fib entry is
added/modified/deleted from the kernel's FIB. The switchdev driver may or
may not install the route to the offload device. In the case where the
driver tries to install the route and something goes wrong (device's routing
table is full, etc), then all of the offloaded routes will be flushed from the
device, route forwarding falls back to the kernel, and no more routes are
offloading.

We can refine this logic later. For now, use the simplist model of offloading
routes up to the point of failure, and then on failure, undo everything and
mark IPv4 offloading disabled.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b5d6fbde 05-Mar-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: implement IPv4 fib ndo wrappers

Flesh out ndo wrappers to call into device driver. To call into device driver,
the wrapper must interate over route's nexthops to ensure all nexthop devs
belong to the same switch device. Currently, there is no support for route's
nexthops spanning offloaded and non-offloaded devices, or spanning ports of
multiple offload devices.

Since switch device ports may be stacked under virtual interfaces (bonds and/or
bridges), and the route's nexthop may be on the virtual interface, the wrapper
will traverse the nexthop dev down to the base dev. It's the base dev that's
passed to the switchdev driver's ndo ops.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 104616e7 05-Mar-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: don't support custom ip rules, for now

Keep switchdev FIB offload model simple for now and don't allow custom ip
rules.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e8d9049 05-Mar-2015 Scott Feldman <sfeldma@gmail.com>

switchdev: add IPv4 fib ndo ops wrappers

Add IPv4 fib ndo wrapper funcs and stub them out for now.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8a44dbb2 29-Jan-2015 Roopa Prabhu <roopa@cumulusnetworks.com>

swdevice: add new apis to set and del bridge port attributes

This patch adds two new api's netdev_switch_port_bridge_setlink
and netdev_switch_port_bridge_dellink to offload bridge port attributes
to switch port

(The names of the apis look odd with 'switch_port_bridge',
but am more inclined to change the prefix of the api to something else.
Will take any suggestions).

The api's look at the NETIF_F_HW_SWITCH_OFFLOAD feature flag to
pass bridge port attributes to the port device.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03bf0c28 15-Jan-2015 Jiri Pirko <jiri@resnulli.us>

switchdev: introduce switchdev notifier

This patch introduces new notifier for purposes of exposing events which happen
on switch driver side. The consumers of the event messages are mainly involved
masters, namely bridge and ovs.

Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 38dcf357 28-Nov-2014 Scott Feldman <sfeldma@gmail.com>

bridge: call netdev_sw_port_stp_update when bridge port STP status changes

To notify switch driver of change in STP state of bridge port, add new
.ndo op and provide switchdev wrapper func to call ndo op. Use it in bridge
code then.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 007f790c 28-Nov-2014 Jiri Pirko <jiri@resnulli.us>

net: introduce generic switch devices support

The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.

Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>