History log of /linux-master/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
Revision Date Author Comments
# 44c2fbeb 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Share nexthop counters in resilient groups

For resilient groups, we can reuse the same counter for all the buckets
that share the same nexthop. Keep a reference count per counter, and keep
all these counters in a per-next hop group xarray, which serves as a
NHID->counter cache. If a counter is already present for a given NHID, just
take a reference and use the same counter.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/cdd00084533fc83ac5917562f54642f008205bf3.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5a5a98e5 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Support nexthop group hardware statistics

When hw_stats is set on a group, install nexthop counters on members of a
group.

Counter allocation request is moved from nexthop object initialization to
the update code. The previous placement made sense: when the counters are
enabled by dpipe, the counters are installed to all existing nexthops and
all nexthops created from then on get them. For the finer-grained nexthop
group statistics, this is unsuitable. The existing placement was kept for
the IPv4 and IPv6 nexthops.

Resilient group replacement emits a pre_replace notification, and then any
bucket_replace notifications if there were any replacements at all. If the
group is balanced and the nexthop composition of the replaced group didn't
change, there will be no such notifiers. Therefore hook to the pre_replace
notifier and mark all buckets for update, to un/install the counters.

When reporting deltas for resilient groups, use the nexthop ID that we
stored in a previous patch to look up to which nexthop a bucket
contributes.

Co-developed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/87495a72f187df2e5d491d02729c550d235fcc85.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 41acb554 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Track NH ID's of group members

The core interfaces for collecting per-NH statistics are built around
nexthops even for resilient groups. Because mlxsw models each bucket as a
nexthop, the core next hop that a given bucket contributes to needs to be
looked up. In order to be able to match the two up, we need to track
nexthop ID for members of group nexthop objects. For simplicity, do it for
all nexthop objects, not just group members.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/184ceb6b154e08f5bcf116a705b0fcb01c31895c.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 10bf92fd 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Add helpers for nexthop counters

The next patch will add the ability to share nexthop counters among
mlxsw nexthops backed by the same core nexthop. To have a place to store
reference count, the counter should be kept in a dedicated structure. In
this patch, introduce the structure together with the related helpers, sans
the refcount, which comes in the next patch.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/61f23fa4f8c5d7879f68dacd793d8ab7425f33c0.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 79fa5214 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Avoid allocating NH counters twice

mlxsw_sp_nexthop_counter_disable() decays to a nop when called on a
disabled counter, but mlxsw_sp_nexthop_counter_enable() can't similarly
be called on an enabled counter. This would be useful in the following
patches. Add the missing condition.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/0cc9050e196366c1387ab5ee47f1cee8ecde9c86.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6fb88aaf 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum: Allow fetch-and-clear of flow counters

For the report_delta-like interface like a previous patch has added for
collection of NH group statistics, it's easiest to read the counter and
have the HW clear it right away. Thus, change mlxsw_sp_flow_counter_get()
to take a bool indicating whether this should be done.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/6a096ede8ee92d5041e3832242c3bbc137198aba.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8acb480e 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Have mlxsw_sp_nexthop_counter_enable() return int

In order to be able to diagnose failures in counter allocation, have the
function mlxsw_sp_nexthop_counter_enable() return an error code.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/e0bb5c0cc6234ade2ade1e92abac991359c3f446.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 64f962c6 08-Mar-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Rename two functions

The function mlxsw_sp_nexthop_counter_alloc() doesn't directly allocate
anything, and mlxsw_sp_nexthop_counter_free() doesn't directly free. For
the following patches, we will need names for functions that actually do
those things. Therefore rename to mlxsw_sp_nexthop_counter_enable() and
mlxsw_sp_nexthop_counter_disable() to free up the namespace.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/f59272958697a718f090f59f892d32beabcd8972.1709901020.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1267f722 26-Jan-2024 Amit Cohen <amcohen@nvidia.com>

mlxsw: Use refcount_t for reference counting

mlxsw driver uses 'unsigned int' for reference counters in several
structures. Instead, use refcount_t type which allows us to catch overflow
and underflow issues. Change the type of the counters and use the
appropriate API.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 62bef636 17-Jan-2024 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Register netdevice notifier before nexthop

If there are IPIP nexthops at the time when the driver is loaded (or the
devlink instance reloaded), the driver looks up the corresponding IPIP
entry. But IPIP entries are only created as a result of netdevice
notifications. Since the netdevice notifier is registered after the nexthop
notifier, mlxsw_sp_nexthop_type_init() never finds the IPIP entry,
registers the nexthop MLXSW_SP_NEXTHOP_TYPE_ETH, and fails to assign a CRIF
to the nexthop. Later on when the CRIF is necessary, the WARN_ON in
mlxsw_sp_nexthop_rif() triggers, causing the splat [1].

In order to fix the issue, reorder the netdevice notifier to be registered
before the nexthop one.

[1] (edited for clarity):

WARNING: CPU: 1 PID: 1364 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3245 mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum
Hardware name: Mellanox Technologies Ltd. MSN4410/VMOD0010, BIOS 5.11 01/06/2019
Call Trace:
? mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum
__mlxsw_sp_nexthop_eth_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3637) mlxsw_spectrum
mlxsw_sp_nexthop_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3679 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3727) mlxsw_spectrum
mlxsw_sp_nexthop_group_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3757) mlxsw_spectrum
mlxsw_sp_nexthop_group_refresh (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4112) mlxsw_spectrum
mlxsw_sp_nexthop_obj_event (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5118 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5191 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5315 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5500) mlxsw_spectrum
nexthops_dump (net/ipv4/nexthop.c:217 net/ipv4/nexthop.c:440 net/ipv4/nexthop.c:3609)
register_nexthop_notifier (net/ipv4/nexthop.c:3624)
mlxsw_sp_router_init (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:11486) mlxsw_spectrum
mlxsw_sp_init (drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3267) mlxsw_spectrum
__mlxsw_core_bus_device_register (drivers/net/ethernet/mellanox/mlxsw/core.c:2202) mlxsw_core
mlxsw_devlink_core_bus_device_reload_up (drivers/net/ethernet/mellanox/mlxsw/core.c:2265 drivers/net/ethernet/mellanox/mlxsw/core.c:1603) mlxsw_core
devlink_reload (net/devlink/dev.c:314 net/devlink/dev.c:475)
[...]

Fixes: 9464a3d68ea9 ("mlxsw: spectrum_router: Track next hops at CRIFs")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/74edb8d45d004e8d8f5318eede6ccc3d786d8ba9.1705502064.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f7ebb402 20-Nov-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Call RIF setup before obtaining FID

For subport RIFs, the setup initializes, among other things, RIF port and
LAG numbers. Those are important to determine where in the PGT the RIF FID
will be stored. Therefore, call the RIF setup before fid_get.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/f24d8cad7e4748b8e8e0e16894ca6a20704dea32.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 27851dfa 20-Nov-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Add a helper to get subport number from a RIF

In the CFF flood mode, responsibility for management of the PGT entries for
rFIDs is moved from FW to the driver. All rFIDs are based off either a
front panel port, or a LAG port. The flood vectors for port-based rFIDs
enable just the port itself, the ones for LAG-based rFIDs enable all member
ports of the LAG in question.

Since all rFIDs based off the same port have the same flood vector, and
similarly for LAG-based rFIDs, the flood entries are shared. The PGT
address of the flood vector is therefore determined based on the port (or
LAG) number of the RIF connected with the rFID.

Add a helper to determine subport number given a RIF, to be used in these
calculations.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/d7ab43cf5b021f785f363f236e4b6780d10eea93.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4d3a42ec 29-Sep-2023 Kees Cook <keescook@chromium.org>

mlxsw: spectrum_router: Annotate struct mlxsw_sp_nexthop_group_info with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct mlxsw_sp_nexthop_group_info.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Petr Machata <petrm@nvidia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230929180746.3005922-4-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cb211620 27-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: IPv6 events: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with IPv6 address events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f0af6ad4722b4ca6e598fd4fda8311a3041651ec.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d0e0e880 27-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: RIF: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with RIF allocation.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/8b7701a7b439ac268e4be4040eff99d01e27ae47.1690471775.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b17b2d57 27-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: hw_stats: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with hw_stats events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/b972314cfef4f4c24e66e60d13cffa5d606d1bf3.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# deeaa371 27-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: FIB: Use tracker helpers to hold & put netdevices

Using the tracking helpers makes it easier to debug netdevice refcount
imbalances when CONFIG_NET_DEV_REFCNT_TRACKER is enabled.

Convert dev_hold() / dev_put() to netdev_hold() / netdev_put() in the
router code that deals with FIB events.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/5221a92e751c40447c55959f622267ccc999ed04.1690471774.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4560cf40 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Replay IP NETDEV_UP on device deslavement

When a netdevice is removed from a bridge or a LAG, and it has an IP
address, it should join the router and gain a RIF. Do that by replaying
address addition event on the netdevice.

When handling deslavement of LAG or its upper from a bridge device, the
replay should be done after all the lowers of the LAG have left the bridge.
Thus these scenarios are handled by passing replay_deslavement of false,
and by invoking, after the lowers have been processed, a new helper,
mlxsw_sp_netdevice_post_lag_event(), which does the per-LAG / -upper
handling, and in particular invokes the replay.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 31618b22 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Replay IP NETDEV_UP on device enslavement

Enslaving of front panel ports (and their uppers) to netdevices that
already have uppers is currently forbidden. When this is permitted, any
uppers with IP addresses need to have the NETDEV_UP inetaddr event
replayed, so that any RIFs are created.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8fdb09a7 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Replay neighbours when RIF is made

As neighbours are created, mlxsw is involved through the netevent
notifications. When at the time there is no RIF for a given neighbour, the
notification is not acted upon. When the RIF is later created, these
outstanding neighbours are left unoffloaded and cause traffic to go through
the SW datapath.

In order to fix this issue, as a RIF is created, walk the ARP and ND tables
and find neighbours for the netdevice that represents the RIF. Then
schedule neighbour work for them, allowing them to be offloaded.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 49c3a615 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Replay MACVLANs when RIF is made

If IP address is added to a MACVLAN netdevice, the effect is of configuring
VRRP on the RIF for the netdevice linked to the MACVLAN. Because the
MACVLAN offload is tied to existence of a RIF at the linked netdevice,
adding a MACVLAN is currently not allowed until a RIF is present.

If this requirement stays, it will never be possible to attach a first port
into a topology that involves a MACVLAN. Thus topologies would need to be
built in a certain order, which is impractical.

Additionally, IP address removal, which leads to disappearance of the RIF
that the MACVLAN depends on, cannot be vetoed. Thus even as things stand
now it is possible to get to a state where a MACVLAN netdevice exists
without a RIF, despite having mlxsw lowers. And once the MACVLAN is
un-offloaded due to RIF getting destroyed, recreating the RIF does not
bring it back.

In this patch, accept that MACVLAN can be created out of order and support
that use case.

One option would seem to be to simply recognize MACVLAN netdevices as
"interesting", and let the existing replay mechanisms take care of the
offload. However, that does not address the necessity to reoffload MACVLAN
once a RIF is created.

Thus add a new replay hook, symmetrical to mlxsw_sp_rif_macvlan_flush(),
called mlxsw_sp_rif_macvlan_replay(), which instead of unwinding the
existing offloads, applies the configuration as if the netdevice were
created just now.

Additionally, remove all vetoes and warning messages that checked for
presence of a RIF at the linked device.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cfc01a92 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Offload ethernet nexthops when RIF is made

As RIF is created, refresh each netxhop group tracked at the CRIF for which
the RIF was created.

Note that nothing needs to be done for IPIP nexthops. The RIF for these is
either available from the get-go, or will never be available, so no after
the fact offloading needs to be done.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ef59713c 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Join RIFs of LAG upper VLANs

In the following patches, the requirement that ports be only enslaved to
masters without uppers, is going to be relaxed. It will therefore be
necessary to join not only RIF for the immediate LAG, as is currently the
case, but also RIFs for VLAN netdevices upper to the LAG.

In this patch, extend mlxsw_sp_netdevice_router_join_lag() to walk the
uppers of a LAG being joined, and also join any VLAN ones.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 96c3e45c 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Extract a helper to schedule neighbour work

This will come in handy for neighbour replay.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6bbc9ca6 19-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Allow address handlers to run on bridge ports

Currently the IP address event handlers bail out when the event is related
to a netdevice that is a bridge port or a member of a LAG. In order to
create a RIF when a bridged or LAG'd port is unenslaved, these event
handlers will be replayed. However, at the point in time when the
NETDEV_CHANGEUPPER event is delivered, informing of the loss of
enslavement, the port is still formally enslaved.

In order for the operation to have any effect, these handlers need an extra
parameter to indicate that the check for bridge or LAG membership should
not be done. In this patch, add an argument "nomaster" to several event
handlers.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5b52692 13-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_switchdev: Manage RIFs on PVID change

Currently, mlxsw has several shortcomings with regards to RIF handling due
to PVID changes:

- In order to cause RIF for a bridge device to be created, the user is
expected first to set PVID, then to add an IP address. The reverse
ordering is disallowed, which is not very user-friendly.

- When such bridge gets a VLAN upper whose VID was the same as the existing
PVID, and this VLAN netdevice gets an IP address, a RIF is created for
this netdevice. The new RIF is then assigned to the 802.1Q FID for the
given VID. This results in a working configuration. However, then, when
the VLAN netdevice is removed again, the RIF for the bridge itself is
never reassociated to the VLAN.

- PVID cannot be changed once the bridge has uppers. Presumably this is
because the driver does not manage RIFs properly in face of PVID changes.
However, as the previous point shows, it is still possible to get into
invalid configurations.

In this patch, add the logic necessary for creation of a RIF as a result of
PVID change. Moreover, when a VLAN upper is created whose VID matches lower
PVID, do not create RIF for this netdevice.

These changes obviate the need for ordering of IP address additions and
PVID configuration, so stop forbidding addition of an IP address to a
PVID-less bridge. Instead, bail out quietly. Also stop preventing PVID
changes when the bridge has uppers.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3430f2cf 13-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: mlxsw_sp_inetaddr_bridge_event: Add an argument

For purposes of replay, mlxsw_sp_inetaddr_bridge_event() will need to make
decisions based on the proposed value of PVID. Querying PVID reveals the
current settings, not the in-flight values that the user requested and that
the notifiers are acting upon. Add a parameter, lower_pvid, which carries
the proposed PVID of the lower bridge, or -1 if the lower is not a bridge.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a24a4d29 13-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Adjust mlxsw_sp_inetaddr_vlan_event() coding style

The bridge branch of the dispatch in this function is going to get more
code and will need curly braces. Per the doctrine, that means the whole
if-else chain should get them.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0944b24 13-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Take VID for VLAN FIDs from RIF params

Currently, when an IP address is added to a bridge that has no PVID, the
operation is rejected. An IP address addition is interpreted as a request
to create a RIF for the bridge device, but without a PVID there is no VLAN
for which the RIF should be created. Thus the correct way to create a RIF
for a bridge as a user is to first add a PVID, and then add the IP address.

Ideally this ordering requirement would not exist. RIF would be created
either because an IP address is added, or because a PVID is added,
depending on which comes last.

For that, the switchdev code (which notices the PVID change request) must
be able to request that a RIF is created with a given VLAN ID, because at
the time that the PVID notification is distributed, the PVID setting is not
yet visible for querying.

Therefore when creating a VLAN-based RIF, use mlxsw_sp_rif_params.vid to
communicate the VID, and do not determine it ad-hoc in the fid_get
callback.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ca9f42c 13-Jul-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Pass struct mlxsw_sp_rif_params to fid_get

The fid_get callback is called to allocate a FID for the newly-created RIF.
In a following patch, the fid_get implementation for VLANs will be modified
to take the VLAN ID from the parameters instead of deducing it from the
netdevice. To that end, propagate the RIF parameters to the fid_get
callback.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 90a8007b 03-Jul-2023 Dan Carpenter <dan.carpenter@linaro.org>

mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check

The mlxsw_sp_crif_alloc() function returns NULL on error. It doesn't
return error pointers. Fix the check.

Fixes: 78126cfd5dc9 ("mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9464a3d6 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Track next hops at CRIFs

Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The
reason is that eventually, next hops for mlxsw uppers should be offloaded
and unoffloaded on demand as a netdevice becomes an upper, or stops being
one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a
netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the
netdevice lifetime.

Correspondingly, track at each next hop not its RIF, but its CRIF (from
which a RIF can always be deduced).

Note that now that next hops are tracked at a CRIF, it is not necessary to
move each over to a new RIF when it is necessary to edit a RIF. Therefore
drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy()
call mlxsw_sp_nexthop_rif_update() directly.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a285d664 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Split nexthop finalization to two stages

Nexthop finalization consists of two steps: the part where the offload is
removed, because the backing RIF is now gone; and the part where the
association to the RIF is severed.

Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the
unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later
be called independently.

Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini()
vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a
historical happenstance than a conscious decision. The two cleanups do not
depend on each other, and this change should have no observable effects.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# bdc0b78e 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_index

A previous patch added a pointer to loopback CRIF to the router data
structure. That makes the loopback RIF index redundant, as everything
necessary can be derived from the CRIF. Drop the field and adjust the code
accordingly.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# aa21242b 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Link CRIFs to RIFs

When a RIF is about to be created, the registration of the netdevice that
it should be associated with must have been seen in the past, and a CRIF
created. Therefore make this a hard requirement by looking up the CRIF
during RIF creation, and complaining loudly when there isn't one.

This then allows to keep a link between a RIF and its corresponding
CRIF (and back, as the relationship is one-to-at-most-one), which do.

The CRIF will later be useful as the objects tracked there will be
offloaded lazily as a result of RIF creation.

CRIFs are created when an "interesting" netdevice is registered, and
destroyed after such device is unregistered. CRIFs are supposed to already
exist when a RIF creation request arises, and exist at least as long as
that RIF exists. This makes for a simple invariant: it is always safe to
dereference CRIF pointer from "its" RIF.

To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER
event is delivered. The reason is that if a RIF's netdevices has an IPv6
address, removal of this address is notified in an atomic block. To remove
the RIF, the IPv6 removal handler schedules a work item. It must be safe
for this work item to access the associated CRIF as well.

Thus when a netdevice that backs the CRIF is removed, if it still has a
RIF, do not actually free the CRIF, only toggle its can_destroy flag, which
this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 78126cfd 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF

CRIFs are generally not maintained for loopback RIFs. However, the RIF for
the default VRF is used for offloading of blackhole nexthops. Nexthops
expect to have a valid CRIF. Therefore in this patch, add code to maintain
CRIF for the loopback RIF as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4796c287 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Maintain a hash table of CRIFs

CRIFs are objects that mlxsw maintains for netdevices that may not have an
associated RIF (i.e. they may not have been instantiated in the ASIC), but
if indeed they do not, it is quite possible they will in the future. These
netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are
created include e.g. bridges, LAGs, or front panel ports. The idea is that
next hops would be kept at CRIFs, not RIFs, and thus it would be easier to
offload and unoffload the entities that have been added before the RIF was
created.

In this patch, add the code for low-level CRIF maintenance: create and
destroy, and keep in a table keyed by the netdevice pointer for easy
recall.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f3c85eed 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIF

The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the
function mentioned in the subject. As such it forms an external interface
of the router code.

In future patches we will want to maintain connection between RIFs and the
CRIFs (introduced in the next patch) that back them. That will not hold
for the VRF-based loopback netdevices, so the whole CRIF business can be
kept hidden from the rest of mlxsw.

But for the main VRF loopback RIF we do want to keep the RIF-CRIF
connection, because that RIF is used for blackhole next hops, and the next
hop code can be kept simpler for assuming rif->crif is valid.

Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback
RIF. This being an internal function will take the CRIF argument anyway.
Furthermore, the function does not lock, which is not necessary at this
point in code yet.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ebbd17ce 22-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Add extack argument to mlxsw_sp_lb_rif_init()

The extack will be handy in later patches.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Link: https://lore.kernel.org/r/e87ba300121010d580b80a281877573a7b1377ca.1687438411.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d4a37bf0 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Move IPIP init up

mlxsw will need to keep track of certain devices that are not related to
any of its front panel ports. This includes IPIP netdevices. To be able to
query the list of supported IPIP types, router->ipip_ops_arr needs to be
initialized.

To that end, move the IPIP initialization up (and finalization
correspondingly down).

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 440273e7 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Extract a helper for RIF migration

RIF configuration contains a number of parameters that cannot be changed
after the RIF is created. For the IPIP loopbacks, this is currently worked
around by creating a new RIF with the desired configuration changes
applied, and updating next hops to the new RIF, and then destroying the old
RIF. This operation will be useful as a reusable atom, so extract a helper
to that effect.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 33d11c4e 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Add a helper to check if netdev has addresses

This function will be useful later as the driver will need to retroactively
create RIFs for new uppers with addresses.

Add another helper that assumes RCU lock, and restructure the code to
skip the IPv6 branch not through conditioning on the addr_list_empty
variable, but by directly returning the result value. This makes the skip
more obvious than it previously was.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 571c5691 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Extract a helper to free a RIF

Right now freeing the object that mlxsw uses to keep track of a RIF is as
simple as calling a kfree. But later on as CRIF abstraction is brought in,
it will involve severing the link between CRIF and its RIF as well. Better
to have the logic encapsulated in a helper.

Since a helper is being introduced, make it a full-fledged destructor and
have it validate that the objects tracked at the RIF have been released.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 532b6e2b 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Access nhgi->rif through a helper

To abstract away deduction of RIF from the corresponding next hop group
info (NHGI), mlxsw currently uses a macro. In its current form, that macro
is impossible to extend to more general computation. Therefore introduce a
helper, mlxsw_sp_nhgi_rif(), and use it throughout. This will make it
possible to change the deduction path easily later on.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 69f4ba17 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Access nh->rif->dev through a helper

In order to abstract away deduction of netdevice from the corresponding
next hop, introduce a helper, mlxsw_sp_nexthop_dev(), and use it
throughout. This will make it possible to change the deduction path easily
later on.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 2019b5ee 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Access rif->dev from params in mlxsw_sp_rif_create()

The previous patch added a helper to access a netdevice given a RIF. Using
this helper in mlxsw_sp_rif_create() is unreasonable: the netdevice was
given in RIF creation parameters. Just take it there.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# fb6ac45e 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Access rif->dev through a helper

In order to abstract away deduction of netdevice from the corresponding
RIF, introduce a helper, mlxsw_sp_rif_dev(), and use it throughout. This
will make it possible to change the deduction path easily later on.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 76962b80 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Add a helper specifically for joining a LAG

Currently, joining a LAG very simply means that the LAG RIF should be
joined by the subport representing untagged traffic. If the RIF does not
exist, it does not have to be created: if the user wants there to be RIF
for the LAG device, they are supposed to add an IP address, and they are
supposed to do it after tha LAG becomes mlxsw upper.

We can also assume that the LAG has no uppers, otherwise the enslavement is
not allowed.

In the future, these ordering dependencies should be removed. That means
that joining LAG will be more complex operation, possibly involving a lazy
RIF creation, and possibly joining / lazily creating RIFs for VLAN uppers
of the LAG. It will be handy to have a dedicated function that handles all
this. The new function mlxsw_sp_router_port_join_lag() is that.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# e0db883b 12-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Extract a helper from mlxsw_sp_port_vlan_router_join()

Split out of mlxsw_sp_port_vlan_router_join() the part that checks for RIF
and dispatches to __mlxsw_sp_port_vlan_router_join(), leaving it as wrapper
that just manages the router lock.

The new function, mlxsw_sp_port_vlan_router_join_existing(), will be useful
as an atom in later patches.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# df95ae66 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Privatize mlxsw_sp_rif_dev()

Now that the external users of mlxsw_sp_rif_dev() have been converted in
the preceding patches, make the function static.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5374a50f 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: Convert does-RIF-have-this-netdev queries to a dedicated helper

In a number of places, a netdevice underlying a RIF is obtained only to
compare it to another pointer. In order to clean up the interface between
the router and the other modules, add a new helper to specifically answer
this question, and convert the relevant uses to this new interface.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0255f748 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: Convert RIF-has-netdevice queries to a dedicated helper

In a number of places, a netdevice underlying a RIF is obtained only to
check if it a NULL pointer. In order to clean up the interface between the
router and the other modules, add a new helper to specifically answer this
question, and convert the relevant uses to this new interface.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 151b89f6 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler

After the struct mlxsw_sp_netevent_work.n field initialization is moved
here, the body of code that handles NETEVENT_NEIGH_UPDATE is almost
identical to the one in the helper function. Therefore defer to the helper
instead of inlining the equivalent.

Note that previously, the code took and put a reference of the netdevice.
The new code defers to mlxsw_sp_dev_lower_is_port() to obviate the need for
taking the reference.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 14304e70 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Use the available router pointer for netevent handling

This code handles NETEVENT_DELAY_PROBE_TIME_UPDATE, which is invoked every
time the delay_probe_time changes. mlxsw router currently only maintains
one timer, so the last delay_probe_time set wins.

Currently, mlxsw uses mlxsw_sp_port_lower_dev_hold() to find a reference to
the router. This is no longer necessary. But as a side effect, this makes
sure that only updates to "interesting netdevices" (ones that have a
physical netdevice lower) are projected.

Retain that side effect by calling mlxsw_sp_port_dev_lower_find_rcu() and
punting if there is none. Then just proceed using the router pointer that's
already at hand in the helper.

Note that previously, the code took and put a reference of the netdevice.
Because the mlxsw_sp pointer is now obtained from the notifier block, the
port pointer (non-) NULL-ness is all that's relevant, and the reference
does not need to be taken anymore.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 48dde35e 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Pass router to mlxsw_sp_router_schedule_work() directly

Instead of passing a notifier block and deducing the router pointer from
that in the helper, do that in the caller, and pass the result. In the
following patches, the pointer will also be made useful in the caller.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 41b2bd20 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Move here inetaddr validator notifiers

The validation logic is already in the router code. Move there the notifier
blocks themselves as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 50f6c3d5 09-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: mlxsw_sp_router_fini(): Extract a helper variable

Make mlxsw_sp_router_fini() more similar to the _init() function (and more
concise) by extracting the `router' handle to a named variable and using
that throughout. The availability of a dedicated `router' variable will
come in handy in following patches.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75426cc0 02-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Do not query MAX_VRS on each iteration

MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module.
Instead of making the call on every iteration, cache it up front, and use
the value.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3903249e 02-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Do not query MAX_RIFS on each iteration

MLXSW_CORE_RES_GET involves a call to spectrum_core, a separate module.
Instead of making the call on every iteration, cache it up front, and use
the value.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5afef674 02-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Use extack in mlxsw_sp~_rif_ipip_lb_configure()

In commit 26029225d992 ("mlxsw: spectrum_router: Propagate extack
further"), the mlxsw_sp_rif_ops.configure callback got a new argument,
extack. However the callbacks that deal with tunnel configuration,
mlxsw_sp1_rif_ipip_lb_configure() and mlxsw_sp2_rif_ipip_lb_configure(),
were never updated to pass the parameter further. Do that now.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# be35db17 02-Jun-2023 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Clarify a comment

"Reserved for X" usually means that only X is supposed to use a given
object. Here, it is used in the sense that X should consider the object
"reserved", as in "restricted".

Replace the comment simply by "X", with the implication that that's where
the field is used.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 35c35692 13-Mar-2023 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum: Fix incorrect parsing depth after reload

Spectrum ASICs have a configurable limit on how deep into the packet
they parse. By default, the limit is 96 bytes.

There are several cases where this parsing depth is not enough and there
is a need to increase it. For example, timestamping of PTP packets and a
FIB multipath hash policy that requires hashing on inner fields. The
driver therefore maintains a reference count that reflects the number of
consumers that require an increased parsing depth.

During reload_down() the parsing depth reference count does not
necessarily drop to zero, but the parsing depth itself is restored to
the default during reload_up() when the firmware is reset. It is
therefore possible to end up in situations where the driver thinks that
the parsing depth was increased (reference count is non-zero), when it
is not.

Fix by making sure that all the consumers that increase the parsing
depth reference count also decrease it during reload_down().
Specifically, make sure that when the routing code is de-initialized it
drops the reference count if it was increased because of a FIB multipath
hash policy that requires hashing on inner fields.

Add a warning if the reference count is not zero after the driver was
de-initialized and explicitly reset it to zero during initialization for
good measures.

Fixes: 2d91f0803b84 ("mlxsw: spectrum: Add infrastructure for parsing configuration")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/9c35e1b3e6c1d8f319a2449d14e2b86373f3b3ba.1678727526.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2ab6478d 05-Jan-2023 Kees Cook <keescook@chromium.org>

mlxsw: spectrum_router: Replace 0-length array with flexible array

Zero-length arrays are deprecated[1]. Replace struct
mlxsw_sp_nexthop_group_info's "nexthops" 0-length array with a flexible
array. Detected with GCC 13, using -fstrict-flex-arrays=3:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_nexthop_group_hash_obj':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3278:38: warning: array subscript i is outside array bounds of 'struct mlxsw_sp_nexthop[0]' [-Warray-bounds=]
3278 | val ^= jhash(&nh->ifindex, sizeof(nh->ifindex), seed);
| ^~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2954:33: note: while referencing 'nexthops'
2954 | struct mlxsw_sp_nexthop nexthops[0];
| ^~~~~~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Ido Schimmel <idosch@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ca1b208 07-Dec-2022 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add support for double entry RIFs

In Spectrum-1, loopback router interfaces (RIFs) used for IP-in-IP
encapsulation with an IPv6 underlay require two RIF entries and the RIF
index must be even.

Prepare for this change by extending the RIF parameters structure with a
'double_entry' field that indicates if the RIF being created requires
two RIF entries or not. Only set it for RIFs representing ip6gre tunnels
in Spectrum-1.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1a2f65b4 07-Dec-2022 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Parametrize RIF allocation size

Currently, each router interface (RIF) consumes one entry in the RIFs
table. This is going to change in subsequent patches where some RIFs
will consume two table entries.

Prepare for this change by parametrizing the RIF allocation size. For
now, always pass '1'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 40ef76de 07-Dec-2022 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Use gen_pool for RIF index allocation

Currently, each router interface (RIF) consumes one entry in the RIFs
table and there are no alignment constraints. This is going to change in
subsequent patches where some RIFs will consume two table entries and
their indexes will need to be aligned to the allocation size (even).

Prepare for this change by converting the RIF index allocation to use
gen_pool with the 'gen_pool_first_fit_order_align' algorithm.

No Kconfig changes necessary as mlxsw already selects
'GENERIC_ALLOCATOR'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a292c256 08-Sep-2022 wangjianli <wangjianli@cdjrlc.com>

mellanox/mlxsw: fix repeated words in comments

Delete the redundant word 'in'.

Signed-off-by: wangjianli <wangjianli@cdjrlc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 59ad2471 21-Jul-2022 Juhee Kang <claudiajkang@gmail.com>

mlxsw: use netif_is_any_bridge_port() instead of open code

The open code which is netif_is_bridge_port() || netif_is_ovs_port() is
defined as a new helper function on netdev.h like netif_is_any_bridge_port
that can check both IFF flags in 1 go. So use netif_is_any_bridge_port()
function instead of open code. This patch doesn't change logic.

Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 72a4c8c9 16-Jul-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the mlxsw driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 77b7f83d 04-Jul-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: Enable unified bridge model

After all the preparations for unified bridge model, finally flip mlxsw
driver to use the new model.

Change config profile, set 'ubridge' to true and remove the configurations
that are relevant only for the legacy model. Set 'flood_mode' to
'controlled' as the current mode is not supported with unified bridge
model.

Remove all the code which is dedicated to the legacy model. Remove
'struct mlxsw_sp.ubridge' variable which was temporarily added to separate
configurations between the models.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bf73904f 04-Jul-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: Add support for 802.1Q FID family

Using the legacy bridge model, there is no VID classification at egress
for 802.1Q FIDs, which means that the VID is maintained.

This behavior cause the limitation that 802.1Q FIDs cannot work with VXLAN.
This limitation stems from the fact that a decapsulated VXLAN packet should
not contain a VLAN tag. If such a packet was to egress from a local port
using a 802.1Q FID, it would "maintain" its VLAN on egress, which is no
VLAN at all.

Currently 802.1Q FIDs are emulated in mlxsw driver using 802.1D FIDs. Using
unified bridge model, there is a FID->VID mapping, so it is possible to
stop emulating 802.1Q FIDs.

The main changes are:
1. Use 'SFGC.bridge_type' = 0, to separate between 802.1Q FIDs and
802.1D FIDs.
2. Use VLAN RIF instead of the emulated one (VLAN_EMU which is emulated
using FID RIF).
3. Create VID->FID mapping when the FID is created. Then when a new port
is mapped to the FID, if it not in virtual mode, no new mapping is
needed. Save the new port in 'port_vid_list', to be able to update a
RIF in all {Port, VID}->FID mappings in case that the port will be in
virtual mode later.
4. Add a dedicated operation function per FID family to update RIF for
VID->FID mappings. For 802.1d and rFID families, just return. For
802.1q family, handle the global mapping which is created for new 802.1q
FID.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 662761d8 04-Jul-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: Add support for VLAN RIFs

Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of
'VLAN' type, whereas RIFs constructed on top of VLAN-unaware bridges are of
'FID' type.

Currently 802.1Q FIDs are emulated using 802.1D FIDs, therefore VLAN RIFs
are emulated using FID RIFs. As part of converting the driver to use
unified bridge model, 802.1Q FIDs and VLAN RIFs will be used.

The egress FID is required for VLAN RIFs in Spectrum-2 and above, but not
in Spectrum-1, as in Spectrum-1 the mapping for VLAN RIFs is VID->FID,
while in other ASICs it is FID->FID. The reason for the change is that it
is more scalable to reuse the FID->FID entry than creating multiple
{Port, VID}->FID entries for the router port. Use the existing operation
structure to separate the configuration between different ASICs.

Add support for VLAN RIFs, most of the configurations are same to FID
RIFs.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 058de325 04-Jul-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: Configure egress FID classification after routing

After routing, a packet needs to perform an L2 lookup using the DMAC it got
from the routing and a FID. In unified bridge model, the egress FID
configuration needs to be performed by software.

It is configured by RITR for both sub-port RIFs and FID RIFs. Currently
FID RIFs already configure eFID. Add eFID configuration for sub-port RIFs.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c3ae763 04-Jul-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_router: Do not configure VID for sub-port RIFs

The field 'vid' in RITR is reserved when unified bridge model is used
and the RIF's type is sub-port RIF. Instead, ingress VID is configured via
SVFA and egress VID is configured via REIV.

Set 'vid' to zero in RITR register for sub-port RIF when unified bridge
model is used.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fea20547 04-Jul-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: Configure ingress RIF classification

Before layer 2 forwarding, the device classifies an incoming packet to
a FID. The classification is done based on one of the following keys:

1. FID
2. VNI (after decapsulation)
3. VID / {Port, VID}

After classification, the FID is known, but also all the attributes of
the FID, such as the router interface (RIF) via which a packet that
needs to be routed will ingress the router block.

In the legacy model, when a RIF was created / destroyed, it was
firmware's responsibility to update it in the previously mentioned FID
classification records. In the unified bridge model, this responsibility
moved to software.

The third classification requires to iterate over the FID's {Port, VID}
list and issue SVFA write with the correct mapping table according to the
port's mode (virtual or not). We never map multiple VLANs to the same FID
using VID->FID mapping, so such a mapping needs to be performed once.

When a new FID classification entry is configured and the FID already has
a RIF, set the RIF as part of SVFA configuration.

The reverse needs to be done when clearing a RIF from a FID. Currently,
clearing is done by issuing mlxsw_sp_fid_rif_set() with a NULL RIF pointer.
Instead, introduce mlxsw_sp_fid_rif_unset().

Note that mlxsw_sp_fid_rif_set() is called after the RIF is fully
operational, so it conforms to the internal requirement regarding
SVFA.irif_v: "Must not be set for a non-enabled RIF".

Do not set the ingress RIF for rFIDs, as the {Port, VID}->rFID entry is
configured by firmware when legacy model is used, a next patch will
handle this configuration for rFIDs and unified bridge model.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7dd19648 23-Jun-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum: Change mlxsw_sp_rif_vlan_fid_op() to be dedicated for FID RIFs

The function was designed to configure both VLAN and FID RIFs, but
currently the driver does not use VLAN RIFs. Instead, it emulates VLAN
RIFs using FID RIFs.

As part of the conversion to the unified bridge model, the driver will
need to use VLAN RIFs, but they will be configured differently from FID
RIFs.

As a preparation for this change, rename the function to reflect the
fact that it is specific to FID RIFs and do not pass the RIF type as an
argument.

This leaves mlxsw_reg_ritr_fid_set() unused, so remove it.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 027c92e0 23-Jun-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum: Rename MLXSW_SP_RIF_TYPE_VLAN

Currently, the driver emulates 802.1Q FIDs using 802.1D FIDs. As such,
the RIFs configured on top of these FIDs are FID RIFs and not VLAN RIFs.

As part of converting the driver to the unified bridge model, 802.1Q
FIDs and VLAN RIFs will be used.

As a preparation for this change, rename the emulated VLAN RIFs from
'MLXSW_SP_RIF_TYPE_VLAN' to 'MLXSW_SP_RIF_TYPE_VLAN_EMU'. After the
conversion the emulated VLAN RIFs will be removed.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4ec2feb2 16-Jun-2022 Petr Machata <petrm@nvidia.com>

mlxsw: Add a resource describing number of RIFs

The Spectrum ASIC has a limit on how many L3 devices (called RIFs) can be
created. The limit depends on the ASIC and FW revision, and mlxsw reads it
from the FW. In order to communicate both the number of RIFs that there can
be, and how many are taken now (i.e. occupancy), introduce a corresponding
devlink resource.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b9840fe0 16-Jun-2022 Petr Machata <petrm@nvidia.com>

mlxsw: Keep track of number of allocated RIFs

In order to expose number of RIFs as a resource, it is going to be handy
to have the number of currently-allocated RIFs as a single number.
Introduce such.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 87c0a3c6 13-Jun-2022 Petr Machata <petrm@nvidia.com>

mlxsw: Revert "Prepare for XM implementation - LPM trees"

This reverts commit 923ba95ea22d ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-lpm-trees'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 725ff532 13-Jun-2022 Petr Machata <petrm@nvidia.com>

mlxsw: Revert "Prepare for XM implementation - prefix insertion and removal"

This reverts commit e7086213f7b4 ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-prefix-insertion-and-removal'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6a4b02b8 13-Jun-2022 Petr Machata <petrm@nvidia.com>

mlxsw: Revert "Introduce initial XM router support"

This reverts commit 75c2a8fe8e39 ("Merge branch
'mlxsw-introduce-initial-xm-router-support'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e5ec6a25 19-Jul-2022 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Fix IPv4 nexthop gateway indication

mlxsw needs to distinguish nexthops with a gateway from connected
nexthops in order to write the former to the adjacency table of the
device. The check used to rely on the fact that nexthops with a gateway
have a 'link' scope whereas connected nexthops have a 'host' scope. This
is no longer correct after commit 747c14307214 ("ip: fix dflt addr
selection for connected nexthop").

Fix that by instead checking the address family of the gateway IP. This
is a more direct way and also consistent with the IPv6 counterpart in
mlxsw_sp_rt6_is_gateway().

Cc: stable@vger.kernel.org
Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8895a9c2 18-Jul-2022 Kuniyuki Iwashima <kuniyu@amazon.com>

ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.

While reading sysctl_fib_multipath_hash_fields, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: ce5c9c20d364 ("ipv4: Add a sysctl to control multipath hash fields")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7998c12a 18-Jul-2022 Kuniyuki Iwashima <kuniyu@amazon.com>

ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.

While reading sysctl_fib_multipath_hash_policy, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: bf4e0a3db97e ("net: ipv4: add support for ECMP hash policy choice")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7bf9e18d 13-Jul-2022 Kuniyuki Iwashima <kuniyu@amazon.com>

ip: Fix data-races around sysctl_ip_fwd_update_priority.

While reading sysctl_ip_fwd_update_priority, it can be changed
concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 432e05d32892 ("net: ipv4: Control SKB reprioritization after forwarding")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 665030fd 29-Jun-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Fix rollback in tunnel next hop init

In mlxsw_sp_nexthop6_init(), a next hop is always added to the router
linked list, and mlxsw_sp_nexthop_type_init() is invoked afterwards. When
that function results in an error, the next hop will not have been removed
from the linked list. As the error is propagated upwards and the caller
frees the next hop object, the linked list ends up holding an invalid
object.

A similar issue comes up with mlxsw_sp_nexthop4_init(), where rollback
block does exist, however does not include the linked list removal.

Both IPv6 and IPv4 next hops have a similar issue with next-hop counter
rollbacks. As these were introduced in the same patchset as the next hop
linked list, include the cleanup in this patch.

Fixes: dbe4598c1e92 ("mlxsw: spectrum_router: Keep nexthops in a linked list")
Fixes: a5390278a5eb ("mlxsw: spectrum: Add support for setting counters on nexthops")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220629070205.803952-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# c353fb0d 08-May-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Take router lock in router notifier handler

For notifications that the router needs to handle, router lock is taken.
Further, at least to determine whether an event is related to a tunnel
underlay, router lock also needs to be taken. Due to this, the router lock
is always taken for each unhandled event, and also for some handled events,
even if they are not related to underlay. Thus each event implies at least
one router lock, sometimes two.

Instead of deferring the locking to the leaf handlers, take the lock in the
router notifier handler always. This simplifies thinking about the locking
state, and in some cases saves one lock cycle.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75ef4342 08-May-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum: Move handling of tunnel events to router code

The events related to IPIP tunnels are handled by the router code. Move the
handling from the central dispatcher in spectrum.c to the new notifier
handler in the router module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba81954c 08-May-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum: Move handling of router events to router code

The events NETDEV_PRE_CHANGEADDR, NETDEV_CHANGEADDR and NETDEV_CHANGEMTU
have implications for in-ASIC router interface objects, and as such are
handled in the router module. Move the handling from the central dispatcher
in spectrum.c to the new notifier handler in the router module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f40e600b 08-May-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum: Move handling of HW stats events to router code

L3 HW stats are implemented in mlxsw as RIF counters, and therefore the
code resides in spectrum_router. Exclude the offload xstats events from the
mlxsw_sp_netdevice_event_is_router() predicate, and instead recreate the
glue code in the router module.

Previously, the order of dispatch was that for events on tunnels, a
dedicated handler was called, which however did not handle HW stats events.
But there is nothing special about tunnel devices as far as HW stats: there
is a RIF associated with the tunnel netdevice, and that RIF is where the
counter should be installed. Therefore now, HW stats events are tested
first, independent of netdevice type. The upshot is that as of this commit,
mlxsw supports L3 HW stats work on GRE tunnels.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f8afb68 08-May-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum: Move handling of VRF events to router code

Events involving VRF, as L3 concern, are handled in the router code, by the
helper mlxsw_sp_netdevice_vrf_event(). The handler is currently invoked
from the centralized dispatcher in spectrum.c. Instead, move the call to
the newly-introduced router-specific notifier handler.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0a27cb16 08-May-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Add a dedicated notifier block

Currently all netdevice events are handled in the centralized notifier
handler maintained by spectrum.c. Since a number of events are involving
router code, spectrum.c needs to dispatch them to spectrum_router.c. The
spectrum module therefore needs to know more about the router code than it
should have, and there is are several API points through which the two
modules communicate.

To simplify the notifier handlers, introduce a new notifier into the router
module.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cff94376 04-May-2022 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Only query neighbour activity when necessary

The driver periodically queries the device for activity of neighbour
entries in order to report it to the kernel's neighbour table.

Avoid unnecessary activity query when no neighbours are installed. Use
an atomic variable to track the number of neighbours, as it is read
without any locks.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 046eabbf 08-Apr-2022 Guillaume Nault <gnault@redhat.com>

mlxsw: Use dscp_t in struct mlxsw_sp_fib4_entry

Use the new dscp_t type to replace the tos field of struct
mlxsw_sp_fib4_entry. This ensures ECN bits are ignored and makes it
compatible with the dscp fields of fib_entry_notifier_info and
fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 568a3f33 08-Apr-2022 Guillaume Nault <gnault@redhat.com>

ipv4: Use dscp_t in struct fib_entry_notifier_info

Use the new dscp_t type to replace the tos field of struct
fib_entry_notifier_info. This ensures ECN bits are ignored and makes it
compatible with the dscp field of struct fib_rt_info.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 888ade8f 08-Apr-2022 Guillaume Nault <gnault@redhat.com>

ipv4: Use dscp_t in struct fib_rt_info

Use the new dscp_t type to replace the tos field of struct fib_rt_info.
This ensures ECN bits are ignored and makes it compatible with the
fa_dscp field of struct fib_alias.

This also allows sparse to flag potential incorrect uses of DSCP and
ECN bits.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6f2f36e5 02-Apr-2022 Tom Rix <trix@redhat.com>

mlxsw: spectrum_router: simplify list unwinding

The setting of i here
err_nexthop6_group_get:
i = nrt6;
Is redundant, i is already nrt6. So remove
this statement.

The for loop for the unwinding
err_rt6_create:
for (i--; i >= 0; i--) {
Is equivelent to
for (; i > 0; i--) {

Two consecutive labels can be reduced to one.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220402121516.2750284-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8d0f7d3a 02-Mar-2022 Petr Machata <petrm@nvidia.com>

mlxsw: Add support for IFLA_OFFLOAD_XSTATS_L3_STATS

Spectrum machines support L3 stats by binding a counter to a RIF, a
hardware object representing a router interface. Recognize the netdevice
notifier events, NETDEV_OFFLOAD_XSTATS_*, to support enablement,
disablement, and reporting back to core.

As a netdevice gains a RIF, if L3 stats are enabled, install the counters,
and ping the core so that a userspace notification can be emitted.

Similarly, as a netdevice loses a RIF, push the as-yet-unreported
statistics to the core, so that they are not lost, and ping the core to
emit userspace notification.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9834e246 02-Mar-2022 Petr Machata <petrm@nvidia.com>

mlxsw: spectrum_router: Drop mlxsw_sp arg from counter alloc/free functions

The mlxsw_sp reference is carried by the mlxsw_sp_rif object that is passed
to these functions as well. Just deduce the former from the latter,
and drop the explicit mlxsw_sp parameter. Adapt callers.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 06c08f86 14-Dec-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: Add support for VxLAN with IPv6 underlay

Currently, mlxsw driver supports VxLAN with IPv4 underlay only.
Add support for IPv6 underlay.

The main differences are:

* Learning is not supported for IPv6 FDB entries, use static entries and
do not allow 'learning' flag for IPv6 VxLAN.

* IPv6 addresses for FDB entries should be saved as part of KVDL.
Use the new API to allocate and release entries for IPv6 addresses.

* Spectrum ASICs do not fill UDP checksum, while in software IPv6 UDP
packets with checksum zero are dropped.
Force the relevant flags which allow the VxLAN device to generate UDP
packets with zero checksum and also receive them.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c934757d 01-Dec-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: Use u16 for local_port field instead of u8

Currently, local_port field is saved as u8, which means that maximum 256
ports can be used.

As preparation for Spectrum-4, which will support more than 256 ports,
local_port field should be extended.

Save local_port as u16 to allow use of additional ports.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ed1607e2 23-Nov-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: spectrum_router: Remove deadcode in mlxsw_sp_rif_mac_profile_find

The function idr_for_each_entry() already checks that the next entry in
the IDR is not NULL.

Therefore, checking that again in every iteration leads to deadcode.

Remove the unnecessary check in order to avoid that.

Addresses-Coverity: ("Logically dead code")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b442f2ea 13-Dec-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: spectrum_router: Consolidate MAC profiles when possible

Currently, when setting a router interface (RIF) MAC address while the
MAC profile is not shared with other RIFs, the profile is edited so that
the new MAC address is assigned to it.

This does not take into account a situation in which the new MAC address
already matches an existing MAC profile. In that situation, two MAC
profiles will be occupied even though they hold MAC addresses from the
same profile.

In order to prevent that, add a check to ensure that editing a MAC
profile takes place only when the new MAC address does not match an
existing profile.

Fixes: 605d25cd782a6 ("mlxsw: spectrum_router: Add RIF MAC profiles support")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Tested-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c375ffb 25-Oct-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: spectrum_router: Expose RIF MAC profiles to devlink resource

Expose via devlink-resource the maximum number of RIF MAC profiles and
their current occupancy, so it can be used for debug and writing generic
tests, like in the next patch.

Example for Spectrum-2 output:

$ devlink resource show pci/0000:06:00.0
...
name rif_mac_profiles size 4 occ 0 unit entry dpipe_tables none

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 605d25cd 25-Oct-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: spectrum_router: Add RIF MAC profiles support

Currently, mlxsw enforces that all the router interfaces (RIFs) have the
same MAC prefix.

Relax this limitation by using RIF MAC profiles. Each profile is
associated with a particular MAC prefix and multiple RIFs can use the
same profile. Therefore, the number of possible MAC prefixes is no
longer one, but the number of profiles supported by the device.

Store the profiles in an IDR and reference count them according to the
number of RIFs using them.

Associate a RIF with a profile when the RIF is created and remove the
association when the RIF is deleted.

Change the association following 'NETDEV_CHANGEADDR' events, except when
only one RIF is using the profile. In which case, change the MAC prefix
of the profile itself instead of associating the RIF with a new profile.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26029225 25-Oct-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: spectrum_router: Propagate extack further

The next patch will set the MAC profile of a router interface (RIF) as
part of its configure() callback. The operation can fail in case the
maximum number of profiles was exceeded.

Add extack to mlxsw_sp_rif_ops::configure() in order to communicate such
failures to user space.

In addition, the MAC profile of a RIF can change following a
'NETDEV_CHANGEADDR' notification. Propagate extack to
mlxsw_sp_router_port_change_event() so that failures could be
communicated in this path as well.

No functional changes intended.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba1c7132 23-Sep-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: Add support for IP-in-IP with IPv6 underlay for Spectrum-2 and above

Currently, mlxsw driver supports IP-in-IP only with IPv4 underlay.
Add support for IPv6 underlay for Spectrum-2 and above.

Most of the configurations are same to IPv4, the main difference between
IPv4 and IPv6 is related to saving IP addresses.
IPv6 addresses are saved as part of KVD and the relevant registers hold
pointer to them.
Add API for that as part of ipip_ops, so then only Spectrum-2 and above
will save IPv6 addresses in this way.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8d4f1046 23-Sep-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_router: Increase parsing depth for IPv6 decapsulation

The Spectrum ASIC has a configurable limit on how deep into the packet
it parses. By default, the limit is 96 bytes.

For IP-in-IP packets, with IPv6 outer and inner headers, the default
parsing depth is not enough and without increasing it such packets cannot
be properly decapsulated.

Use the existing API to set parsing depth, call it once for each
decapsulation entry when it is created/destroyed.
There is no need to protect the code with new mutex because 'router->lock'
is already taken in these code paths.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a82feba6 23-Sep-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: Create separate ipip_ops_arr for different ASICs

Currently, there is support for IP-in-IP only with IPv4 underlay for all
supported Spectrum ASICs.
The next patches will add support for IPv6 underlay only for Spectrum-2
and above.

Add infrastructure for splitting IP-in-IP support between different
ASICs - create separate ipip_ops_arr and add ipips_init function to set the
right ops.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 59bf980d 23-Sep-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: Take tunnel's type into account when searching underlay device

The function __mlxsw_sp_ipip_netdev_ul_dev_get() returns the underlay
device that corresponds to the overlay device that it gets.
Currently, this function assumes that the tunnel is IPv4 GRE, because it
is the only one that is supported by mlxsw driver.

This assumption will no longer be correct when IPv6 GRE support is added,
resulting in wrong underlay device being returned.
Instead, check 'ol_dev->type' and return the underlay device accordingly.

Move the function to spectrum_ipip.c because spectrum_router.c should not
be aware to tunnel type.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 80ef2abc 23-Sep-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_ipip: Create common function for mlxsw_sp_ipip_ol_netdev_change_gre()

The function mlxsw_sp_ipip_ol_netdev_change_gre4() contains code that
can be shared between IPv4 and IPv6.
The only difference is the way that arguments are taken from tunnel
parameters, which are different between IPv4 and IPv6.

For that, add structure 'mlxsw_sp_ipip_parms' to hold all the required
parameters for the function and save it as part of
'struct mlxsw_sp_ipip_entry' instead of the existing structure that is
not shared between IPv4 and IPv6. Add new operation as part of
'mlxsw_sp_ipip_ops' to initialize the new structure.

Then mlxsw_sp_ipip_ol_netdev_change_gre{4,6}() will prepare the new
structure and both will call the same function.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8aba32ce 23-Sep-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_router: Fix arguments alignment

Suppress the following checkpatch.pl check [1] by adding a variable to
store the IP-in-IP options. Noticed while adding equivalent IPv6 code in
subsequent patches.

[1]
CHECK: Alignment should match open parenthesis
+ mlxsw_reg_ritr_loopback_ipip4_pack(ritr_pl, lb_cf.lb_ipipt,
+
+ MLXSW_REG_RITR_LOOPBACK_IPIP_OPTIONS_GRE_KEY_PRESET,

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 45bce5c9 23-Sep-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_router: Create common function for fib_entry_type_unset() code

mlxsw_sp_fib4_entry_type_unset() is not specific for IPv4 FIB entry,
move the code to mlxsw_sp_fib_entry_type_unset(), and call this function
from mlxsw_sp_fib4_entry_type_unset() so then it will be used for IPv6
also.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3a3aae7 22-Sep-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Start using new trap adjacency entry

Start using the trap adjacency entry that was added in the previous
patch and remove the existing one which is no longer needed.

Note that the name of the old entry was inaccurate as the entry did not
discard packets, but trapped them.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4bdf80bc 22-Sep-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add trap adjacency entry upon first nexthop group

In commit 0c3cbbf96def ("mlxsw: Add specific trap for packets routed via
invalid nexthops"), mlxsw started allocating a new adjacency entry
during driver initialization, to trap packets routed via invalid
nexthops.

This behavior was later altered in commit 983db6198f0d ("mlxsw:
spectrum_router: Allocate discard adjacency entry when needed") to only
allocate the entry upon the first route that requires it. The motivation
for the change is explained in the commit message.

The problem with the current behavior is that the entry shows up as a
"leak" in a new BPF resource monitoring tool [1]. This is caused by the
asymmetry of the allocation/free scheme. While the entry is allocated
upon the first route that requires it, it is only freed during
de-initialization of the driver.

Instead, track the number of active nexthop groups and allocate the
adjacency entry upon the creation of the first group. Free it when the
number of active groups reaches zero.

The next patch will convert mlxsw to start using the new entry and
remove the old one.

[1] https://github.com/Mellanox/mlxsw/tree/master/Debugging/libbpf-tools/resmon

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 43c1b833 22-Aug-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_router: Increase parsing depth for multipath hash

Commit 01848e05f8bb ("mlxsw: spectrum_router: Add support for inner
layer 3 multipath hash policy") and commit daeabf89eb89 ("mlxsw:
spectrum_router: Add support for custom multipath hash policy") added
support for multipath hash policies where the hash is calculated based
on inner packet fields.

For IPv6-in-IPv6 packets, the default parsing depth (96 bytes) is not
enough when these policies are used.

Therefore, for such cases, call the new API to increase / decrease the
parsing depth as necessary. Care is taken to ensure the API is not
called multiple times.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c35b57ce 10-Aug-2021 Vladimir Oltean <vladimir.oltean@nxp.com>

net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge

The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.

To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.

Fixes: 2c4eca3ef716 ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fb0a1dac 16-Jun-2021 Colin Ian King <colin.king@canonical.com>

mlxsw: spectrum_router: remove redundant continue statement

The continue statement at the end of a for-loop has no effect,
remove it.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a08a6193 08-Jun-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_router: Remove abort mechanism

The abort mechanism was introduced in commit 8e05fd7166c6 ("fib: hook
IPv4 fib for hardware offload") with the purpose of falling back to
software-based routing in case of a route programming error in hardware.
The process is irreversible and requires users to reload the offloading
driver or reboot the machine.

While this approach might make sense in theory, it makes very little
sense in practice. In the case of high speed ASICs such as the Spectrum
ASIC, the abort mechanism effectively kills the machine upon a non-fatal
error such as a route programming error.

Such an extreme policy does not belong in the kernel, especially when
user space can simply try to reprogram the route following the
RTM_NEWROUTE failure notification.

Therefore, remove the abort mechanism.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# daeabf89 19-May-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add support for custom multipath hash policy

When this policy is set, only enable the packet fields that were enabled
by user space for multipath hash computation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01848e05 19-May-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add support for inner layer 3 multipath hash policy

When this policy is set, the kernel uses the inner layer 3 fields for
multipath hash computation and falls back to the outer fields if no
encapsulation was encountered. This behavior is most likely influenced
by the behavior of the flow dissector, which is used for the packet
dissection.

The Spectrum ASIC, however, cannot fallback to outer fields if inner
fields are not available. This should not result in a discrepancy from
the software data path because if several flows have matching inner
fields, they will tend to have matching outer fields as well.

Therefore, implement this policy by enabling both outer and inner layer
3 fields for the multipath hash computation.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7b8f435 19-May-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_outer: Factor out helper for common outer fields

Outer IPv4 and IPv6 addresses are used by multiple multipath hash
policies. Factor out helpers that set these fields to increase code
sharing between different policies.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d23d3eb 19-May-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Move multipath hash configuration to a bitmap

Currently, the multipath hash configuration is written directly to the
register payload. While this is OK for the two currently supported
policies, it is going to be hard to follow when more policies and more
packet fields are added.

Instead, set the required headers and fields in a bitmap and then dump
it to the register payload.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7725c1c8 19-May-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Replace if statement with a switch statement

The code was written when only two multipath hash policies were present,
so the if statement was sufficient. The next patch and future patches
are going to add support for more policies, so move to a switch
statement.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 51746a35 17-May-2021 Ido Schimmel <idosch@OSS.NVIDIA.COM>

mlxsw: spectrum_router: Avoid missing error code warning

Explicitly set the error code to zero before the goto statement to avoid
the following smatch warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3598 mlxsw_sp_nexthop_group_refresh() warn: missing error code 'err'

The warning is a false positive, but the change both suppresses the
warning and makes it clear to future readers that this is not an error
path.

The original report and discussion can be found here [1].

[1] https://lore.kernel.org/lkml/202105141823.Td2h3Mbi-lkp@intel.com/

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 837ec05c 17-May-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: Verify the accessed index doesn't exceed the array length

There are few cases in which an array index queried from a fw register,
is accessed without any validation that it doesn't exceed the array
length.

Add a proper length validation, so accessing memory past the end of an
array will be forbidden.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7866f265 30-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Only perform atomic nexthop bucket replacement when requested

When cleared, the 'force' parameter in nexthop bucket replacement
notifications indicates that a driver should try to perform an atomic
replacement. Meaning, only update the contents of the bucket if it is
inactive.

Since mlxsw only queries buckets' activity once every second, there is
no point in trying an atomic replacement if the idle timer interval is
smaller than 1 second.

Currently, mlxsw ignores the original value of 'force' and will always
try an atomic replacement if the idle timer is not smaller than 1
second.

Fix this by taking the original value of 'force' into account and never
promoting a non-atomic replacement to an atomic one.

Fixes: 617a77f044ed ("mlxsw: spectrum_router: Add nexthop bucket replacement support")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03490a82 24-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Enable resilient nexthop groups to be programmed

Now that mlxsw supports resilient nexthop groups, allow them to be
programmed after validating that their configuration conforms to the
device's limitations (e.g., number of buckets is within predefined
range).

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# debd2b3b 24-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Periodically update activity of nexthop buckets

The kernel periodically checks the idle time of nexthop buckets to
determine if they are idle and can be re-populated with a new nexthop.

When the resilient nexthop group is offloaded to hardware, the kernel
will not see activity on nexthop buckets unless it is reported from
hardware.

Therefore, periodically (every 1 second) query the hardware for activity
of adjacency entries used as part of a resilient nexthop group and
report it to the nexthop code.

The activity is only queried if resilient nexthop groups are in use. The
delayed work is canceled otherwise.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d7761cb3 24-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Update hardware flags on nexthop buckets

So far, mlxsw only updated hardware flags ('offload' / 'trap') on
nexthop objects. For resilient nexthop groups, these flags need to be
updated on individual nexthop buckets as well.

Update these flags whenever updating the flags of the encapsulating
nexthop object and whenever a nexthop bucket is replaced.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 617a77f0 24-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add nexthop bucket replacement support

Replace a single nexthop bucket upon receiving a
'NEXTHOP_EVENT_BUCKET_REPLACE' notification.

When the 'force' parameter is not set, instruct the device to only
overwrite an adjacency entry if its activity is cleared, so as not to
break existing flows using the adjacency entry. The device does not
provide feedback if the replacement was successful in this case, so the
contents of the adjacency entry after the replacement are compared with
the replacement request.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 197fdfd1 24-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Pass payload pointer to nexthop update function

Have the caller pass a pointer to the payload of the RATR register to
the function updating a single nexthop / adjacency entry.

In a subsequent patch, this will allow the caller to make sure
replacement was successful by querying the state of the adjacency entry
after replacement and comparing with the initial request.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62b67ff3 24-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add ability to overwrite adjacency entry only when inactive

Allow the driver to instruct the device to only overwrite an adjacency
entry if its activity is cleared. Currently, adjacency entry is always
overwritten, regardless of activity.

This will be used by subsequent patches to prevent replacement of active
nexthop buckets.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6fc65f4 24-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add support for resilient nexthop groups

Parse the configuration of resilient nexthop groups to existing mlxsw
data structures. Unlike non-resilient groups, nexthops without a valid
MAC or router interface (RIF) are programmed with a trap action instead
of not being programmed at all.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ea037b23 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add Spectrum-{2, 3} adjacency group size ranges

Spectrum-{2,3} support different adjacency group size ranges compared to
Spectrum-1. Add an array describing these ranges and change the common
code to use the array which was set during the per-ASIC initialization.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 164fa130 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Encode adjacency group size ranges in an array

The device supports a fixed set of adjacency group sizes. Encode these
sizes in an array, so that the next patch will be able to split it
between Spectrum-1 and Spectrum-{2,3}, which support different size
ranges.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d354fdd9 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Create per-ASIC router operations

There are several differences in the router module between Spectrum-1
and Spectrum-{2,3}. Currently, this is only apparent in the router
interface (RIF) operations that are split between these ASICs.

A subsequent patch is going to introduce another difference between
these ASICs.

Create per-ASIC router operations that will encapsulate all these
differences. For now, these operations are only used to set the per-ASIC
RIF operations in 'mlxsw_sp->router->rif_ops_arr'. Note that this fields
was unused since commit 1f5b23033937 ("mlxsw: spectrum: Set RIF ops per
ASIC type").

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1efd500 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Avoid unnecessary neighbour updates

Avoid updating neighbour and adjacency entries in hardware when the
neighbour is already connected and its MAC address did not change. This
can happen, for example, when neighbour transitions between valid states
such as 'NUD_REACHABLE' and 'NUD_DELAY'.

This is especially important for resilient hashing as these updates will
result in adjacency entries being marked as active.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 40f5429f 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Break nexthop group entry validation to a separate function

The validation of a nexthop group entry is also necessary for resilient
nexthop groups, so break the validation to a separate function to allow
for code reuse in subsequent patches.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29017c64 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Encapsulate nexthop update in a function

Encapsulate this functionality in a separate function, so that it could
be invoked by follow-up patches, when replacing a nexthop bucket that is
part of a resilient nexthop group.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 424603cc 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Rename nexthop update function to reflect its type

mlxsw_sp_nexthop_update() is used to update the configuration of
Ethernet-type nexthops, as opposed to mlxsw_sp_nexthop_ipip_update(),
which is used to update IPinIP-type nexthops.

Rename the function to mlxsw_sp_nexthop_eth_update(), so that it is
consistent with mlxsw_sp_nexthop_ipip_update().

It will allow us to introduce mlxsw_sp_nexthop_update() in a follow-up
patch, which calls either of above mentioned function based on the
nexthop's type.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fc199d7c 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add nexthop trap action support

Currently, nexthops are programmed with either forward or discard action
(for blackhole nexthops). Nexthops that do not have a valid MAC address
(neighbour) or router interface (RIF) are simply not written to the
adjacency table.

In resilient nexthop groups, the size of the group must remain fixed and
the kernel is in complete control of the layout of the adjacency table.
A nexthop without a valid MAC or RIF will therefore be written with a
trap action, to trigger neighbour resolution.

Allow such nexthops to be programmed to the adjacency table to enable
above mentioned use case.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1be2361e 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Prepare for nexthops with trap action

Nexthops that need to be programmed with a trap action might not have a
valid router interface (RIF) associated with them. Therefore, use the
loopback RIF created during initialization to program them to the
device.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 031d5c16 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Introduce nexthop action field

Currently, the action associated with the nexthop is assumed to be
'forward' unless the 'discard' bit is set.

Instead, simplify this by introducing a dedicated field to represent the
action of the nexthop. This will allow us to more easily introduce more
actions, such as trap.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 248136fa 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Adjust comments on nexthop fields

The comments assume that nexthops are simple Ethernet nexthops
that are programmed to forward packets to the associated neighbour. This
is no longer the case, as both IPinIP and blackhole nexthops are now
supported.

Adjust the comments to reflect these changes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c6a5011b 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Only provide MAC address for valid nexthops

The helper returns the MAC address associated with the nexthop. It is
only valid when the nexthop forwards packets and when it is an Ethernet
nexthop. Reflect this in the checks the helper is performing.

This is not an issue because the sole caller of the function only
invokes it for such nexthops.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 26df5acc 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Consolidate nexthop helpers

The helper mlxsw_sp_nexthop_offload() is actually interested in finding
out if the nexthop is both written to the adjacency table and forwarding
packets (as opposed to discarding them).

Rename it to mlxsw_sp_nexthop_is_forward() and remove
mlxsw_sp_nexthop_is_discard().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 08c99b92 22-Mar-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Remove RTNL assertion

Remove the RTNL assertion in the nexthop notifier block. The assertion
is not needed given RTNL is never assumed to be taken.

This is a preparation for future patches where mlxsw will start handling
nexthop events that are not always sent with RTNL held.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dc860b88 25-Feb-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Ignore routes using a deleted nexthop object

Routes are currently processed from a workqueue whereas nexthop objects
are processed in system call context. This can result in the driver not
finding a suitable nexthop group for a route and issuing a warning [1].

Fix this by ignoring such routes earlier in the process. The subsequent
deletion notification will be ignored as well.

[1]
WARNING: CPU: 2 PID: 7754 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4853 mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]
[...]
CPU: 2 PID: 7754 Comm: kworker/u8:0 Not tainted 5.11.0-rc6-cq-20210207-1 #16
Hardware name: Mellanox Technologies Ltd. MSN2100/SA001390, BIOS 5.6.5 05/24/2018
Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib_event_work [mlxsw_spectrum]
RIP: 0010:mlxsw_sp_router_fib_event_work+0x1112/0x1e00 [mlxsw_spectrum]

Fixes: cdd6cfc54c64 ("mlxsw: spectrum_router: Allow programming routes with nexthop objects")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Alex Veber <alexve@nvidia.com>
Tested-by: Alex Veber <alexve@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a4cb1c02 07-Feb-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: spectrum_router: Set offload_failed flag

When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion
fails, FIB abort is triggered.

After aborting, set the appropriate hardware flag to make the kernel emit
RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c5fcf9e 07-Feb-2021 Amit Cohen <amcohen@nvidia.com>

IPv6: Add "offload failed" indication to routes

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv6 routes, so that
users will have better visibility into the offload process.

'struct fib6_info' is extended with new field that indicates if route
offload failed. Note that the new field is added using unused bit and
therefore there is no need to increase struct size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 36c5100e 07-Feb-2021 Amit Cohen <amcohen@nvidia.com>

IPv4: Add "offload failed" indication to routes

After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv4 routes, so that
users will have better visibility into the offload process.

'struct fib_alias', and 'struct fib_rt_info' are extended with new field
that indicates if route offload failed. Note that the new field is added
using unused bit and therefore there is no need to increase structs size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# efc42879 01-Feb-2021 Amit Cohen <amcohen@nvidia.com>

net: Do not call fib6_info_hw_flags_set() when IPv6 is disabled

With the next patch mlxsw and netdevsim will fail in compilation if
CONFIG_IPV6 is disabled.

Do not call fib6_info_hw_flags_set() when IPv6 is disabled.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fbaca8f8 01-Feb-2021 Amit Cohen <amcohen@nvidia.com>

net: Pass 'net' struct as first argument to fib6_info_hw_flags_set()

The next patch will emit notification when hardware flags are changed,
in case that fib_notify_on_flag_change sysctl is set to 1.

To know sysctl values, net struct is needed.
This change is consistent with the IPv4 version, which gets 'net' struct
as its first argument.

Currently, the only callers of this function are mlxsw and netdevsim.
Patch the callers to pass net.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 09ad6bec 28-Jan-2021 Ido Schimmel <idosch@nvidia.com>

nexthop: Use enum to encode notification type

Currently there are only two types of in-kernel nexthop notification.
The two are distinguished by the 'is_grp' boolean field in 'struct
nh_notifier_info'.

As more notification types are introduced for more next-hop group types, a
boolean is not an easily extensible interface. Instead, convert it to an
enum.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 88a31b18 14-Dec-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router

In case the eXtended mezzanine is present on the system, use it for IPv4
router offload.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 54ff9dbb 14-Dec-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router_xm: Implement L-value tracking for M-index

There is a table that assigns L-value per M-index. The L is always the
biggest from the currently inserted prefixes. Setup a hashtable to track
the M-index information and the prefixes that are related to it. Ensure
the L-value is always correctly set.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e0bc244d 14-Dec-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Introduce per-ASIC XM initialization

During the router init flow, call into XM code and initialize couple of
items needed for XM functionality:

1) Query the capabilities and sizes. Check the XM device id.
2) Initialize the M-value. Note that currently the M-value is set fixed
to 16 for IPv4. In future this may change to better cover the actual
inserted routes.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ff462103 14-Dec-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Introduce XM implementation of router low-level ops

In order to offload entries to XM, implement a set of low-level
functions to work with LPM trees in XM and also to pack and write
FIB entries into XM.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# acde33bf 06-Dec-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Reduce mlxsw_sp_ipip_fib_entry_op_gre4()

Turned out that mlxsw_sp_ipip_fib_entry_op_gre4() does not need to
figure out the IP address and virtual router id. Those are exactly
the same as in the fib_entry it is called for. So just use that and
reduce mlxsw_sp_ipip_fib_entry_op_gre4() function to only call
mlxsw_sp_ipip_fib_entry_op_gre4_rtdp() make the ipip decap op
code similar to nve.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 31e1de4f 06-Dec-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum: Apply RIF configuration when joining a LAG

In case a router interface (RIF) is configured for a LAG, make sure its
configuration is applied on the new LAG member.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09139f67 29-Nov-2020 Danielle Ratson <danieller@nvidia.com>

mlxsw: Add QinQ configuration vetoes

After adding support for QinQ, a.k.a 802.1ad protocol, there are a few
scenarios that should be vetoed.

The vetoes are motivated by various ASIC limitations.
For example, a port that is member in a 802.1ad bridge cannot have 802.1q
uppers as the port needs to be configured to treat 802.1q packets as
untagged packets.

Veto all those unsupported scenarios and return suitable messages.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ff47fa13 25-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Update adjacency index more efficiently

The device supports an operation that allows the driver to issue one
request to update the adjacency index for all the routes in a given
virtual router (VR) from old index and size to new ones. This is useful
in case the configuration of a certain nexthop group is updated and its
adjacency index changes.

Currently, the driver does not use this operation in an efficient
manner. It iterates over all the routes using the nexthop group and
issues an update request for the VR if it is not the same as the
previous VR.

Instead, use the VR tracking added in the previous patch to update the
adjacency index once for each VR currently using the nexthop group.

Example:

8k IPv6 routes were added in an alternating manner to two VRFs. All the
routes are using the same nexthop object ('nhid 1').

Before:

# perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3

Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':

16,385 devlink:devlink_hwmsg

4.255933213 seconds time elapsed

0.000000000 seconds user
0.666923000 seconds sys

Number of EMAD transactions corresponds to number of routes using the
nexthop group.

After:

# perf stat -e devlink:devlink_hwmsg --filter='incoming==0' -- ip nexthop replace id 1 via 2001:db8:1::2 dev swp3

Performance counter stats for 'ip nexthop replace id 1 via 2001:db8:1::2 dev swp3':

3 devlink:devlink_hwmsg

0.077655094 seconds time elapsed

0.000000000 seconds user
0.076698000 seconds sys

Number of EMAD transactions corresponds to number of VRFs / VRs.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d2141a42 25-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Track nexthop group virtual router membership

For each nexthop group, track in which virtual routers (VRs) the group is
used. This is going to be used by the next patch to perform a more
efficient adjacency index update whenever the group's adjacency index
changes.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9a4ab10c 25-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Rollback virtual router adjacency pointer update

In the rare case where the adjacency pointer cannot be updated for a
given virtual router, rollback the operation so that virtual routers
that are already using the new index will use the old one again.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 40e4413d 25-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Pass virtual router parameters directly instead of pointer

mlxsw_sp_adj_index_mass_update_vr() only needs the virtual router's
identifier and protocol, so pass them directly. In a subsequent patch
the caller will not have access to the pointer.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1c2c5eb6 25-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Fix error handling issue

Return error to the caller instead of suppressing it.

Fixes: e3ddfb45bacd ("mlxsw: spectrum_router: Allow returning errors from mlxsw_sp_nexthop_group_refresh()")
Addresses-Coverity: ("Error handling issues (CHECKED_RETURN)")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 68e92ad8 23-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add support for blackhole nexthops

Add support for blackhole nexthops by programming them to the adjacency
table with a discard action.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 18c4b79d 23-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Resolve RIF from nexthop struct instead of neighbour

The two are the same, but for blackhole nexthops we will not have an
associated neighbour struct, so resolve the RIF from the nexthop struct
itself instead.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 919f6aaa 23-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Use loopback RIF for unresolved nexthops

Now that the driver creates a loopback RIF during its initialization, it
can be used to program the adjacency entries for unresolved nexthops
instead of other RIFs. The loopback RIF is guaranteed to exist for the
entire life time of the driver, unlike other RIFs that come and go.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 52d45575 23-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Use different trap identifier for unresolved nexthops

Unresolved nexthops are currently written to the adjacency table with a
discard action. Packets hitting such entries are trapped to the CPU via
the 'DISCARD_ROUTER3' trap which can be enabled or disabled on demand,
but is always enabled in order to ensure the kernel can resolve the
unresolved neighbours.

This trap will be needed for blackhole nexthops support. Therefore, move
unresolved nexthops to explicitly program the adjacency entries with a
trap action and a different trap identifier, 'RTR_EGRESS0'.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 07c78536 23-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Create loopback RIF during initialization

Up until now RIFs (router interfaces) were created on demand (e.g.,
when an IP address was added to a netdev). However, sometimes the device
needs to be provided with a RIF when one might not be available.

For example, adjacency entries that drop packets need to be programmed
with an egress RIF despite the RIF not being used to forward packets.

Create such a RIF during initialization so that it could be used later
on to support blackhole nexthops.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cdd6cfc5 19-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Allow programming routes with nexthop objects

Now that the driver supports nexthop objects, the check is no longer
necessary. Remove it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c25db3a7 19-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Enable resolution of nexthop groups from nexthop objects

If the FIB info (i.e, 'struct fib_info', 'struct fib6_info') uses a
nexthop object, then use the object's identifier to resolve the nexthop
group.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2a014b20 19-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add support for nexthop objects

Register a listener to the nexthop notification chain and parse notified
nexthop objects into the existing mlxsw nexthop data structures.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e3ddfb45 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Allow returning errors from mlxsw_sp_nexthop_group_refresh()

The function is responsible for allocating the adjacency entries used by
the nexthop group and populating them with the adjacency information
such as egress RIF and MAC address.

Allow the function to return an error when it encounters a problem and
have the relevant call sites check it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2efca2bf 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add an indication if a nexthop group can be destroyed

Currently, a nexthop group is destroyed when the last FIB entry is
detached from it.

When nexthop objects are supported, this can no longer be the case, as
the group is a separate object whose lifetime is managed by user space.

Add an indication if a nexthop group can be destroyed and always set it
to true for the existing IPv4 and IPv6 nexthop groups.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a9a711a3 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Only clear offload indication from valid IPv6 FIB info

When the IPv6 FIB info has a nexthop object, the nexthop offload
indication is set on the nexthop object and not on the FIB info itself.

Therefore, do not try to clear the offload indication from the FIB info
when it has a nexthop object.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5b9954e1 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Re-order mlxsw_sp_nexthop6_group_get()

Attach the FIB entry to the nexthop group after setting the offload flag
on the IPv6 FIB info (i.e., 'struct fib6_info'). The second operation is
not needed when the nexthop group is a nexthop object. This will allow
us to have a common exit path from the function, regardless of the
nexthop group's type.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c0351b7c 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Set FIB entry's type based on nexthop group

The previous patch associated a nexthop group with the FIB entry before
the entry's type is determined.

Make use of the nexthop group when determining the entry's type instead
of relying on helpers that assume that the nexthop info is not a nexthop
object (i.e., 'struct nexthop').

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5c9a3b24 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Set FIB entry's type after creating nexthop group

Each FIB entry has a type (e.g., remote, local) that determines how the
entry is programmed to the device. In order to determine if the entry is
local (directly connected) or remote (has a gateway) the relevant FIB
info structures (e.g., 'struct fib_info') are checked.

When entries that use nexthop objects are supported, these checks will
need to be changed to take into account 'struct nexthop'.

Instead, first associate the entry with a nexthop group so that the next
patch could determine the entry's type based on the associated nexthop
group's type.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c68e248d 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Pass ifindex to mlxsw_sp_ipip_entry_find_by_decap()

The sole caller of the function will soon only have the ifindex
available, instead of the pointer itself.

Therefore, change the function to take the ifindex as input and have it
get the pointer.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ff8a2418 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Set ifindex for IPv4 nexthops

The ifindex of the nexthop device was never set for IPv4 nexthops,
unlike IPv6 nexthops. This went unnoticed since only IPv6 nexthops use
it.

Set the ifindex for IPv4 nexthops in order to be consistent with IPv6
and also because it will be used by a later patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fbf805bf 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Fix wrong kfree() in error path

The function allocates 'nhgi', not 'nh_grp', so it needs to free the
former in its error path.

Fixes: 7f7a417e6a11 ("mlxsw: spectrum_router: Split nexthop group configuration to a different struct")
Addresses-Coverity: ("Memory - corruptions (USE_AFTER_FREE)")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 245f4e44 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Remove outdated comment

Since commit 21151f64a458 ("mlxsw: Add new FIB entry type for reject
routes") this comment is no longer correct. Remove it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9ed2b4d2 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Consolidate mlxsw_sp_nexthop{4, 6}_type_fini()

The two functions are identical, so consolidate them to
mlxsw_sp_nexthop_type_fini().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c181a89a 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Consolidate mlxsw_sp_nexthop{4, 6}_type_init()

The two functions are now identical, so consolidate them to
mlxsw_sp_nexthop_type_init().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b360952b 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Remove unused argument from mlxsw_sp_nexthop6_type_init()

Remove it as it is unused.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c3bde5a9 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Pass nexthop netdev to mlxsw_sp_nexthop4_type_init()

Instead of passing the nexthop and resolving the nexthop netdev from it,
pass the nexthop netdev directly.

This will later allow us to consolidate code paths between IPv4 and IPv6
code.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4dd38da5 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Pass nexthop netdev to mlxsw_sp_nexthop6_type_init()

Instead of passing the route and resolving the nexthop netdev from it,
pass the nexthop netdev directly.

This will later allow us to consolidate code paths between IPv4 and IPv6
code.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7ba7bc55 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_ipip: Remove overlay protocol from can_offload() callback

The overlay protocol (i.e., IPv4/IPv6) that is being encapsulated has
no impact on whether a certain IP tunnel can be offloaded or not. Only
the underlay protocol matters.

Therefore, remove the unused overlay protocol parameter from the
callback.

This will later allow us to consolidate code paths between IPv4 and IPv6
code.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7f7a417e 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Split nexthop group configuration to a different struct

Currently, the individual nexthops member in the group and attributes of
the group (e.g., its type) are stored in the same struct (i.e., 'struct
mlxsw_sp_nexthop_group'). This is fine since the individual nexthops
cannot change during the lifetime of the group.

With nexthop objects this is no longer the case. An existing nexthop
group can be replaced to use a new set of nexthops. Creating a new
struct whenever a group is replaced entails replacing the group pointer
of all the routes (i.e., 'struct mlxsw_sp_fib_entry') using the group.

Avoid this inefficient step by splitting the nexthop group configuration
to a different struct (i.e., 'struct mlxsw_sp_nexthop_group_info').
When a nexthop group is replaced a new group info struct is created and
the individual rotues do not need to be touched.

Illustration after the change:

mlxsw_sp_fib_entry mlxsw_sp_nexthop_group mlxsw_sp_nexthop_group_info
+-------------------+ +----------------------+ +---------------------------+
| nh_group; +--> nhgi; +--> |
| | | | | |
+-------------------+ +----------------------+ +---------------------------+

No functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5a49dfe5 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Move IPv4 FIB info into a union in nexthop group struct

Instead of storing the FIB info as 'priv' when the nexthop group
represents an IPv4 nexthop group, simply store it as a FIB info with a
proper comment.

When nexthop objects are supported, this field will become a union with
the nexthop object's identifier.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 46d5b7b5 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Remove unused field 'prio' from IPv4 FIB entry struct

Not used anywhere.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9ce254d9 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Store FIB info in route

When needed, IPv4 routes fetch the FIB info (i.e., 'struct fib_info')
from their associated nexthop group. This will not work when the nexthop
group represents a nexthop object (i.e., 'struct nexthop'), as it will
only have access to the nexthop's identifier.

Instead, store the FIB info in the route itself.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 02d8fdca 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Associate neighbour table with nexthop instead of group

As explained in the previous patch, nexthop objects can have both IPv4
and IPv6 nexthops in the same group. Therefore, move the neighbour table
to be a property of the nexthop instead of the nexthop group.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1664dd3d 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Use nexthop group type in hash table key

Both IPv4 and IPv6 nexthop groups are hashed in the same table. The
protocol field is used to indicate how the hash should be computed for
each group.

When nexthop group objects are supported, the hash will be computed for
them based on the nexthop identifier.

To differentiate between all the nexthop group types, encode the type of
the group in the key instead of the protocol.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a06191aa 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Add nexthop group type field

Currently, the type (i.e., IPv4/IPv6) of the nexthop group is derived
from the neighbour table associated with the group.

This is problematic when nexthop objects are taken into account, as a
nexthop group object can contain both IPv4 and IPv6 nexthops.

Instead, add a new field that indicates the type of the group and
initialize it during the group's creation. Currently, the types are IPv4
('struct fib_info') and IPv6 ('struct fib6_info'). In the future another
type will be added for nexthop objects ('struct nexthop').

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 10502d05 13-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_router: Compare key with correct object type

When comparing a key with a nexthop group in rhastable's obj_cmpfn()
callback, make sure that the key and nexthop group are of the same type
(i.e., IPv4 / IPv6).

The bug is not currently visible because IPv6 nexthop groups do not
populate the FIB info pointer and IPv4 nexthop groups do not set the
ifindex for the individual nexthops.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 173f14cd 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Introduce FIB entry update op

Follow-up patchset introducing XMDR implementation is going to need
to distinguish write and update ops. Therefore introduce "update op"
and call "write op" only when new FIB entry is inserted.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a005a7fe 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Track FIB entry committed state and skip uncommitted on delete

In case bulking is used, the entry that was previously added may not
be yet committed to the HW as it waits in the queue for bulk send. For
such entries, skip the deletion.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ae9ce81a 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Introduce fib_entry priv for low-level ops

Prepare for the low-level ops that need to store some data alongside
the fib_entry and introduce a per-fib_entry priv for ll ops.
The priv is reference counted as in the follow-up patch it is going
to be saved in pack() function and used later on in commit() even in
case the related fib_entry gets freed in the middle.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 91d20d71 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Have FIB entry op context allocated for the instance

Get the max size needed for FIB entry op context and allocate it once
for the instance. Use it repeatedly from the scheduled work.
By this, allow to extend the context to hold more data than it is wise
to do when it was on the stack. Make sure to signalize that the context
needs to be initialized in case families of subsequent FIB entries differ.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 505cd65c 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Prepare work context for possible bulking

For XMDR register it is possible to carry multiple FIB entry
operations in a single write. However the FW does not restrict mixing
the types of operations, make the code easier and indicate the bulking
is ok only in case the bulk contains FIB operations of the same family
and event.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7f5c4090 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum: Push RALUE packing and writing into low-level router ops

With follow-up introduction of XM implementation, XMDR register is
going to be optionally used instead of RALUE register. Push the RALUE
packing helpers and write call into low-level router ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1a9c21d5 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Use RALUE pack helper from abort function

Unify the RALUE register payload packing and use the
__mlxsw_sp_fib_entry_ralue_pack() helper from
__mlxsw_sp_router_set_abort_trap().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0c1d6b26 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Pass destination IP as a pointer to mlxsw_reg_ralue_pack4()

Instead of passing destination IP as a u32 value, pass it as pointer to
u32. Avoid using local variable for the pointer store.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d271cf9f 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum: Export RALUE pack helper and use it from IPIP

As the RALUE packing is going to be put into op, make the user from
IPIP code use the same helper as the router code does.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0f6b6601 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Push out RALUE pack into separate helper

As the RALUE packing is going to be pushed into an op, in preparation
for that push the code into a separate function in the meantime.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2d5bd7a1 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum: Propagate context from work handler containing RALUE payload

Currently, RALUE payload is defined locally in the function that is
calling the register write. With introduction of alternative register to
RALUE, XMDR, it has to be possible to put multiple FIB entry
operations into single register write.

So in order to prepare for that, have per-work entry operation context
and propagate it all the way down to the functions writing RALUE.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# c1b290d5 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Introduce FIB event queue instead of separate works

Currently, every FIB event is queued-up as a separate work to be
processed. However, that allows to process only one FIB entry per work
callback.

In preparation of future XMDR register bulking of multiple FIB entries,
convert to FIB event queue. Implement this by a list_head, adding new
events to the end of the list in the FIB notify callback. That allows to
process multiple events from the list inside the work callback.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d57ff022 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Use RALUE-independent op arg

Since the write/delete of FIB entry is going to be implemented by XMDR
register for XM implementation, introduce RALUE-independent enum for op
so the enum could be used in both RALUE and XMDR.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 69ba53e7 10-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Pass non-register proto enum to __mlxsw_sp_router_set_abort_trap()

Don't pass RALXX register enum and rather pass enum mlxsw_sp_l3proto
to __mlxsw_sp_router_set_abort_trap(). This is in preparation to fib
entry pack implementation by XMDR register.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 803be108 01-Nov-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum_router: Introduce low-level ops and implement them for RALXX regs

In preparation for support of XM router implementation which uses
different registers to work with trees and FIB entries, introduce
a structure to hold low-level ops and implement tree manipulation
register ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# eff74233 25-Sep-2020 Taehee Yoo <ap420073@gmail.com>

net: core: introduce struct netdev_nested_priv for nested interface infrastructure

Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.

In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2595b113 21-Sep-2020 Qinglang Miao <miaoqinglang@huawei.com>

mlxsw: spectrum_router: simplify the return expression of __mlxsw_sp_router_init()

Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 5515c344 28-Jul-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Fix use-after-free in router init / de-init

Several notifiers are registered as part of router initialization.
Since some of these notifiers are registered before the end of the
initialization, it is possible for them to access uninitialized or freed
memory when processing notifications [1].

Additionally, some of these notifiers queue work items on a workqueue.
If these work items are executed after the router was de-initialized,
they will access freed memory.

Fix both problems by moving the registration of the notifiers to the end
of the router initialization and flush the work queue after they are
unregistered.

[1]
BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:938 [inline]
BUG: KASAN: use-after-free in __mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103
Read of size 8 at addr ffff888038c3a6e0 by task kworker/u4:1/61

CPU: 1 PID: 61 Comm: kworker/u4:1 Not tainted 5.8.0-rc2+ #36
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: mlxsw_core_ordered mlxsw_sp_inet6addr_event_work
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
__mutex_lock_common kernel/locking/mutex.c:938 [inline]
__mutex_lock+0xeea/0x1340 kernel/locking/mutex.c:1103
mlxsw_sp_inet6addr_event_work+0xb3/0x1b0 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:7123
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293

Allocated by task 1298:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
kmalloc include/linux/slab.h:555 [inline]
kzalloc include/linux/slab.h:669 [inline]
mlxsw_sp_router_init+0xb2/0x1d20 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:8074
mlxsw_sp_init+0xbd8/0x3ac0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2932
__mlxsw_core_bus_device_register+0x657/0x10d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1375
mlxsw_core_bus_device_register drivers/net/ethernet/mellanox/mlxsw/core.c:1436 [inline]
mlxsw_devlink_core_bus_device_reload_up+0xcd/0x150 drivers/net/ethernet/mellanox/mlxsw/core.c:1133
devlink_reload net/core/devlink.c:2959 [inline]
devlink_reload+0x281/0x3b0 net/core/devlink.c:2944
devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987
genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:736 [inline]
genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:672
____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
___sys_sendmsg+0xff/0x170 net/socket.c:2417
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 1348:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kfree+0xe6/0x320 mm/slub.c:4063
mlxsw_sp_fini+0x340/0x4e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3132
mlxsw_core_bus_device_unregister+0x16c/0x6d0 drivers/net/ethernet/mellanox/mlxsw/core.c:1474
mlxsw_devlink_core_bus_device_reload_down+0x8e/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:1123
devlink_reload+0xc6/0x3b0 net/core/devlink.c:2952
devlink_nl_cmd_reload+0x2f1/0x7c0 net/core/devlink.c:2987
genl_family_rcv_msg_doit net/netlink/genetlink.c:691 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:736 [inline]
genl_rcv_msg+0x611/0x9d0 net/netlink/genetlink.c:753
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469
genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:672
____sys_sendmsg+0x6d8/0x840 net/socket.c:2363
___sys_sendmsg+0xff/0x170 net/socket.c:2417
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2450
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff888038c3a000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1760 bytes inside of
2048-byte region [ffff888038c3a000, ffff888038c3a800)
The buggy address belongs to the page:
page:ffffea0000e30e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea0000e30e00 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 dead000000000100 dead000000000122 ffff88806c40c000
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888038c3a580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888038c3a600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888038c3a680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888038c3a700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888038c3a780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 965fa8e600d2 ("mlxsw: spectrum_router: Make RIF deletion more robust")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89ab5331 28-Jul-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allow programming link-local host routes

Cited commit added the ability to program link-local prefix routes to
the ASIC so that relevant packets are routed and trapped correctly.

However, host routes were not included in the change and thus not
programmed to the ASIC. This can result in packets being trapped via an
external route trap instead of a local route trap as in IPv4.

Fix this by programming all the link-local routes to the ASIC.

Fixes: 10d3757fcb07 ("mlxsw: spectrum_router: Allow programming link-local prefix routes")
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d9d54202 10-Jul-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove inappropriate usage of WARN_ON()

We should not trigger a warning when a memory allocation fails. Remove
the WARN_ON().

The warning is constantly triggered by syzkaller when it is injecting
faults:

[ 2230.758664] FAULT_INJECTION: forcing a failure.
[ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0
[ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28
...
[ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0
[ 2230.898179] Kernel panic - not syncing: panic_on_warn set ...
[ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28
[ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014

Fixes: 3057224e014c ("mlxsw: spectrum_router: Implement FIB offload in deferred work")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7cf4eda4 27-May-2020 Colin Ian King <colin.king@canonical.com>

mlxsw: spectrum_router: remove redundant initialization of pointer br_dev

The pointer br_dev is being initialized with a value that is never read
and is being updated with a new value later on. The initialization
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 10d3757f 25-May-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allow programming link-local prefix routes

The device has a trap for IPv6 packets that need be routed and have a
unicast link-local destination IP (i.e., fe80::/10). This allows mlxsw
to ignore link-local routes, as the packets will be trapped to the CPU
in any case.

However, since link-local routes are not programmed, it is possible for
routed packets to hit the default route which might also be programmed
to trap packets. This means that packets with a link-local destination
IP might be trapped for the wrong reason.

To overcome this, allow programming link-local prefix routes (usually
one fe80::/64 per-table), so that the packets will be forwarded until
reaching the link-local trap.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cec2500d4 19-Apr-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Re-increase scale of IPv6 nexthop groups

As explained in commit fc25996e6f46 ("mlxsw: spectrum_router: Increase
scale of IPv6 nexthop groups"), each nexthop group is hashed by XOR-ing
the interface indexes of all the member nexthop devices.

To avoid many different nexthop groups ending up using the same key, the
above commit started hashing the interface indexes themselves before
they are XOR-ed.

However, in cases in which there are many nexthop groups that all use
the same nexthop device and only differ in the gateway IP, we can still
end up in a situation in which all the groups are using the same key.
This eventually leads to -EBUSY error from rhashtable during insertion.

Improve the situation by also making the gateway IP part of the key.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a84acf78 27-Mar-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Avoid uninitialized symbol errors

Suppress the following smatch errors. None of these are actually
possible with current code paths.

drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1220
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol 'saddrp'.
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1220
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol
'saddr_len'.
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1221
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol
'saddr_prefix_len'.

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:1390
mlxsw_sp_netdevice_ipip_ol_reg_event() error: uninitialized symbol
'ipipt'.

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3255
mlxsw_sp_nexthop_group_update() error: uninitialized symbol 'err'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bdb373cf 27-Mar-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Remove unused RIF and FID families

In merge commit 50853808ff4a ("Merge branch
'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN'") I flipped mlxsw to use
emulated 802.1Q FIDs and correspondingly emulated VLAN RIFs. This means
that the non-emulated variants are no longer used. Remove them and
suppress the following warnings when compiling with W=1:

drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:7572:38: warning:
‘mlxsw_sp_rif_vlan_ops’ defined but not used [-Wunused-const-variable=]

drivers/net/ethernet/mellanox/mlxsw//spectrum_fid.c:584:41: warning:
‘mlxsw_sp_fid_8021q_family’ defined but not used
[-Wunused-const-variable=]

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f0a66984 27-Mar-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add proper function documentation

Suppress following warnings when compiling with W=1:

drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'mlxsw_sp' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'ipip_entry' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'extack' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9811f7a2 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Remove RTNL where possible

After introducing the router lock in previous patches and making sure it
protects internal router structures, we no longer need to rely on RTNL
to serialize access to these structures.

Remove RTNL from call sites that no longer require it.

Two calls sites that keep taking the lock are
mlxsw_sp_router_fibmr_event_work() and mlxsw_sp_inet6addr_event_work().
The first calls into ACL code that still assumes RTNL is taken. The
second potentially calls into the FID code that also relies on RTNL.
Removing RTNL from these two call sites is the subject of future work.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 50c173c3 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Take router lock from exported helpers

The routing code exports some helper functions that can be called from
other driver modules such as the bridge. These helpers are never called
with the router lock already held and therefore need to take it in order
to serialize access to shared router structures.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1be54763 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Take router lock from inetaddr listeners

Another entry point into the routing code is from inetaddr listeners.
The driver registers listeners to IPv4 and IPv6 inetaddr notification
chains in order to understand when a RIF needs to be created or
destroyed.

Serialize access to shared router structures from these listeners by
taking the router lock when processing inetaddr events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b43c12e7 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Take router lock from netdev listener

One entry point into the routing code is from the netdev listener block.
Some netdev events require access to internal router structures. For
example, changing the MTU of a netdev requires looking-up the backing
RIF and adjusting its MTU.

In order to serialize access to shared router structures, take the
router lock when processing netdev events that require access to it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 894276e8 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Take router lock from inside routing code

There are several work items in the routing code that currently rely on
RTNL lock to guard against concurrent changes. Have these work items
acquire the router lock in preparation for the removal for RTNL lock
from the routing code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 20bf5d82 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Introduce router lock

Introduce a mutex to protect the internal structure of the routing code.
A single lock is added instead of a more fine-grained and complicated
locking scheme because there is not a lot of concurrency in the routing
code.

The main motivation is remove the dependence on RTNL lock, which is
currently used by both the process pushing routes to the kernel and the
workqueue pushing the routes to the underlying device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8e18d85e 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Store NVE decapsulation configuration in router

When a host route is added, the driver checks if the route needs to be
promoted to perform NVE decapsulation based on the current NVE
configuration. If so, the index of the decapsulation entry is retrieved
and associated with the route.

Currently, this information is stored in the NVE module which the router
module consults. Since the information is protected under RTNL and since
route insertion happens with RTNL held, there is no problem to retrieve
the information from the NVE module.

However, this is going to change and route insertion will no longer
happen under RTNL. Instead, a dedicated lock will be introduced for the
router module.

Therefore, store this information in the router module and change the
router module to consult this copy.

The validity of the information is set / cleared whenever an NVE tunnel
is initialized / de-initialized. When this happens the NVE module calls
into the router module to promote / demote the relevant host route.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2a60c460 21-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Expose router struct to internal users

The dpipe code accesses internal router data structures and acquires
RTNL to protect against their changes. Subsequent patches will remove
reliance on RTNL and introduce a dedicated lock to protect router data
structures.

Publish the router struct to internal users such as the dpipe, so that
they could acquire it instead of RTNL.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b69e1337 20-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Export function to check if RIF exists

After the previous patch, all the callers of mlxsw_sp_rif_find_by_dev()
outside of the routing code use it to understand if a RIF exists for the
passed netdev.

Therefore, export a function to check if a RIF exists and make
mlxsw_sp_rif_find_by_dev() internal to the routing code.

This will later allow us to more easily introduce the router lock which
will also protect the RIFs.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e9a664d 20-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Prevent RIF access outside of routing code

There are currently 5 users of mlxsw_sp_rif_find_by_dev() outside of the
routing code. Only one call site actually needs to dereference the
router interface (RIF). The rest merely need to know if a RIF exists for
the provided netdev.

Convert this call site to query the needed information directly from the
routing code instead of dereferencing the RIF.

This will later allow us to replace mlxsw_sp_rif_find_by_dev() with a
function that checks if a RIF exist.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1c6d6b51 20-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Prepare function for router lock introduction

The function de-associates the port-vlan from its router interface
(RIF). It is called both from the netdev notifier block and the inetaddr
notifier block that will soon hold the router lock.

Make sure that router code calls the internal version, as it will
already have the router lock held when the function is called.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fbf8b356 20-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Prepare function for router lock introduction

The function removes the FDB entry that directs the macvlan's MAC to the
router port. It is called from both the netdev notifier block and the
inetaddr notifier block that will soon hold the router lock.

Make sure that only the netdev notifier calls the exported version, so
that is will take the router lock, which will already be held by the
inetaddr notifier.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f24fbf4d 20-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Do not assume RTNL is taken when resolving underlay device

The function that resolves the underlay device of the IPIP tunnel
assumes that RTNL is taken, but this will not be correct when RTNL is
removed from the route insertion path.

Convert the function to use dev_get_by_index_rcu() instead of
__dev_get_by_index() and make sure it is always called from an RCU
read-side critical section.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 23d154c0 20-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Do not assume RTNL is taken during RIF teardown

IPv6 addresses are deleted in an atomic context, so the driver defers
the potential teardown of the associated router interface (RIF) to a
work item that takes RTNL.

The RIF is only destroyed if the associated netdev does not have any IP
addresses (both IPv4 and IPv6). The IPv4 device ('struct in_device') is
currently fetched via __in_dev_get_rtnl() which assumes RTNL is taken.

Since RTNL is going to be removed, convert it to use __in_dev_get_rcu()
from an RCU read-side critical section.

Note that the IPv6 device ('struct inet6_dev') is fetched via
__in6_dev_get(), which does not require RTNL.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c43ef228 20-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Do not assume RTNL is taken during nexthop init

RTNL is going to be removed from route insertion path, so use
__in_dev_get_rcu() from an RCU read-side critical section instead of
__in_dev_get_rtnl() which assumes RTNL is taken.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# da1f9f8c 17-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Reduce dependency between bridge and router code

Commit f40be47a3e40 ("mlxsw: spectrum_router: Do not force specific
configuration order") added a call from the routing code to the bridge
code in order to handle the case where VNI should be set on a FID
following the joining of the router port to the FID.

This is no longer required, as previous patches made VXLAN devices
explicitly take a reference on the FID and set VNI on it.

Therefore, remove the unnecessary call and simply have the RIF take a
reference on the FID without checking if VNI should also be set on it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 490f0542 07-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Clear offload indication from IPv6 nexthops on abort

Unlike IPv4, in IPv6 there is no unique structure to represent the
nexthop and both the route and nexthop information are squashed to the
same structure ('struct fib6_info'). In order to improve resource
utilization the driver consolidates identical nexthop groups to the same
internal representation of a nexthop group.

Therefore, when the offload indication of a nexthop changes, the driver
needs to iterate over all the linked fib6_info and toggle their offload
flag accordingly.

During abort, all the routes are removed from the device and unlinked
from their nexthop group. The offload indication is cleared just before
the group is destroyed, but by that time no fib6_info is linked to the
group and the offload indication remains set.

Fix this by clearing the offload indication just before dropping the
reference from the nexthop.

Fixes: ee5a0448e72b ("mlxsw: spectrum_router: Set hardware flags for routes")
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0508ff89 07-Feb-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Prevent incorrect replacement of local table routes

The driver uses the same table to represent both the main and local
routing tables. Prevent routes in the main table from replacing routes
in the local table to reflect the fact that the local table is consulted
first during lookup.

Fixes: b6a1d871d37a ("mlxsw: spectrum_router: Start using new IPv4 route notifications")
Fixes: dacad7b34b59 ("mlxsw: spectrum_router: Start using new IPv6 route notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4a44ee67 19-Jan-2020 Amit Cohen <amitc@mellanox.com>

mlxsw: Add ECN configurations with IPinIP tunnels

Initialize ECN mapping registers during router init according to
INET_ECN_encapsulate() and INET_ECN_decapsulate().

For IP-in-IP encapsulation, this is required to ensure that ECN bits in
the underlay are set in accordance with the kernel. For decapsulation,
this is required to ensure that packets with invalid ECN combination in
underlay and overlay are trapped to the kernel and not forwarded.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee5a0448 14-Jan-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Set hardware flags for routes

Previous patches added support for two hardware flags for IPv4 and IPv6
routes: 'RTM_F_OFFLOAD' and 'RTM_F_TRAP'. Both indicate the presence of
the route in hardware. The first indicates that traffic is actually
offloaded from the kernel, whereas the second indicates that packets
hitting such routes are trapped to the kernel for processing (e.g., host
routes).

Use these two flags in mlxsw. The flags are modified in two places.
Firstly, whenever a route is updated in the device's table. This
includes the addition, deletion or update of a route. For example, when
a host route is promoted to perform NVE decapsulation, its action in the
device is updated, the 'RTM_F_OFFLOAD' flag set and the 'RTM_F_TRAP'
flag cleared.

Secondly, when a route is replaced and overwritten by another route, its
flags are cleared.

v2:
* Convert to new fib_alias_hw_flags_set() interface

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c5a5b9b 14-Jan-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Separate nexthop offload indication from route

The driver currently uses the 'RTNH_F_OFFLOAD' flag for both routes and
nexthops, which is cumbersome and unnecessary now that we have separate
flag for the route itself.

Separate the offload indication for nexthops from routes and call it
whenever the offload state within the nexthop group changes.

Note that IPv6 (unlike IPv4) does not share the same nexthop group
between different routes, whereas mlxsw does. Therefore, whenever the
offload indication within an IPv6 nexthop group changes, all the linked
routes need to be updated.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 314bd842 29-Dec-2019 Amit Cohen <amitc@mellanox.com>

mlxsw: spectrum_router: Skip loopback RIFs during MAC validation

When a router interface (RIF) is created the MAC address of the backing
netdev is verified to have the same MSBs as existing RIFs. This is
required in order to avoid changing existing RIF MAC addresses that all
share the same MSBs.

Loopback RIFs are special in this regard as they do not have a MAC
address, given they are only used to loop packets from the overlay to
the underlay.

Without this change, an error is returned when trying to create a RIF
after the creation of a GRE tunnel that is represented by a loopback
RIF. 'rif->dev->dev_addr' points to the GRE device's local IP, which
does not share the same MSBs as physical interfaces. Adding an IP
address to any physical interface results in:

Error: mlxsw_spectrum: All router interface MAC addresses must have the
same prefix.

Fix this by skipping loopback RIFs during MAC validation.

Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c4a7ec8 26-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove FIB entry list from FIB node

As explained in previous patches, the driver no longer needs to maintain
a list of identical FIB entries (i.e, same {tb_id, prefix, prefix
length}) and therefore each FIB node can only store one FIB entry.

Remove the FIB entry list and simplify the code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b04720ae 26-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Consolidate identical functions

After the last patch mlxsw_sp_fib{4,6}_node_entry_link() and
mlxsw_sp_fib{4,6}_node_entry_unlink() are identical and can therefore be
consolidated into the same common function.

Perform the consolidation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0705297e 26-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Make route creation and destruction symmetric

Host routes that perform decapsulation of IP in IP tunnels have a
special adjacency entry linked to them. This entry stores information
such as the expected underlay source IP. When the route is deleted this
entry needs to be freed.

The allocation of the adjacency entry happens in
mlxsw_sp_fib4_entry_type_set(), but it is freed in
mlxsw_sp_fib4_node_entry_unlink().

Create a new function - mlxsw_sp_fib4_entry_type_unset() - and free the
adjacency entry there.

This will allow us to consolidate mlxsw_sp_fib{4,6}_node_entry_unlink()
in the next patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d2fb5aa 26-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Eliminate dead code

Since the driver no longer maintains a list of identical routes there is
no route to promote when a route is deleted.

Remove that code that took care of it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 231c8d2b 26-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove unnecessary checks

Now that the networking stack takes care of only notifying the routes of
interest, we do not need to maintain a list of identical routes.

Remove the check that tests if the route is the first route in the FIB
node.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# caafb250 23-Dec-2019 Ido Schimmel <idosch@mellanox.com>

ipv6: Remove old route notifications and convert listeners

Now that mlxsw is converted to use the new FIB notifications it is
possible to delete the old ones and use the new replace / append /
delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dacad7b3 23-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Start using new IPv6 route notifications

With the new notifications mlxsw does not need to handle identical
routes itself, as this is taken care of by the core IPv6 code.

Instead, mlxsw only needs to take care of inserting and removing routes
from the device.

Convert mlxsw to use the new IPv6 route notifications and simplify the
code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 446f7391 14-Dec-2019 Ido Schimmel <idosch@mellanox.com>

ipv4: Remove old route notifications and convert listeners

Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.

This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b6a1d871 14-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Start using new IPv4 route notifications

With the new notifications mlxsw does not need to handle identical
routes itself, as this is taken care of by the core IPv4 code.

Instead, mlxsw only needs to take care of inserting and removing routes
from the device.

Convert mlxsw to use the new IPv4 route notifications and simplify the
code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62201c00 08-Dec-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove unlikely user-triggerable warning

In case the driver vetoes the addition of an IPv6 multipath route, the
IPv6 stack will emit delete notifications for the sibling routes that
were already added to the FIB trie. Since these siblings are not present
in hardware, a warning will be generated.

Have the driver ignore notifications for routes it does not have.

Fixes: ebee3cad835f ("ipv6: Add IPv6 multipath notifications for add / replace")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ed43cff0 24-Nov-2019 Amit Cohen <amitc@mellanox.com>

mlxsw: spectrum_router: Fix use of uninitialized adjacency index

When mlxsw_sp_adj_discard_write() is called for the first time, the
value stored in 'mlxsw_sp->router->adj_discard_index' is invalid, as
indicated by 'mlxsw_sp->router->adj_discard_index_valid' being set to
'false'.

In this case, we should not use the value initially stored in
'mlxsw_sp->router->adj_discard_index' (0) and instead use the value
allocated later in the function.

Fixes: 983db6198f0d ("mlxsw: spectrum_router: Allocate discard adjacency entry when needed")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# c5731cc5 24-Nov-2019 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels

When a GRE tunnel is bound to an underlay netdevice and that netdevice is
moved to a different VRF, that could cause two tunnels to have the same
underlay local address in the same VRF. Linux in this situation dispatches
the traffic according to the tunnel key (or lack thereof), but that cannot
be offloaded to Spectrum devices.

Detect this situation and unoffload the two impacted tunnels when it
happens.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>


# 1fc16577 18-Nov-2019 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Fix determining underlay for a GRE tunnel

The helper mlxsw_sp_ipip_dev_ul_tb_id() determines the underlay VRF of a
GRE tunnel. For a tunnel without a bound device, it uses the same VRF that
the tunnel is in. However in Linux, a GRE tunnel without a bound device
uses the main VRF as the underlay. Fix the function accordingly.

mlxsw further assumed that moving a tunnel to a different VRF could cause
conflict in local tunnel endpoint address, which cannot be offloaded.
However, the only way that an underlay could be changed by moving the
tunnel device itself is if the tunnel device does not have a bound device.
But in that case the underlay is always the main VRF, so there is no
opportunity to introduce a conflict by moving such device. Thus this check
constitutes a dead code, and can be removed, which do.

Fixes: 6ddb7426a7d4 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 983db619 14-Nov-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allocate discard adjacency entry when needed

Commit 0c3cbbf96def ("mlxsw: Add specific trap for packets routed via
invalid nexthops") allocated an adjacency entry during driver
initialization whose purpose is to discard packets hitting the route
pointing to it.

These adjacency entries are allocated from a resource called KVD linear
(KVDL). There are situations in which the user can decide to set the
size of this resource (via devlink-resource) to 0, in which case the
driver will not be able to load.

Therefore, instead of pre-allocating this adjacency entry, simply
allocate it only when needed. A variable indicating the validity of the
entry is added and is used to ensure it is only allocated and written
once and that it is freed after all the routes were flushed.

Fixes: 0c3cbbf96def ("mlxsw: Add specific trap for packets routed via invalid nexthops")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c3cbbf9 07-Nov-2019 Amit Cohen <amitc@mellanox.com>

mlxsw: Add specific trap for packets routed via invalid nexthops

Currently, mlxsw does not differentiate between these two cases of
routes with invalid nexthops:

1. Nexthops whose nexthop device is a mlxsw upper (has a RIF), but whose
neighbour could not be resolved

2. Nexthops whose nexthop device is not a mlxsw upper (e.g., management
interface)

Up until now this did not matter and mlxsw trapped packets for both
cases using the same trap ID. However, packets that should have been
routed in hardware (case 1), but incurred a problem are considered
exceptions and should be reported to the user. The two cases should
therefore be split between two different trap IDs.

Allocate a new adjacency entry during initialization and upon the
insertion of the first route with an invalid mlxsw nexthop, program this
entry to discard packets. Packets hitting this entry will be reported
using new trap ID - "DISCARD_ROUTER3".

In the future, the entry could be written during initialization, but
currently firmware requires a valid RIF, which is not available at this
stage.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 21151f64 07-Nov-2019 Amit Cohen <amitc@mellanox.com>

mlxsw: Add new FIB entry type for reject routes

Currently, packets that cannot be routed in hardware (e.g., nexthop
device is not upper of mlxsw), are trapped to the kernel for forwarding.
Such packets are trapped using "RTR_INGRESS0" trap. This trap also traps
packets that hit reject routes (e.g., "unreachable") so that the kernel
will generate the appropriate ICMP error message for them.

Subsequent patch will need to only report to devlink packets that hit a
reject route, which is impossible as long as "RTR_INGRESS0" is
overloaded like that.

Solve this by using "RTR_INGRESS1" trap for packets that hit reject
routes.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5bcfb6a4 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: Propagate extack down to register_fib_notifier()

During the devlink reaload the extack is present, so propagate it all
the way down to register_fib_notifier() call in spectrum_router.c.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 053e92aa 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum: Take devlink net instead of init_net

Follow-up patch is going to allow to reload devlink instance into
different network namespace, so use devlink_net() helper instead
of init_net.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7a59557 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: fib_notifier: propagate extack down to the notifier block callback

Since errors are propagated all the way up to the caller, propagate
possible extack of the caller all the way down to the notifier block
callback.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f9e5c11 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Don't rely on missing extack to symbolize dump

Currently if info->extack is NULL, mlxsw assumes that the event came
down from dump. Originally, the dump did not propagate the return value
back to the original caller (fib_notifier_register()). However, that is
now happening. So benefit from this and push the error up if it happened.
Remove rule cases in work handlers that are now dead code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c550daf 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: fib_notifier: make FIB notifier per-netns

Currently all users of FIB notifier only cares about events in init_net.
Later in this patchset, users get interested in other namespaces too.
However, for every registered block user is interested only about one
namespace. Make the FIB notifier registration per-netns and avoid
unnecessary calls of notifier block for other namespaces.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fc25996e 23-Jul-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Increase scale of IPv6 nexthop groups

Unlike IPv4, the kernel does not consolidate IPv6 nexthop groups. To
avoid exhausting the device's adjacency table - where nexthops are
stored - the driver does this consolidation instead.

Each nexthop group is hashed by XOR-ing the interface indexes of all the
member nexthop devices. However, the ifindex itself is not hashed, which
can result in identical keys used for different groups and finally an
-EBUSY error from rhashtable due to too long objects list.

Improve the situation by hashing the ifindex itself.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d5382fef 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

ipv6: Stop sending in-kernel notifications for each nexthop

Both listeners - mlxsw and netdevsim - of IPv6 FIB notifications are now
ready to handle IPv6 multipath notifications.

Therefore, stop ignoring such notifications in both drivers and stop
sending notification for each added / deleted nexthop.

v2:
* Remove 'multipath_rt' from 'struct fib6_entry_notifier_info'

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2d9dd7ec 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Create IPv6 multipath routes in one go

Allow the driver to create an IPv6 multipath route in one go by passing
an array of sibling routes and iterating over them.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d21afd30 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add / delete multiple IPv6 nexthops

Currently, the functions that take care of populating IPv6 nexthop
groups only add / delete a single nexthop.

Prepare them to handle multiple routes in one notification by passing an
array of routes and adding / deleting all of them.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 921bc539 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Pass array of routes to route handling functions

Prepare the driver to handle multiple routes in a single notification by
passing an array of routes to the functions that actually add / delete a
route.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 94d628d1 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Adjust IPv6 replace logic to new notifications

Previously, IPv6 replace notifications were only sent from
fib6_add_rt2node(). The function only emitted such notifications if a
route actually replaced another route.

A previous patch added another call site in ip6_route_multipath_add()
from which such notification can be emitted even if a route was merely
added and did not replace another route.

Adjust the driver to take this into account and potentially set the
'replace' flag to 'false' if the notified route did not replace an
existing route.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 928c0b53 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Pass multiple routes to work item

Prepare the driver to process IPv6 multipath notifications by passing an
array of 'struct fib6_info' instead of just one route.

A reference is taken on each sibling route in order to prevent them from
being freed until they are processed by the workqueue.

v2:
* Remove 'multipath_rt' usage

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ccd56a5f 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Prepare function to return errors

The function mlxsw_sp_router_fib6_event() takes care of preparing the
needed information for the work item that actually inserts the route
into the device.

When processing an IPv6 multipath route, the function will need to
allocate an array to store pointers to all the sibling routes.

Change the function's signature to return an error code and adjust the
single call site.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 20247fca 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove processing of IPv6 append notifications

No such notifications are sent by the IPv6 code, so remove them.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f6c3bb75 18-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Ignore IPv6 multipath notifications

IPv6 multipath notifications are about to be sent, but mlxsw is not
ready to process them, so ignore them.

The limitation will be lifted by a subsequent patch which will also stop
the kernel from sending a notification for each nexthop.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 83d57826 11-Jun-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes dead

The driver tries to periodically refresh neighbours that are used to
reach nexthops. This is done by periodically calling neigh_event_send().

However, if the neighbour becomes dead, there is nothing we can do to
return it to a connected state and the above function call is basically
a NOP.

This results in the nexthop never being written to the device's
adjacency table and therefore never used to forward packets.

Fix this by dropping our reference from the dead neighbour and
associating the nexthop with a new neigbhour which we will try to
refresh.

Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 54250805 03-Jun-2019 David Ahern <dsahern@gmail.com>

mlxsw: Fail attempts to use routes with nexthop objects

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5481d73f 03-Jun-2019 David Ahern <dsahern@gmail.com>

ipv4: Use accessors for fib_info nexthop data

Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
fib_dev macro which is an alias for the first nexthop. Replacements:

fi->fib_dev --> fib_info_nh(fi, 0)->fib_nh_dev
fi->fib_nh --> fib_info_nh(fi, 0)
fi->fib_nh[i] --> fib_info_nh(fi, i)
fi->fib_nhs --> fib_info_num_path(fi)

where fib_info_nh(fi, i) returns fi->fib_nh[nhsel] and fib_info_num_path
returns fi->fib_nhs.

Move the existing fib_info_nhc to nexthop.h and define the new ones
there. A later patch adds a check if a fib_info uses a nexthop object,
and defining the helpers in nexthop.h avoid circular header
dependencies.

After this all remaining open coded references to fi->fib_nhs and
fi->fib_nh are in:
- fib_create_info and helpers used to lookup an existing fib_info
entry, and
- the netdev event functions fib_sync_down_dev and fib_sync_up.

The latter two will not be reused for nexthops, and the fib_create_info
will be updated to handle a nexthop in a fib_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1cf844c7 22-May-2019 David Ahern <dsahern@gmail.com>

ipv6: Make fib6_nh optional at the end of fib6_info

Move fib6_nh to the end of fib6_info and make it an array of
size 0. Pass a flag to fib6_info_alloc indicating if the
allocation needs to add space for a fib6_nh.

The current code path always has a fib6_nh allocated with a
fib6_info; with nexthop objects they will be separate.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7973d9e7 23-Apr-2019 David Ahern <dsahern@gmail.com>

mlxsw: spectrum_router: Prevent ipv6 gateway with v4 route via replace and append

mlxsw currently does not support v6 gateways with v4 routes. Commit
19a9d136f198 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
prevents a route from being added, but nothing stops the replace or
append. Add a catch for them too.
$ ip ro add 172.16.2.0/24 via 10.99.1.2
$ ip ro replace 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.
$ ip ro append 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# be659b8d 21-Apr-2019 David Ahern <dsahern@gmail.com>

ipv6: Restore RTF_ADDRCONF check in rt6_qualify_for_ecmp

The RTF_ADDRCONF flag filters out routes added by RA's in determining
which routes can be appended to an existing one to create a multipath
route. Restore the flag check and add a comment to document the RA piece.

Fixes: 4e54507ab1a9 ("ipv6: Simplify rt6_qualify_for_ecmp")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e54507a 21-Apr-2019 David Ahern <dsahern@gmail.com>

ipv6: Simplify rt6_qualify_for_ecmp

After commit c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use
ip6_route_info_create"), the gateway is no longer filled in for fib6_nh
structs in a prefix route. Accordingly, the RTF_ADDRCONF flag check can
be dropped from the 'rt6_qualify_for_ecmp'.

Further, RTF_DYNAMIC is only set in rt6_info instances, so it can be
removed from the check as well.

This reduces rt6_qualify_for_ecmp and the mlxsw version to just checking
if the nexthop has a gateway which is the real indication of whether
entries can be coalesced into a multipath route.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 05414dd1 21-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Relax FIB rule validation

Currently, mlxsw does not support policy-based routing (PBR) and
therefore forbids the installation of non-default FIB rules except for
the l3mdev rule which is used for VRFs.

Relax the check to allow the installation of FIB rules that would never
match packets received by the device. Specifically, if the iif is that
of the loopback netdev. This is useful for users that need to redirect
locally generated packets based on FIB rules.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fa73989f 21-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Use a stable ECMP/LAG seed

In order to get a consistent behavior of traffic flows across reboots /
module unload, we need to use the same ECMP/LAG seed.

Calculate the seed by hashing the base MAC of the device. This results
in a seed that is both unique (to avoid polarization) and consistent.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# caf345a1 14-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add neighbour offload indication

In a similar fashion to routes and FDB entries, the neighbour table is
reflected to the device.

Set an offload indication on the neighbour in case it was programmed to
the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a85e84e0 14-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Propagate neighbour update errors

Next patch will add offload indication to neighbours, but the indication
should only be altered in case the neighbour was successfully added to /
deleted from the device.

Propagate neighbour update errors, so that they could be taken into
account by the next patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 972fae68 10-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Do not check VRF MAC address

Commit 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
addresses") enabled the driver to veto router interface (RIF) MAC
addresses that it cannot support.

This check should only be performed for interfaces for which the driver
actually configures a RIF. A VRF upper is not one of them, so ignore it.

Without this patch it is not possible to set an IP address on the VRF
device and use it as a loopback.

Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 19a9d136 05-Apr-2019 David Ahern <dsahern@gmail.com>

ipv4: Flag fib_info with a fib_nh using IPv6 gateway

Until support is added to the offload drivers, they need to be able to
reject routes with an IPv6 gateway. To that end add a flag to fib_info
that indicates if any fib_nh has a v6 gateway. The flag allows the drivers
to efficiently know the use of a v6 gateway without walking all fib_nh
tied to a fib_info each time a route is added.

Update mlxsw and rocker to reject the routes with extack message as to why.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bdf00467 05-Apr-2019 David Ahern <dsahern@gmail.com>

net: Replace nhc_has_gw with nhc_gw_family

Allow the gateway in a fib_nh_common to be from a different address
family than the outer fib{6}_nh. To that end, replace nhc_has_gw with
nhc_gw_family and update users of nhc_has_gw to check nhc_gw_family.
Now nhc_family is used to know if the nh_common is part of a fib_nh
or fib6_nh (used for container_of to get to route family specific data),
and nhc_gw_family represents the address family for the gateway.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad1601ae 27-Mar-2019 David Ahern <dsahern@gmail.com>

ipv6: Rename fib6_nh entries

Rename fib6_nh entries that will be moved to a fib_nh_common struct.
Specifically, the device, gateway, flags, and lwtstate are common
with all nexthop definitions. In some places new temporary variables
are declared or local variables renamed to maintain line lengths.

Rename only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b75ed8b1 27-Mar-2019 David Ahern <dsahern@gmail.com>

ipv4: Rename fib_nh entries

Rename fib_nh entries that will be moved to a fib_nh_common struct.
Specifically, the device, oif, gateway, flags, scope, lwtstate,
nh_weight and nh_upper_bound are common with all nexthop definitions.
In the process shorten fib_nh_lwtstate to fib_nh_lws to avoid really
long lines.

Rename only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b2450ca 27-Mar-2019 David Ahern <dsahern@gmail.com>

ipv6: Move gateway checks to a fib6_nh setting

The gateway setting is not per fib6_info entry but per-fib6_nh. Add a new
fib_nh_has_gw flag to fib6_nh and convert references to RTF_GATEWAY to
the new flag. For IPv6 address the flag is cheaper than checking that
nh_gw is non-0 like IPv4 does.

While this increases fib6_nh by 8-bytes, the effective allocation size of
a fib6_info is unchanged. The 8 bytes is recovered later with a
fib_nh_common change.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24f91ce0 12-Feb-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Drop unnecessary WARN_ON_ONCE()

In case the register access failed an error would be logged anyway, so
we can drop the warning.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9e475293 07-Feb-2019 Gustavo A. R. Silva <gustavo@embeddedor.com>

mlxsw: spectrum_router: Use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
int stuff;
struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)

Notice that, in this case, variable alloc_size is not necessary, hence
it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2810c3b2 06-Feb-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Offload blackhole routes

Create a new FIB entry type for blackhole routes and set it in case the
type of the notified route is 'RTN_BLACKHOLE'.

Program such routes with a discard action and mark them as offloaded
since the device is dropping the packets instead of the kernel.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eff42aa9 23-Jan-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Expose functions to create and destroy underlay RIF

In Spectrum-2, instead of providing the ID of the virtual router (VR)
where NVE underlay lookups will occur as in Spectrum-1, the ID of a
router interface (RIF) in this VR is required.

Expose functions to create and destroy such a RIF.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5040a90 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Add GRE tunnel support for Spectrum-2

Spectrum-2 GRE tunnel implementation requires a specific underlay RIF that
points to the virtual router used for forwarding the encapsulated packet.

Add Spectrum-2 specific loopback router interface creation methods which
may create or reuse the dedicated underlay RIF.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 311596f5 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Update tunnel decap properties

Spectrum-2 requires to specify the egress RIF when setting tunnel decap
properties. Add a method for accessing the underlay RIF index and then use
it when setting decap properties.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 73b8f493 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Support RIF without device

Spectrum-2 underlay RIF is merely an auxiliary RIF that points to the
virtual router used for encapsulated packets lookup. It exists only when
its overlay RIF exists but may be shared with other overlay RIFs.
Hence it is undesired to mark any device as related to it.

Therefore allow usage of NULL device when allocating RIF.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 33c04afe 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Change mlxsw_sp_ipip_lb_ul_vr_id()

For the sake of Spectrum-2 GRE support, as ul_vr_id field is reserved for
Spectrum-2, Change mlxsw_sp_ipip_lb_ul_vr_id() implementation not to use
the reserved field.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25f844dd 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Add underlay RIF ID support

Spectrum-2 GRE tunnels underlay should be given not only the virtual router
information for an encapsulated packet lookup, but also an underlay RIF
object which belongs to a virtual router.

Therefore add ul_rif_id field in struct mlxsw_sp_rif_ipip_lb, to be used
later in Spectrum-2 underlay RIF implementation. This field complements
ul_vr_id field, already present and defined as reserved for Spectrum-2.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a04563e4 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Mark RIF index as taken before creation

The presence of an allocated RIF in mlxsw_sp->router->rifs[rif_index] marks
that rif_index as taken.
Set the marking of a taken RIF to happen before calling ops->create in
order to allow creation of a GRE underlay RIF, which may be allocated and
created as part of an overlay RIF creation.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3c747500 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Adjust loopback RIF configuration

In Spectrum-2, the underlay routing table is pointed by an underlay router
interface in contrary to Spectrum where only an underlay virtual router
should be set. That makes the underlay virtual router field in RITR
reserved for Spectrum-2.

Change loopback RIF creation function to support the new underlay RIF
field, however leave this field reserved for Spectrum-1 updates.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f5b2303 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum: Set RIF ops per ASIC type

Set RIF ops array as member of mlxsw_sp in order to control which RIF
operations callbacks are called per ASIC type. This is needed to control
per ASIC handling of loopback RIF configurations.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 99974468 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Split RIF ops array for Spectrum-2 support

Split RIF ops array for Spectrum-1 and Spectrum-2 callbacks in order to
support different sets of operations for loopback RIF handling, as
underlying implementation differs between the ASICs.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# afba3e10 19-Jan-2019 Nir Dotan <nird@mellanox.com>

mlxsw: reg: Add fields to RITR - Router Interface Table Register

Add fields relevant for Spectrum-2 Loopback IPinIP router interface
creation. Add additional Loopback RIF protocol value - Generic, used for
creation of an explicit underlay RIF, and also add a field named
underlay_rif used for specifying the underlay RIF of a tunnel.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6685987c 16-Jan-2019 Petr Machata <petrm@mellanox.com>

switchdev: Add extack argument to call_switchdev_notifiers()

A follow-up patch will enable vetoing of FDB entries. Make it possible
to communicate details of why an FDB entry is not acceptable back to the
user.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a2d2a205 20-Dec-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Replace hard-coded default VID with a define

Subsequent patches are going to replace the current default VID (1) with
VLAN_N_VID - 1 (4095).

Prepare for this conversion by replacing the hard-coded '1' with a
define.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f40be47a 20-Dec-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Do not force specific configuration order

In symmetric routing, the only two members in the VLAN corresponding to
the L3 VNI are the router port and the VXLAN tunnel.

In case the VXLAN device is already enslaved to the bridge and only
later the VLAN interface is configured, the tunnel will not be
offloaded.

The reason for this is that when the router interface (RIF)
corresponding to the VLAN interface is configured, it calls the core
fid_get() API which does not check if NVE should be enabled on the FID.

Instead, call into the bridge code which will check if NVE should be
enabled on the FID.

This effectively means that the same code path is used to retrieve a FID
when either a local port or a router port joins the FID.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b61cd7c6 18-Dec-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Hold a reference on RIF's netdev

Previous patches tried to make RIF deletion more robust and avoid
use-after-free situations.

As another precaution, hold a reference on a RIF's netdev and release it
when the RIF is deleted.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 965fa8e6 18-Dec-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Make RIF deletion more robust

In the past we had multiple instances where RIFs were not properly
deleted.

One of the reasons for leaking a RIF was that at the time when IP
addresses were flushed from the respective netdev (prompting the
destruction of the RIF), the netdev was no longer a mlxsw upper. This
caused the inet{,6}addr notification blocks to ignore the NETDEV_DOWN
event and leak the RIF.

Instead of checking whether the netdev is our upper when an IP address
is removed, we can instead check if the netdev has a RIF configured.

To look up a RIF we need to access mlxsw private data, so the patch
stores the notification blocks inside a mlxsw struct. This then allows
us to use container_of() and extract the required private data.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 21ffedb6 18-Dec-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Propagate 'struct mlxsw_sp' further

Next patch is going to make RIF deletion more robust by removing
reliance on fragile mlxsw_sp_lower_get(). This is because a netdev is
not necessarily our upper anymore when its IP addresses are flushed.

The inet{,6}addr notification blocks are going to resolve 'struct
mlxsw_sp' using container_of(), but the functions they call still use
mlxsw_sp_lower_get().

As a preparation for the next patch, propagate 'struct mlxsw_sp' down to
the functions called from the notification blocks and remove reliance on
mlxsw_sp_lower_get().

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 32fd4b49 18-Dec-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Do not destroy RIFs based on FID's reference count

Currently, when a RIF is constructed on top of a FID, the RIF increments
the FID's reference count and the RIF is destroyed when the FID's
reference count drops to 1. This effectively means that when no local
ports are member in the FID, the FID is destroyed regardless if the
router port is a member in the FID or not.

The above can lead to the unexpected behavior in which routes using a
VLAN interface as their nexthop device are no longer offloaded after the
last local port leaves the corresponding VLAN (FID).

Example:
# ip -4 route show dev br0.10
192.0.2.0/24 proto kernel scope link src 192.0.2.1 offload
# bridge vlan del vid 10 dev swp3
# ip -4 route show dev br0.10
192.0.2.0/24 proto kernel scope link src 192.0.2.1

After the patch, the route is offloaded before and after the VLAN is
removed from local port 'swp3', as the RIF corresponding to 'br0.10'
continues to exists.

In order to remove RIFs' reliance on the underlying FID's reference
count, we need to add a reference count to sub-port RIFs, which are RIFs
that correspond to physical ports and their uppers (e.g., LAG devices).

In this case, each {Port, VID} ('struct mlxsw_sp_port_vlan') needs to
hold a reference on the RIF. For example:

bond0.10
|
bond0
|
+-------+
| |
swp1 swp2

Both {Port 1, VID 10} and {Port 2, VID 10} will hold a reference on the
RIF corresponding to 'bond0.10'. When the last reference is dropped, the
RIF will be destroyed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 74bc9939 13-Dec-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Veto unsupported RIF MAC addresses

On NETDEV_PRE_CHANGEADDR, if the change is related to a RIF interface,
verify that it satisfies the criterion that all RIF interfaces have the
same MAC address prefix, as indicated by mlxsw_sp.mac_mask.

Additionally, besides explicit address changes, check that the address
of an interface for which a RIF is about to be added matches the
required pattern as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9735f2d2 13-Dec-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Generalize mlxsw_sp_netdevice_router_port_event()

Prepare mlxsw_sp_netdevice_router_port_event() for handling of
NETDEV_PRE_CHANGEADDR. Split out the part that deals with the actual
changes and call it for the two events currently handled.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# da93d291 06-Dec-2018 Nir Dotan <nird@mellanox.com>

mlxsw: spectrum_router: Relax GRE decap matching check

GRE decap offload is configured when local routes prefix correspond to the
local address of one of the offloaded GRE tunnels. The matching check was
found to be too strict, such that for a flat GRE configuration, in which
the overlay and underlay traffic share the same non-default VRF, decap flow
was not offloaded.

Relax the check for decap flow offloading. A match occurs if the local
address of the tunnel matches the local route address while both share the
same VRF table.

Fixes: 4607f6d26950 ("mlxsw: spectrum_router: Support IPv4 underlay decap")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c2e7490c 25-Nov-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs

Replace 802.1Q FIDs and VLAN RIFs with their emulated counterparts.

The emulated 802.1Q FIDs are actually 802.1D FIDs and thus use the same
flood tables, of per-FID type. Therefore, add 4K-1 entries to the
per-FID flood tables for the new FIDs and get rid of the FID-offset
flood tables that were used by the old 802.1Q FIDs.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba6da02a 25-Nov-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Introduce emulated VLAN RIFs

Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of
"VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges of
"FID" type.

In other words, the RIF type is derived from the underlying FID type.
VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on
top of 802.1D FIDs.

Since the previous patch emulated 802.1Q FIDs using 802.1D FIDs, this
patch emulates VLAN RIFs using FID RIFs.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4cf178d7 17-Oct-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Configure matching local routes for NVE decap

When a local route that matches the source IP of an offloaded NVE tunnel
is notified, the driver needs to program it to perform NVE decapsulation
instead of merely trapping packets to the CPU.

This patch complements "mlxsw: spectrum_router: Enable local routes
promotion to perform NVE decap" where existing local routes were
promoted to perform NVE decapsulation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 88782f75 17-Oct-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allow querying VR ID based on table ID

In the device, different VRFs (routing tables) are represented using
different virtual routers (VRs) and thus the kernel's table IDs are
mapped to VR IDs.

Allow internal users of the IP router to query the VR ID based on a
kernel table ID.

This is needed - for example - when configuring the underlay VR where
VxLAN encapsulated packets will undergo an L3 lookup. In this case, the
kernel's table ID is derived from the VxLAN device's configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c69e0fc 17-Oct-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Enable local routes promotion to perform NVE decap

When an NVE tunnel with an IP underlay (e.g., VxLAN) is configured the
local route to the tunnel's source IP needs to be promoted to perform
NVE decapsulation.

Expose an API in the unicast IP router to promote / demote local routes.

The case where a local route is configured after the creation of the NVE
tunnel will be handled in a subsequent patch in the set.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 602b74ed 24-Aug-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge

When a bridge device is removed, the VLANs are flushed from each
configured port. This causes the ports to decrement the reference count
on the associated FIDs (filtering identifier). If the reference count of
a FID is 1 and it has a RIF (router interface), then this RIF is
destroyed.

However, if no port is member in the VLAN for which a RIF exists, then
the RIF will continue to exist after the removal of the bridge. To
reproduce:

# ip link add name br0 type bridge vlan_filtering 1
# ip link set dev swp1 master br0
# ip link add link br0 name br0.10 type vlan id 10
# ip address add 192.0.2.0/24 dev br0.10
# ip link del dev br0

The RIF associated with br0.10 continues to exist.

Fix this by iterating over all the bridge device uppers when it is
destroyed and take care of destroying their RIFs.

Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9948a064 09-Aug-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: Replace license text with SPDX identifiers and adjust copyrights

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 64953423 31-Jul-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Handle sysctl_ip_fwd_update_priority

This sysctl setting controls whether packet priority should be updated
after forwarding. Configure RGCR.usp accordingly so that the device is
in sync with the kernel handling.

Note that RGCR doesn't allow changing arbitrary parameters
mid-operation, however "usp" is exempt and can be reconfigured.

Also react to NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE notifications
that signify change in this configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f65a33f 31-Jul-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Extract work-scheduling into a new function

The boilerplate to schedule NETEVENT_IPV4_MPATH_HASH_UPDATE and
NETEVENT_IPV6_MPATH_HASH_UPDATE handling is almost equivalent to that of
NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE that's coming in the next
patch. The only difference is which actual worker function should be
called. Extract this boilerplate into a named function in order to allow
reuse.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c3a49540 14-Jul-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Optimize processing of VRRP MACs

Hosts using a VRRP router send their packets with a destination MAC of
the VRRP router which is of the following form [1]:

IPv4 - 00-00-5E-00-01-{VRID}
IPv6 - 00-00-5E-00-02-{VRID}

Where VRID is the ID of the virtual router. Such packets are directed to
the router block in the ASIC by an FDB entry that was added in the
previous patch.

However, in certain cases it is possible to skip this FDB lookup and
send such packets directly to the router. This is accomplished by adding
these special MAC addresses to the RIF cache. If the cache is hit, the
packet will skip the L2 lookup and ingress the router with the RIF
specified in the cache entry.

1. https://tools.ietf.org/html/rfc5798#section-7.3

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2db99378 14-Jul-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Direct macvlans' MACs to router

An IP packet received on a netdev with a macvlan upper whose MAC matches
the packet's destination MAC will be re-injected to the Rx path as if it
was received by the macvlan, and perform an L3 lookup.

Reflect this functionality to the ASIC by programming FDB entries that
will direct MACs of macvlan uppers to the router.

In a similar fashion to router interfaces (RIFs) that are programmed
upon the addition of the first IP address on an interface and destroyed
upon the removal of the last IP address, the FDB entries for the macvlan
are added and destroyed based on the addition of the first and removal
of the last IP address on the macvlan.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5516185 14-Jul-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Enable macvlan upper devices

In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to
be directed to the router we need to enable macvlan uppers on top of
mlxsw netdevs.

Allow macvlan upper devices on top of mlxsw netdevs and sanitize
configurations that can't work. For example, a macvlan can't be enslaved
to a bridge as without ACLs the device doesn't take the destination MAC
into account when classifying a packet to a bridge instance (i.e., a
FID).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0304c005 08-Jul-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_kvdl: Pass entry_count to free function

For the Spectrum-2 KVD linear manager implementation, entry_count will be
needed even for the free function. So pass it down.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b6b1869 08-Jul-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_kvdl: Pass entry type to alloc/free

Future Spectrum-2 KVD linear manager implementation needs to know type
of the entry to alloc and free. So define the types in an enum and
pass it down to alloc and free functions. Once the entry type
is passed down, KVDL common part knows sizes of each entry types,
so replace size function arg with entry count.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# be9c64b1 06-Jul-2018 Arnd Bergmann <arnd@arndb.de>

mlxsw: spectrum_router: avoid uninitialized variable access

When CONFIG_BRIDGE_VLAN_FILTERING is disabled, gcc correctly points out
that the 'vid' variable is uninitialized whenever br_vlan_get_pvid
returns an error:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_rif_vlan_fid_get':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6881:6: error: 'vid' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This changes the condition check to always return -EINVAL here,
which I guess is what the author intended here.

Fixes: e6f1960ae6c7 ("mlxsw: spectrum_router: Allocate FID according to PVID")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 33bd5ac5 03-Jul-2018 David Ahern <dsahern@gmail.com>

net/ipv6: Revert attempt to simplify route replace and append

NetworkManager likes to manage linklocal prefix routes and does so with
the NLM_F_APPEND flag, breaking attempts to simplify the IPv6 route
code and by extension enable multipath routes with device only nexthops.

Revert f34436a43092 and these followup patches:
6eba08c3626b ("ipv6: Only emit append events for appended routes").
ce45bded6435 ("mlxsw: spectrum_router: Align with new route replace logic")
53b562df8c20 ("mlxsw: spectrum_router: Allow appending to dev-only routes")

Update the fib_tests cases to reflect the old behavior.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: David Ahern <dsahern@gmail.com>


# a28b1ebe 25-Jun-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Add mlxsw_sp_rif_fid()

In order to allow querying of the VID for which a RIF was created, add
a new function that returns a FID for a given RIF.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c41292b 25-Jun-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Publish mlxsw_sp_rif_find_by_dev()

In order to guard against removal of a PVID for which a FID was
allocated, spectrum_switchdev needs to first determine whether there is
a RIF associated with a given bridge. To that end, publish a preexisting
function mlxsw_sp_rif_find_by_dev().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e6f1960a 25-Jun-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Allocate FID according to PVID

For bridge netdevices, instead of assuming that the router traffic is on
VLAN 1, look at the bridge PVID.

This patch assumes that the PVID doesn't change after the router
interface is created (i.e. after the IP address is assigned).

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5f15e257 25-Jun-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Propagate extack to .fid_get()

In the follow-up patch, mlxsw_sp_rif_vlan_fid_get() will be changed in a
way that could fail. Give that function a possibility to explain the
failure through extack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ce45bded 15-Jun-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Align with new route replace logic

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route replace logic so that the
first matching route (i.e., same metric) is replaced.

Have mlxsw replace the first matching route as well.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 53b562df 15-Jun-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allow appending to dev-only routes

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route append logic so that
dev-only routes can be appended and not only gatewayed routes.

Align mlxsw with the new behaviour.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a15a1b0 21-May-2018 David Ahern <dsahern@gmail.com>

mlxsw: spectrum_router: Add support for route append

Handle append for gateway based routes. Dev-only multipath routes will
be handled by a follow on patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 50d10711 02-May-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Return an error for routes added after abort

We currently do not perform accounting in the driver and thus can't
reject routes before resources are exceeded.

However, in order to make users aware of the fact that routes are no
longer offloaded we can return an error for routes configured after the
abort mechanism was triggered.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6290182b 02-May-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Return an error for non-default FIB rules

Since commit 9776d32537d2 ("net: Move call_fib_rule_notifiers up in
fib_nl_newrule") it is possible to forbid the installation of
unsupported FIB rules.

Have mlxsw return an error for non-default FIB rules in addition to the
existing extack message.

Example:
# ip rule add from 198.51.100.1 table 10
Error: mlxsw_spectrum: FIB rules not supported.

Note that offload is only aborted when non-default FIB rules are already
installed and merely replayed during module initialization.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93c2fb25 18-Apr-2018 David Ahern <dsahern@gmail.com>

net/ipv6: Rename fib6_info struct elements

Change the prefix for fib6_info struct elements from rt6i_ to fib6_.
rt6i_pcpu and rt6i_exception_bucket are left as is given that they
point to rt6_info entries.

Rename only; not functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8d1c802b 17-Apr-2018 David Ahern <dsahern@gmail.com>

net/ipv6: Flip FIB entries to fib6_info

Convert all code paths referencing a FIB entry from
rt6_info to fib6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e670d84 17-Apr-2018 David Ahern <dsahern@gmail.com>

net/ipv6: Move nexthop data to fib6_nh

Introduce fib6_nh structure and move nexthop related data from
rt6_info and rt6_info.dst to fib6_nh. References to dev, gateway or
lwtstate from a FIB lookup perspective are converted to use fib6_nh;
datapath references to dst version are left as is.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 64ed1b9e 26-Mar-2018 Yuval Mintz <yuvalm@mellanox.com>

mlxsw: spectrum_router: Process IP6MR fib notification

Following previous patches driver is ready to handle notifications
arriving from ip6mr - start processing those when they arrive following
the same manner ipmr currently goes through.

This should enable driver to start offloading ipv6 multicast routes.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb35da0c 26-Mar-2018 Yuval Mintz <yuvalm@mellanox.com>

mlxsw: spectrum_router: Make IPMR-related APIs family agnostic

spectrum_router and spectrum_mr have several APIs that are used to
manipulate configurations originating from ipmr fib notifications.
Following previous patches all the protocol-specifics that are necessary
for the configuration are hidden within spectrum_mr. This allows us to
clean the API and make sure that other than choosing the mr_table based
on the fib notification family, spectrum_router wouldn't care about the
source of the notification when passing it onward to spectrum_mr.

This would later allow us to leverage the same code for fib
notifications originating from ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9742f866 26-Mar-2018 Yuval Mintz <yuvalm@mellanox.com>

mlxsw: spectrum_router: Support IPv6 multicast to host CPU

A step toward offloading IPv6 routing, this adds an additional
multicast routing table meant for IPv6 [with its underlying TCAM
region] and populates the default rule for IPv6 multicast packets.

Following this, ingress IPv6 multicast packets would be trapped and
delivered to the host CPU.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c13af2a 26-Mar-2018 Yuval Mintz <yuvalm@mellanox.com>

ip6mr: Add refcounting to mfc

Since ipmr and ip6mr are using the same mr_mfc struct at their core, we
can now refactor the ipmr_cache_{hold,put} logic and apply refcounting
to both ipmr and ip6mr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 54c4cad9 26-Mar-2018 Yuval Mintz <yuvalm@mellanox.com>

ipmr: Make MFC fib notifiers common

Like vif notifications, move the notifier struct for MFC as well as its
helpers into a common file; Currently they're only used by ipmr.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 68c3cd92 22-Mar-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Handle MTU change of GRE netdevs

Update MTU of overlay loopback in accordance with the setting on the
tunnel netdevice.

Fixes: 0063587d3587 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22b99058 22-Mar-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Move mlxsw_sp_rif_ipip_lb_op()

Move the function so that it can be called without forward declaration
from a function that will be added in a follow-up patch.

Fixes: 0063587d3587 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 918ee507 11-Mar-2018 Petr Machata <petrm@mellanox.com>

net: ipv6: Introduce ip6_multipath_hash_policy()

In order to abstract away access to the
ipv6.sysctl.multipath_hash_policy variable, which is not available on
systems compiled without IPv6 support, introduce a wrapper function
ip6_multipath_hash_policy() that falls back to 0 on non-IPv6 systems.

Use this wrapper from mlxsw/spectrum_router instead of a direct
reference.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e18b9c55 02-Mar-2018 David Ahern <dsahern@gmail.com>

mlxsw: spectrum_router: Add support for ipv6 hash policy update

Similar to 28678f07f127d ("mlxsw: spectrum_router: Update multipath hash
parameters upon netevents") for IPv4, make sure the kernel and asic are
using the same hash algorithm for path selection.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3192dac6 02-Mar-2018 David Ahern <dsahern@gmail.com>

net: Rename NETEVENT_MULTIPATH_HASH_UPDATE

Rename NETEVENT_MULTIPATH_HASH_UPDATE to
NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
in the IPv4 hash policy.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 803335ac 27-Feb-2018 Petr Machata <petrm@mellanox.com>

mlxsw: Handle config changes pertinent to SPAN

For some netdevices, for which mlxsw offloads mirroring, may have a
complex relationship between the declared intent and low-level
device configuration.

Trying to accurately track which changes might influence offloading
decisions is finicky and error-prone. Instead, this patch introduces a
function mlxsw_sp_span_entry_respin, which re-queries the configuration
anew and, if different, removes the existing offloads and installs new
ones.

Call this function strategically at event handlers that might influence
the mirroring configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d1c95af3 16-Feb-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Do not unconditionally clear route offload indication

When mlxsw replaces (or deletes) a route it removes the offload
indication from the replaced route. This is problematic for IPv4 routes,
as the offload indication is stored in the fib_info which is usually
shared between multiple routes.

Instead of unconditionally clearing the offload indication, only clear
it if no other route is using the fib_info.

Fixes: 3984d1a89fe7 ("mlxsw: spectrum_router: Provide offload indication using nexthop flags")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6c677750 13-Feb-2018 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum: Use NL_SET_ERR_MSG_MOD

Use NL_SET_ERR_MSG_MOD helper which adds the module name instead
of specifying the prefix each time.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e437f3b6 13-Feb-2018 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Distinguish between IPv4/6 tunnels

struct ip_tunnel_parm, where GRE and several other tunnel types hold
information, is IPv4-specific. The current router / ipip code in mlxsw
however uses it as if it were generic.

Make it clear that it's not. Rename many functions from _params_ to
_params4_. mlxsw_sp_ipip_parms_saddr() and _daddr() take a proto
argument to dispatch on it. Move the dispatch logic to
mlxsw_sp_ipip_netdev_saddr() and _daddr(), and replace with
single-protocol functions.

In struct mlxsw_sp_ipip_entry, move the "parms" field to a (for the time
being, singleton) union. Update users throughout.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0f2d2b27 13-Feb-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Fix error path in mlxsw_sp_vr_create

Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create()
use ERR_PTR macro to propagate int err through return of a pointer,
the return value is not NULL in case of failure. So if one
of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table
is not NULL and mlxsw_sp_vr_is_used wrongly assumes
that vr is in use which leads to crash like following one:

[ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9
[ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum]

Fix this by using local variables to hold the pointers and set vr->*
only in case everything went fine.

Fixes: 76610ebbde18 ("mlxsw: spectrum_router: Refactor virtual router handling")
Fixes: a3d9bc506d64 ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support")
Fixes: d42b0965b1d4 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ecdaea0 24-Jan-2018 Yuval Mintz <yuvalm@mellanox.com>

mlxsw: spectrum_router: Don't log an error on missing neighbor

Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.

Simply remove the error message and gracefully accept such events.

Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040ca11b9 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b52ce02 22-Jan-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove unnecessary prefix lengths from LPM tree

In commit fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I tried to make sure only used prefix lengths are
present in the LPM tree shared between all virtual routers.

However, this optimization had to be removed in commit a69518cf0b4c
("mlxsw: spectrum_router: Avoid expensive lookup during route removal"),
since determining the used prefix lengths required us to traverse all
the active virtual routers, which could result in a hung task depending
on the number of VRFs and whether routes were removed due to abort or
not.

Re-introduce the optimization by moving the prefix usage accounting from
the virtual routers to the LPM tree, as this accounting is only used in
order to determine the tree's structure.

To make the sharing of the trees more explicit, the two trees (for IPv4
and IPv6) are stored in the shared router struct and upon the creation
of a virtual router it is immediately bound to both.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3aad95df 22-Jan-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Pass FIB node to LPM tree unlink function

Next patch will try to optimize the LPM tree and make sure only used
prefix lengths are present, to avoid unnecessary look-ups.

Pass the currently removed FIB node to the unlinking function as its
associated prefix length is a potential candidate for removal from the
tree.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4fd00312 22-Jan-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Use the nodes list as indication for empty FIB

Currently, each FIB (IPv4 / IPv6) in a virtual router holds a prefix
usage that is used to choose a matching LPM tree, but also to check if
the FIB is empty, so that the LPM tree could be unbound.

Next patches will remove the reliance on the per-FIB prefix usage for
LPM tree matching. Keeping it only to check if the FIB is empty is a
waste, since we can use the nodes ({Prefix, Length}) list instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ed604c5d 18-Jan-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Free LPM tree upon failure

When a new LPM tree is created, we try to replace the trees in the
existing virtual routers with it. If we fail, the tree needs to be
freed.

Currently, this does not happen in the unlikely case where we fail to
bind the tree to the first virtual router, since its reference count
never transitions from 1 to 0.

Fix that by taking a reference before binding the tree.

Fixes: fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 48276a29 13-Jan-2018 Yuval Mintz <yuvalm@mellanox.com>

mlxsw: spectrum_router: Configure default routing priority

When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3743d88a 12-Jan-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add support for IPv6 non-equal-cost multipath

Since commit eb789980d0aa ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.

Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.

This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8764a826 25-Dec-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Fix NULL pointer deref

When we remove the neighbour associated with a nexthop we should always
refuse to write the nexthop to the adjacency table. Regardless if it is
already present in the table or not.

Otherwise, we risk dereferencing the NULL pointer that was set instead
of the neighbour.

Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ba6b30e 17-Dec-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Remove batch neighbour deletion causing FW bug

This reverts commit 63dd00fa3e524c27cc0509190084ab147ecc8ae2.

RAUHT DELETE_ALL seems to trigger a bug in FW. That manifests by later
calls to RAUHT ADD of an IPv6 neighbor to fail with "bad parameter"
error code.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Fixes: 63dd00fa3e52 ("mlxsw: spectrum_router: Add batch neighbour deletion")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09dbf629 28-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Update nexthop RIF on update

The function mlxsw_sp_nexthop_rif_update() walks the list of nexthops
associated with a RIF, and updates the corresponding entries in the
switch. It is used in particular when a tunnel underlay netdevice moves
to a different VRF, and all the nexthops are migrated over to a new RIF.
The problem is that each nexthop holds a reference to its RIF, and that
is not updated. So after the old RIF is gone, further activity on these
nexthops (such as downing the underlay netdevice) dereferences a
dangling pointer.

Fix the issue by updating rif of impacted nexthops before calling
mlxsw_sp_nexthop_rif_update().

Fixes: 0c5f1cd5ba8c ("mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d97cda5f 28-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Handle encap to demoted tunnels

Some tunnels that are offloadable on their own can nonetheless be
demoted to slow path if their local address is in conflict with that of
another tunnel. When a route is formed for such a tunnel,
mlxsw_sp_nexthop_ipip_init() fails to find the corresponding IPIP entry,
and that triggers a FIB abort.

Resolve the problem by not assuming that a tunnel for which
mlxsw_sp_ipip_ops.can_offload() holds also automatically has an IPIP
entry.

Fixes: af641713e97d ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cab43d9c 28-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Demote tunnels on VRF migration

The mlxsw driver currently doesn't offload GRE tunnels if they have the
same local address and use the same underlay VRF. When such a situation
arises, the tunnels in conflict are demoted to slow path.

However, the current code only verifies this condition on tunnel
creation and tunnel change, not when a tunnel is moved to a different
VRF. When the tunnel has no bound device, underlay and overlay are the
same. Thus moving a tunnel moves the underlay as well, and that can
cause local address conflict.

So modify mlxsw_sp_netdevice_ipip_ol_vrf_event() to check if there are
any conflicting tunnels, and demote them if yes.

Fixes: af641713e97d ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57c77ce4 28-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Offload decap only for up tunnels

When a new local route is added, an IPIP entry is looked up to determine
whether the route should be offloaded as a tunnel decap or as a trap.
That decision should take into account whether the tunnel netdevice in
question is actually IFF_UP, and only install a decap offload if it is.

Fixes: 0063587d3587 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 63dd00fa 12-Nov-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add batch neighbour deletion

In commit 4a3c67a6e7cd ("mlxsw: spectrum_router: Don't batch neighbour
deletion") I removed the support for batch deletion of neighbours on a
router interface (RIF) since at that time the firmware did not support
it for IPv6 neighbours.

This is now supported by the version enforced by the driver, so there is
no reason to delete neighbours one by one anymore.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 44b0fff1 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Handle down of tunnel underlay

When the bound device of a tunnel device is down, encapsulated packets
are not egressed anymore, but tunnel decap still works. Extend
mlxsw_sp_nexthop_rif_update() to take IFF_UP into consideration when
deciding whether a given next hop should be offloaded.

Because the new logic was added to mlxsw_sp_nexthop_rif_update(), this
fixes the case where a newly-added tunnel has a down bound device, which
would previously be fully offloaded. Now the down state of the bound
device is noted and next hops forwarding to such tunnel are not
offloaded.

In addition to that, notice NETDEV_UP and NETDEV_DOWN of a bound device
to force refresh of tunnel encap route offloads.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4cf04f3f 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Handle NETDEV_CHANGE on L3 tunnels

Changes to L3 tunnel netdevices (through `ip tunnel change' as well as
`ip link set') lead to NETDEV_CHANGE being generated on the tunnel
device. Because what is relevant for the tunnel in question depends on
the tunnel type, handling of the event is dispatched to the IPIP module
through a newly-added interface mlxsw_sp_ipip_ops.ol_netdev_change().

IPIP tunnels now remember the last set of tunnel parameters in struct
mlxsw_sp_ipip_entry.parms, and use it to figure out what exactly has
changed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 61481f2f 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Support IPIP underlay VRF migration

When a bound device of a tunnel netdevice changes VRF, the loopback RIF
that backs the tunnel needs to be updated and existing encapsulating
routes need to be refreshed.

Note that several tunnels can share the same bound device, in which case
all the impacted tunnels need to be updated.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# af641713 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Onload conflicting tunnels

The approach for offloading IP tunnels implemented currently by mlxsw
doesn't allow two tunnels that have the same local IP address in the
same (underlay) VRF. Previously, offloads were introduced on demand as
encap routes were formed. When such a route was created that would cause
offload of a conflicting tunnel, mlxsw_sp_ipip_entry_create() would
detect it and return -EEXIST, which would propagate up and cause FIB
abort.

Now however IPIP entries are created as soon as an offloadable netdevice
is created, and the failure prevents creation of such device.
Furthermore, if the driver is installed at the point where such
conflicting tunnels exist, the failure actually prevents successful
modprobe.

Furthermore, follow-up patches implement handling of NETDEV_CHANGE due
to the local address change. However, NETDEV_CHANGE can't be vetoed. The
failure merely means that the offloads weren't updated, but the change
in Linux configuration is not rolled back. It is thus desirable to have
a robust way of handling these conflicts, which can later be reused for
handling NETDEV_CHANGE as well.

To fix this, when a conflicting tunnel is created, instead of failing,
simply pull the old tunnel to slow path and reject offloading the
new one.

Introduce two functions: mlxsw_sp_ipip_entry_demote_tunnel() and
mlxsw_sp_ipip_demote_tunnel_by_saddr() to handle this. Make them both
public, because they will be useful later on in this patchset.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4526cc8a 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Fix saddr deduction in mlxsw_sp_ipip_entry_create()

When trying to determine whether there are other offloaded tunnels with
the same local address, mlxsw_sp_ipip_entry_create() should look for a
tunnel with matching UL protocol, matching saddr, in the same VRF.
However instead of taking into account the UL protocol of the tunnel
netdevice (which mlxsw_sp_ipip_entry_saddr_matches() then compares to
the UL protocol of inspected IPIP entry), it deduces the UL protocol
from the inspected IPIP entry (and that's compared to itself).

This is currently immaterial, because only one tunnel type is offloaded,
and therefore the UL protocol always matches, but introducing support
for a tunnel with IPv6 underlay would uncover this error.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c5f1cd5 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()

The work that needs to be done to update HW configuration in response to
changes is similar to what __mlxsw_sp_ipip_entry_update_tunnel() already
does, but with a number of twists: each change requires a different
subset of things to happen. Extend the function to support all these
uses, and allow finely-grained configuration of what should happen at
each call through a suite of function arguments.

Publish the updated function to allow use from the spectrum_ipip module.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 65a6121b 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Extract __mlxsw_sp_ipip_entry_update_tunnel()

The work that's done by mlxsw_sp_netdevice_ipip_ol_vrf_event() is a good
basis for a more versatile function that would take care of all sorts of
tunnel updates requests: __mlxsw_sp_ipip_entry_update_tunnel(). Extract
that function. Factor out a helper mlxsw_sp_ipip_entry_ol_lb_update() as
well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7e75af63 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Propagate extack for tunnel events

The function mlxsw_sp_rif_create() takes an extack parameter. So far,
for creation of loopback interfaces, NULL was passed. For some events
however the extack can be extracted and passed along. So do that for
NETDEV_CHANGEUPPER handler.

Use the opportunity to update the type of info argument that
mlxsw_sp_netdevice_ipip_ol_event() takes. Follow-up patches will
introduce handling of more changes, and some of them carry an extack as
well, but in an info structure of a different type. Though not strictly
erroneous (the pointer could be cast whichever way), it makes no sense
to pretend the value is always of a certain type, when in fact it isn't.
So change the prototype of the above-mentioned function as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 47518ca5 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Extract mlxsw_sp_ipip_entry_ol_up_event()

The piece of logic to promote decap route, if any, is useful for generic
tunnel updates, not just for handling of NETDEV_UP events on tunnel
interfaces. Extract it to a separate function.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6d4de445 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Make mlxsw_sp_netdevice_ipip_ol_up_event() void

This function only ever returns 0, so don't pretend it returns anything
useful and just make it void.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3fe198e 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Extract mlxsw_sp_ipip_entry_ol_down_event()

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 474f0ff6 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Move mlxsw_sp_ipip_netdev_{s, d}addr{, 4}()

These functions ideologically belong to the IPIP module, and some
follow-up work will benefit from their presence there.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cafdb2a0 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Extract mlxsw_sp_netdevice_ipip_can_offload()

Some of the code down the road needs this logic as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 796ec776 03-Nov-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Rename IPIP-related netdevice handlers

To distinguish between events related to tunnel device itself and its
bound device, rename a number of functions related to handling tunneling
netdevice events to include _ol_ (for "overlay") in the name. That
leaves room in the namespace for underlay-related functions, which would
have _ul_ in the name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28678f07 02-Nov-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Update multipath hash parameters upon netevents

Make sure the device and the kernel are performing the multipath hash
according to the same parameters by updating the device whenever the
relevant netevent is generated.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# af658b6a 02-Nov-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Align multipath hash parameters with kernel's

Up until now we used the hardware's defaults for multipath hash
computation. This patch aligns the hardware's multipath parameters with
the kernel's.

For IPv4 packets, the parameters are determined according to the
'fib_multipath_hash_policy' sysctl during module initialization. In case
L3-mode is requested, only the source and destination IP addresses are
used. There is no special handling of ICMP error packets.

In case L4-mode is requested, a 5-tuple is used: source and destination
IP addresses, source and destination ports and IP protocol. Note that
the layer 4 fields are not considered for fragmented packets.

For IPv6 packets, the source and destination IP addresses are used, as
well as the flow label and the next header fields.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ceb8881d 02-Nov-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Properly name netevent work struct

The struct containing the work item queued from the netevent handler is
named after the only event it is currently used for, which is neighbour
updates.

Use a more appropriate name for the struct, as we are going to use it
for more events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 48fac885 02-Nov-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Embed netevent notifier block in router struct

We are going to need to respond to netevents notifying us about
multipath hash updates by configuring the device's hash parameters.

Embed the netevent notifier in the router struct so that we could
retrieve it upon notifications and use it to configure the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f279233 27-Oct-2017 David Ahern <dsahern@gmail.com>

mlxsw: spectrum_router: Return extack message on abort due to fib rules

Adding a FIB rule on a spectrum platform silently aborts FIB offload:
$ ip ru add pref 99 from all to 192.168.1.1 table 10
$ dmesg -c
[ 623.144736] mlxsw_spectrum 0000:03:00.0: FIB abort triggered. Note that FIB entries are no longer being offloaded to this device.

This patch reworks FIB rule handling to return a message to the user:
$ ip ru add pref 99 from all to 8.8.8.8 table 11
Error: spectrum: FIB rules not supported. Aborting offload.

spectrum currently only checks whether the fib rule is a default rule or
an l3mdev rule, both of which it knows how to handle. Any other it aborts
FIB offload. Move the processing to check the rule type inline with the
user request. If the rule is an unsupported one, then a work queue entry
is used to abort the offload. Change the rule delete handling to just
return since it does nothing at the moment.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb789980 22-Oct-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Populate adjacency entries according to weights

Up until now the driver assumed all the nexthops have an equal weight
and wrote each to a single adjacency entry.

This patch takes the `weight` parameter into account and populates the
adjacency group according to the relative weight of each nexthop.

Specifically, the weights of all the nexthops that should be offloaded
are first normalized and then used to calculate the upper adjacency
index of each nexthop. This is done according to the hash-threshold
algorithm used by the kernel for IPv4 multi-path routing.

Adjacency groups are currently limited to 32 entries which limits the
weights that can be used, but follow-up patches will introduce groups of
512 entries.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 425a08c6 22-Oct-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Prepare for large adjacency groups

The device has certain restrictions regarding the size of an adjacency
group.

Have the router determine the size of the adjacency group according to
available KVDL allocation sizes and these restrictions.

This was not needed until now since only allocations of up 32 entries
were supported and these are all valid sizes for an adjacency group.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 408bd946 22-Oct-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Store weight in nexthop struct

As the first step towards non-equal-cost multi-path support, store each
nexthop's weight.

For IPv6 nexthops always set the weight to 1, as it only supports ECMP.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e69cd9d7 22-Oct-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_dpipe: Add adjacency group size

The adjacency group size is part of the match on the adjacency group and
should therefore be exposed using dpipe.

When non-equal-cost multi-path support will be introduced, the group's
size will help users understand the exact number of adjacency entries
each nexthop occupies, as a nexthop will no longer correspond to a
single entry.

The output for a multi-path route with two nexthops, one with weight 255
and the second 1 will be:

Example:

$ devlink dpipe table dump pci/0000:01:00.0 name mlxsw_adj
pci/0000:01:00.0:
index 0
match_value:
type field_exact header mlxsw_meta field adj_index value 65536
type field_exact header mlxsw_meta field adj_size value 512
type field_exact header mlxsw_meta field adj_hash_index value 0
action_value:
type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:64
type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 3 value 1

index 1
match_value:
type field_exact header mlxsw_meta field adj_index value 65536
type field_exact header mlxsw_meta field adj_size value 512
type field_exact header mlxsw_meta field adj_hash_index value 510
action_value:
type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:65
type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 4 value 2

Thus, the first nexthop occupies 510 adjacency entries and the second 2,
which leads to a ratio of 255 to 1.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dcbda282 20-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Configure TIGCR on init

Spectrum tunnels do not default to ttl of "inherit" like the Linux ones
do. Configure TIGCR on router init so that the TTL of tunnel packets is
copied from the overlay packets.

Fixes: ee954d1a91b2 ("mlxsw: spectrum_router: Support GRE tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3c75f9b1 18-Oct-2017 David Ahern <dsahern@gmail.com>

spectrum: Convert fib event handlers to use container_of on info arg

Use container_of to convert the generic fib_notifier_info into
the event specific data structure.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f8fa9b4e 18-Oct-2017 David Ahern <dsahern@gmail.com>

mlxsw: spectrum_router: Add extack message for RIF and VRF overflow

Add extack argument down to mlxsw_sp_rif_create and mlxsw_sp_vr_create
to set an error message on RIF or VR overflow. Now on overflow of
either resource the user gets an informative message as opposed to
failing with EBUSY.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89d5dd2e 18-Oct-2017 David Ahern <dsahern@gmail.com>

mlxsw: spectrum: router: Add support for address validator notifier

Add support for inetaddr_validator and inet6addr_validator. The
notifiers provide a means for validating ipv4 and ipv6 addresses
before the addresses are installed and on failure the error
is propagated back to the user.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4cccb737 16-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Drop refcounting of IPIP entries

Formerly, IPIP entries were created lazily by next hops that referenced
an offloadable IP-in-IP netdevice. However now that they are created
eagerly as a reaction to events on such netdevices, the reference
counting is useless. Hence drop it.

The routes whose next hops reference an offloaded IP-in-IP netdevice
actually linger around a bit after their device is unregistered.
However, mlxsw_sp_ipip_entry_destroy() also destroys the backing
loopback, and mlxsw_sp_rif_destroy() transitively (via
mlxsw_sp_nexthop_rif_gone_sync()) calls mlxsw_sp_nexthop_ipip_fini(),
which unlinks the IPIP entry from a next hop. Thus no dangling pointers
are left behind for the brief window after netdevice is gone, but routes
not yet.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f63ce4e5 16-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Support IPIP overlay VRF migration

IPIP entries are created as soon as an offloadable device is created.
That means that when such a device is later moved to a different VRF,
the loopback device that backs the tunnel is wrong.

Thus when an offloadable encapsulating netdevice moves from one VRF to
another, make sure that the loopback is updated as necessary.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0063587d 16-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum: Support decap-only IP-in-IP tunnels

Current code for offloading IP-in-IP tunneling assumes that there is no
decap without encap. But that's never true for IPv6 overlays, and is not
true for IPv4 ones either, if net.ipv4.conf.*.rp_filter is unset.

To support decap-only tunnels, an IPIP entry is now created as soon as
an offloadable tunneling device is created. When that netdevice is up'd,
a decap route is looked up and possibly offloaded. Thus decap is not
handled implicitly as part of mlxsw_sp_ipip_entry_get() call anymore,
but needs to be done explicitly after the get, if desired.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6698c168 16-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Move mlxsw_sp_netdev_ipip_type()

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b35750f1 09-Oct-2017 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum: router: Export the mlxsw_sp_router_port function

In Spectrum hardware, the router port is a virtual port that is the gateway
to the routing mechanism. Hence, in order for a packet to be L3 forwarded,
it must first be L2 forwarded to the router port inside the hardware.

Further patches in this patchset are going to introduce support in bridge
device used as an mrouter port. In this case, the router port index will be
needed in order to update the MDB entries to include the router port. Thus,
export the mlxsw_sp_router_port function, which returns the index of the
Spectrum router port.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a69518cf 08-Oct-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Avoid expensive lookup during route removal

In commit fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I increased the scale of supported VRFs by having
all of them share the same LPM tree.

In order to avoid look-ups for prefix lengths that don't exist, each
route removal would trigger an aggregation across all the active virtual
routers to see which prefix lengths are in use and which aren't and
structure the tree accordingly.

With the way the data structures are currently laid out, this is a very
expensive operation. When preformed repeatedly - due to the invocation
of the abort mechanism - and with enough VRFs, this can result in a hung
task.

For now, avoid this optimization until it can be properly re-added in
net-next.

Fixes: fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 85f44a15 01-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Drop a redundant condition

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7ff176f8 01-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Fix a typo

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de0f43c0 01-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Track RIF of IPIP next hops

When considering whether to set RTNH_F_OFFLOAD flag on an IPv6 route,
mlxsw_sp_fib6_entry_offload_set() looks up the mlxsw_sp_nexthop
corresponding to a given route, and decides based on whether the next
hop's offloaded flag was set. When looking for the matching next hop, it
also takes into account the device of the route, which must match next
hop's RIF.

IPIP next hops however hitherto didn't set the RIF. As a result, IPv6
routes forwarding traffic to IP-in-IP netdevices are never marked as
offloaded, even when they actually are.

Thus track RIF of IPIP next hops the same way as that of ETHERNET next
hops.

Fixes: 8f28a3097645 ("mlxsw: spectrum_router: Support IPv6 overlay encap")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28a04c7b 01-Oct-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Move VRF refcounting

When creating a new RIF, bumping RIF count of the containing VR is the
last thing to be done. Symmetrically, when destroying a RIF, RIF count
is first dropped and only then the rest of the cleanup proceeds.

That's a problem for loopback RIFs. Those hold two VR references: one
for overlay and one for underlay. mlxsw_sp_rif_destroy() releases the
overlay one, and the deconfigure() callback the underlay one. But if
both overlay and underlay are the same, and if there are no other
artifacts holding the VR alive, this put actually destroys the VR. Later
on, when mlxsw_sp_rif_destroy() calls mlxsw_sp_vr_put() for the same VR,
the VR will already have been released and the kernel crashes with NULL
pointer dereference.

The underlying problem is that the RIF under destruction ends up
referencing the overlay VR much longer than it claims: all the way until
the call to mlxsw_sp_vr_put(). So line up the reference counting
properly to reflect this. Make corresponding changes in
mlxsw_sp_rif_create() as well for symmetry.

Fixes: 6ddb7426a7d4 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 664375e9 27-Sep-2017 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum: router: Don't ignore IPMR notifications

Make the Spectrum router logic not ignore the RTNL_FAMILY_IPMR FIB
notifications.

Past commits added the IPMR VIF and MFC add/del notifications via the
fib_notifier chain. In addition, a code for handling these notifications in
the Spectrum router logic was added. Make the Spectrum router logic not
ignore these notifications and forward the requests to the Spectrum
multicast router offloading logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd890fe9 27-Sep-2017 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum: Notify multicast router on RIF MTU changes

Due to the fact that multicast routes hold the minimum MTU of all the
egress RIFs and trap packets that don't meet it, notify the mulitcast
router code on RIF MTU changes.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d42b0965 27-Sep-2017 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum_router: Add multicast routes notification handling functionality

Add functionality for calling the multicast routing offloading logic upon
MFC and VIF add and delete notifications. In addition, call the multicast
routing upon RIF addition and deletion events.

As the multicast routing offload logic may sleep, the actual calls are done
in a deferred work. To ensure the MFC object is not freed in that interval,
a reference is held to it. In case of a failure, the abort mechanism is
used, which ejects all the routes from the hardware and triggers the
traffic to flow through the kernel.

Note: At that stage, the FIB notifications are still ignored, and will be
enabled in a further patch.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7e50d435 27-Sep-2017 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum: router: Squash the default route table to main

Currently, the mlxsw Spectrum driver offloads only either the RT_TABLE_MAIN
FIB table or the VRF tables, so the RT_TABLE_LOCAL table is squashed to the
RT_TABLE_MAIN table to allow local routes to be offloaded too.

By default, multicast MFC routes which are not assigned to any user
requested table are put in the RT_TABLE_DEFAULT table.

Due to the fact that offloading multicast MFC routes support in Spectrum
router logic is going to be introduced soon, squash the default table to
MAIN too.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 427e652a 25-Sep-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_dpipe: Add support for controlling nexthop counters

Add support for controlling nexthop counters via dpipe.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5390278 25-Sep-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum: Add support for setting counters on nexthops

Add support for setting counters on nexthops based on dpipe's adjacency
table counter status. This patch also adds the ability for getting the
counter value, which will be used by the dpipe adjacency table dump
implementation in the next patches.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c556cd28 25-Sep-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Add helpers for nexthop access

This is done as a preparation before introducing the ability to dump the
adjacency table via dpipe, and to count the table size. The current table
implementation avoids tunnel entries, thus a helper for checking if
the nexthop group contains tunnel entries is also provided. The mlxsw's
nexthop representative struct stays private to the router module.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec2437f4 25-Sep-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Use helper to check for last neighbor

Use list_is_last helper to check for last neighbor.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dbe4598c 25-Sep-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Keep nexthops in a linked list

Keep nexthops in a linked list for easy access.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91e4d59a 19-Sep-2017 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum_router: Export RIF dev access function

The mlxsw_sp_rif struct, defined as private struct in spectrum_router.c
will be used in the multicast router source file. Due to the fact that the
dev field will be needed by the multicast router logic, add an access
function to it.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8e29f979 15-Sep-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Only handle IPv4 and IPv6 events

The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.

This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.

Fixes: 65e65ec137f4 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ee954d1a 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Support GRE tunnels

This patch introduces callbacks and tunnel type to offload GRE tunnels.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 92107cfb 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Add loopback accessors

struct mlxsw_sp_rif is a router-private structure, and therefore
everything related to it is as well: parameters, and derived RIF types
including loopbacks. IPIP module needs access to some details of
loopback interfaces, but exporting all the RIF shebang would create too
large an interface.

So instead export just the bare minimum necessary: accessors for RIF
index and underlay VRF ID.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1cc38fb1 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Use existing decap route

The local route that points at IPIP's underlay device (decap route) can
be present long before the GRE device. Thus when an encap route is
added, it's necessary to look inside the underlay FIB if the decap route
is already present. If so, the current trap offload needs to be
withdrawn and replaced with a decap offload.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4607f6d2 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Support IPv4 underlay decap

Unlike encapsulation, which is represented by a next hop forwarding to
an IPIP tunnel, decapsulation is a type of local route. It is created
for local routes whose prefix corresponds to the local address of one of
offloaded IPIP tunnels. When the tunnel is removed (i.e. all the encap
next hops are removed), the decap offload is migrated back to a trap for
resolution in slow path.

This patch assumes that decap route is already present when encap route
is added. A follow-up patch will fix this issue.

Note that this patch only supports IPv4 underlay. Support for IPv6
underlay will be subject to follow-up work apart from this patchset.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f28a309 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Support IPv6 overlay encap

Add the missing bits to recognize IPv6 next hops as IPIP ones to enable
offloading of IPv6 overlay encapsulation.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1012b9ac 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Support IPv4 overlay encap

This introduces some common code for tracking of offloaded IP-in-IP
tunnels, and support for offloading IPv4 overlay encapsulating routes in
particular. A follow-up patch will introduce IPv6 overlay as well.

Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry
objects hooked up in mlxsw_sp_router. A network device that represents
the tunnel is used as a key to look up the corresponding IPIP entry.
Note that in the future, more general keying mechanism will be needed,
because parts of the tunnel information can be provided by the route.

IPIP entries are reference counted, because several next hops may end up
using the same tunnel, and we only want to offload it once.

Encapsulation path hooks into next hop handling. Routes that forward to
a tunnel are now considered gateway routes, thus giving them the same
treatment that other remote routes get. An IPIP next hop type is
introduced.

Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the
known tunnel types, the next-hop is not considered an IPIP next hop.

The list of IPIP tunnel types is currently empty, follow-up patches will
add support for GRE. Traffic to IPIP tunnel types that are not
explicitly recognized by the driver traps and is handled in slow path.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 35225e47 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Make nexthops typed

In the router, some next hops may reference an encapsulating netdevice,
such as GRE or IPIP. To properly offload these next hops, mlxsw needs to
keep track of whether a given next hop is a regular Ethernet entry, or
an IP-in-IP tunneling entry.

To facilitate this book-keeping, add a type field to struct
mlxsw_sp_nexthop. There is, as of this patch, only one next hop type:
MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP
variant.

There are several places where next hops are initialized in the IPv4
path. Instead of replicating the logic at every one of them, factor it
out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini
is actually protocol-neutral, so put it to mlxsw_sp_nexthop_type_fini(),
but create a corresponding protocoled _fini function that dispatches to
the protocol-neutral one.

The IPv6 path is simpler, but for symmetry with IPv4, create the same
suite of functions with corresponding logic.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f6050ee6 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Extract mlxsw_sp_rt6_is_gateway()

IPv6 counterpart of the previous patch: introduce a function to
determine whether a given route is a gateway route.

The new function takes a mlxsw_sp argument which follow-up patches will
use. Thus mlxsw_sp_fib6_entry_type_set() got that argument as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9b01451a 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Extract mlxsw_sp_fi_is_gateway()

For IPv4 IP-in-IP offload, routes that direct traffic to IP-in-IP
devices need to be considered gateway routes as well. That involves a
bit more logic, so extract the current test to a separate function,
where the logic can be later added.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ddb7426 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Introduce loopback RIFs

When offloading L3 tunnels, an adjacency entry is created that loops the
packet back into the underlay router. Loopback interfaces then hold the
corresponding information and are created for IP-in-IP netdevices.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 010cadf9 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Support FID-less RIFs

Loopback RIFs, which will be introduced in a follow-up patch, differ
from other RIFs in that they do not have a FID associated with them.

To support this, demote FID allocation from mlxsw_sp_rif_create to
configure op of the existing RIF types, and likewise the FID release
from mlxsw_sp_rif_destroy to deconfigure op.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 38ebc0f4 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Add mlxsw_sp_ipip_ops

Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. Follow-up patches will use the list to
determine whether a constructed RIF should be a loopback, and to decide
whether a next hop references a tunnel.

The list is currently empty, follow-up patches will add support for GRE.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ff1f06ce 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Publish mlxsw_sp_l3proto

The spectrum_ipip module that will be introduced in the follow-up
patches needs to know the data type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89e41982 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: reg: Give mlxsw_reg_ratr_pack a type parameter

To support IPIP, the driver needs to be able to construct an IPIP
adjacency. Change mlxsw_reg_ratr_pack to take an adjacency type as an
argument. Adjust the one existing caller.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9571e828 02-Sep-2017 Petr Machata <petrm@mellanox.com>

mlxsw: reg: Extract mlxsw_reg_ritr_mac_pack()

Unlike other interface types, loopback RIFs do not have MAC address. So
drop the corresponding argument from mlxsw_reg_ritr_pack() and move it
to a new function. Call that from callers of mlxsw_reg_ritr_pack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 241bc859 01-Sep-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Set abort trap in all virtual routers

When the abort mechanism is invoked a default route directing packets to
the CPU is programmed in all the virtual routers currently in use. This
can result in packet loss in case a new VRF is configured.

Upon abort, program the default route in all virtual routers, whether
they are in use or not.

The patch is directed at net-next since post-abort fixes aren't critical
and packet loss due to a missing default route will be insignificant
compared to packet loss caused by the CPU port policer.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d3b6d377 01-Sep-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Trap packets hitting anycast routes

I relied on the fact that anycast routes use the loopback device as
their nexthop device to trap packets hitting them to the CPU.

After commit 4832c30d5458 ("net: ipv6: put host and anycast routes on
device with address") this is no longer the case and such routes are
programmed with a forward action (note the 'offload' flag):

anycast cafe:: dev enp3s0np7 proto kernel metric 0 offload pref medium

This will prevent the router from locally receiving packets destined to
the Subnet-Router anycast address.

Fix this by specifically programming anycast routes with action trap,
which results in the following output:

anycast cafe:: dev enp3s0np7 proto kernel metric 0 pref medium

Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ed5574c 31-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Add support for setting counters on IPv6 neighbors

Add support for setting counters on IPv6 neighbors based on dpipe's host6
table counter status.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0250768c 31-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Add IPv6 neighbor access helper

Add helper for accessing destination IP in case of IPv6 neighbor.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d1056d8 31-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Export IPv6 link local address check helper

Neighbors with link local addresses are not offloaded to the host table,
yet, the are maintained in the driver for adjacency table usage. When
dumping the IPv6 host neighbors this link local neighbors should be
ignored. This patch exports this helper for dpipe usage.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a481d713 24-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_dpipe: Add support for controlling neighbor counters

Add support for controlling neighbor counters via dpipe.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7cfcbc75 24-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Add support for setting counters on neighbors

Add support for setting counters on neighbors based on dpipe's host table
counter status. This patch also adds the ability for getting the counter
value, which will be used by the dpipe host table implementation in the
next patches.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f17cc84d 24-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Add helpers for neighbor access

This is done as a preparation before introducing the ability to dump the
host table via dpipe, and to count the table size. The mlxsw's neighbor
representative struct stays private to the router module.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df9a21f1 15-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Use correct config option

I made an embarrassing mistake and used 'IPV6' instead of 'CONFIG_IPV6'
around the function that updates the kernel about IPv6 neighbours
activity. This can be a problem if the kernel has more neighbours than a
certain threshold and it starts deleting those that are supposedly
inactive.

Fixes: b5f3e0d43012 ("mlxsw: spectrum_router: Fix build when IPv6 isn't enabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fe400799 15-Aug-2017 Ido Schimmel <idosch@mellanox.com>

ipv6: fib: Provide offload indication using nexthop flags

IPv6 routes currently lack nexthop flags as in IPv4. This has several
implications.

In the forwarding path, it requires us to check the carrier state of the
nexthop device and potentially ignore a linkdown route, instead of
checking for RTNH_F_LINKDOWN.

It also requires capable drivers to use the user facing IPv6-specific
route flags to provide offload indication, instead of using the nexthop
flags as in IPv4.

Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
provide offload indication instead of the RTF_OFFLOAD flag, which is
removed while it's still not part of any official kernel release.

In the near future we would like to use the field for the
RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
not be ready in time for the current cycle.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e6f3b379 14-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Add support for nexthop group consolidation for IPv6

Due to limited ASIC resources the maximum number of routes is limited by
the nexthop resource. In order to improve the routing scale nexthop
consolidation should be performed.

This patch adds support for IPv6 neighbor consolidation. The hash value
is calculated based on the nexthop set, by performing bitwise xor on the
ifindexs of the nexthops, in a similar way to IPv4's kernel implementation.
In case of collision a full match is performed between the sets which
include address and ifindex comparison.

Non gateway nexthop groups are not inserted to the hash table due to
lack of nexthop device (ifindex).

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba31d366 14-Aug-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Prepare nexthop group's hash table for IPv6

This patch does preparation before introducing IPv6 nexthop group
consolidation. Currently the nexthop group hash table is used only by
IPv4 and uses fixed key size. In order to support the IPv6's variable
length key the current table is changed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fc922bb0 14-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Use one LPM tree for all virtual routers

The number of LPM trees available for lookup is much smaller than the
number of virtual routers, which are used to implement VRFs. In
addition, an LPM tree can only be used by one protocol - either IPv4 or
IPv6.

Therefore, in order to increase the number of supported virtual routers
to the maximum we need to be able to share LPM trees across virtual
routers instead of trying to find an optimized tree for each.

Do that by allocating one LPM tree for each protocol, but make sure it
will only include prefixes that are actually used, so as to not perform
unnecessary lookups.

Since changing the structure of a bound tree isn't recommended, whenever
a new tree it required, it's first created and then bound to each
virtual router, replacing the old one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0adb214b 14-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Pass argument explicitly

Instead of relying on the LPM tree to be assigned to the virtual router
before binding the two, lets pass it explicitly.

This will later allow us to return upon binding error instead of having
to perform a rollback of the assignment.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc702670 14-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Return void from deletion functions

There is no point in returning a value from function whose return value
is never checked.

Even if the return value was checked, there wouldn't be anything to do
about it, as these functions are either called from error or deletion
paths.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 65e65ec1 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't ignore IPv6 notifications

We now have all the necessary IPv6 infrastructure in place, so stop
ignoring these notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f36f5ac6 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Abort on source-specific routes

Without resorting to ACLs, the device performs route lookup solely based
on the destination IP address.

In case source-specific routing is needed, an error is returned and the
abort mechanism is activated, thus allowing the kernel to take over
forwarding decisions.

Instead of aborting, we can trap specific destination prefixes where
source-specific routes are present, but this will result in a lot more
code that is unlikely to ever be used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0a7fd1ac 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add support for route replace

In case we got a replace event, then the replaced route must exist. If
the route isn't capable of multipath, then replace first matching
non-multipath capable route.

If the route is capable of multipath and matching multipath capable
route is found, then replace it. Otherwise, replace first matching
non-multipath capable route.

The new route is inserted before the replaced one. In case the replaced
route is currently offloaded, then it's overwritten in the device's table
by the new route and later deleted, thus not impacting routed traffic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 428b851f 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion

Allow directly connected and remote unicast IPv6 routes to be programmed
to the device's tables.

As with IPv4, identical routes - sharing the same destination prefix -
are ordered in a FIB node according to their table ID and then the
metric. While the kernel doesn't share the same trie for the local and
main table, this does happen in the device, so ordering according to
table ID is needed.

Since individual nexthops can be added and deleted in IPv6, each FIB
entry stores a linked list of the rt6_info structs it represents. Upon
the addition or deletion of a nexthop, a new nexthop group is allocated
according to the new configuration and the old one is destroyed.
Identical groups aren't currently consolidated, but will be in a
follow-up patchset.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 583419fd 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Sanitize IPv6 FIB rules

We only allow FIB offload in the presence of default rules or an l3mdev
rule. In a similar fashion to IPv4 FIB rules, sanitize IPv6 rules.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 66a5763a 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Demultiplex FIB event based on family

The FIB notification block currently only handles IPv4 events, but we
want to start handling IPv6 events soon, so lay the groundwork now.

Do that by preparing the work item and process it according to the
notified address family.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 64e5e825 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Ignore address families other than IPv4

We're about to add IPv6 notifications in the FIB notification chain, but
the driver currently doesn't support these, so ignore them.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 04b1d4e5 03-Aug-2017 Ido Schimmel <idosch@mellanox.com>

net: core: Make the FIB notification chain generic

The FIB notification chain is currently soley used by IPv4 code.
However, we're going to introduce IPv6 FIB offload support, which
requires these notification as well.

As explained in commit c3852ef7f2f8 ("ipv4: fib: Replay events when
registering FIB notifier"), upon registration to the chain, the callee
receives a full dump of the FIB tables and rules by traversing all the
net namespaces. The integrity of the dump is ensured by a per-namespace
sequence counter that is incremented whenever a change to the tables or
rules occurs.

In order to allow more address families to use the chain, each family is
expected to register its fib_notifier_ops in its pernet init. These
operations allow the common code to read the family's sequence counter
as well as dump its tables and rules in the given net namespace.

Additionally, a 'family' parameter is added to sent notifications, so
that listeners could distinguish between the different families.

Implement the common code that allows listeners to register to the chain
and for address families to register their fib_notifier_ops. Subsequent
patches will implement these operations in IPv6.

In the future, ipmr and ip6mr will be extended to provide these
notifications as well.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 77d964e6 02-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Refresh offload indication upon group refresh

Now that we provide offload indication using the nexthop's flags we must
refresh the offload indication whenever the offload state within the
group changes.

This didn't matter until now, as offload indication was provided using
the FIB info flags and multipath routes were marked as offloaded as long
as one of the nexthops was offloaded.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1353ee70 02-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't check state when refreshing offload indication

Previous patch removed the reliance on the counter in the FIB info to
set the offload indication, so we no longer need to keep an offload
state on each FIB entry and can just set or unset the RTNH_F_OFFLOAD
flag in each nexthop.

This is also necessary because we're going to need to refresh the
offload indication whenever the nexthop group associated with the FIB
entry is refreshed. Current check would prevent us from marking a newly
resolved nexthop as offloaded if the FIB entry is already marked as
offloaded.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3984d1a8 02-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Provide offload indication using nexthop flags

In a similar fashion to previous patch, use the nexthop flags to provide
offload indication instead of the FIB info's flags.

In case a nexthop in a multipath route can't be offloaded (gateway's MAC
can't be resolved, for example), then its offload flag isn't set.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 213666a3 31-Jul-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Simplify a piece of code

Express the same logic more succinctly.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 56b8a9ed 31-Jul-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Clarify a piece of code

Prefer logical operator that expresses the intent to bitwise one that
happens to give the same result.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f1b1f273 31-Jul-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Simplify a piece of code

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8de3c178 31-Jul-2017 Petr Machata <petrm@mellanox.com>

mlxsw: spectrum_router: Fix a typo

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b5f3e0d4 24-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Fix build when IPv6 isn't enabled

When IPv6 isn't enabled the following error is generated:

ERROR: "nd_tbl" [drivers/net/ethernet/mellanox/mlxsw/mlxsw_spectrum.ko]
undefined!

Fix it by replacing 'arp_tbl' and 'nd_tbl' with 'tbl->family' wherever
possible and reference 'nd_tbl' only when IPV6 is enabled.

Fixes: d5eb89cf68d6 ("mlxsw: spectrum_router: Reflect IPv6 neighbours to the device")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4a3c67a6 21-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't batch neighbour deletion

Current firmware supported by the driver doesn't support batch deletion
of IPv6 neighbours on a given router interface (RIF).

Until a new version that supports this functionality is made available,
delete neighbours one by one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1819ae3d 21-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't offload routes next in list

Each FIB node holds a linked list of routes sharing the same prefix and
length. In the case of IPv4 it's ordered according to table ID, metric
and TOS and only the first route in the list is actually programmed to
the device.

In case a gatewayed route is added somewhere in the list, then after its
nexthop group will be refreshed and become valid (due to the resolution
of its gateway), it'll mistakenly overwrite the existing entry.

Example:
192.168.200.0/24 dev enp3s0np3 scope link metric 1000 offload
192.168.200.0/24 via 192.168.100.1 dev enp3s0np3 metric 1000 offload

Both routes are marked as offloaded despite the fact only the first one
should actually be present in the device's table.

When refreshing the nexthop group, don't write the route to the device's
table unless it's the first in its node.

Fixes: 9aecce1c7d97 ("mlxsw: spectrum_router: Correctly handle identical routes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7dcc18ad 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Update prefix count for IPv6

The number of possible prefix lengths for IPv6 is 129 and not 128.

Fixes following warning from UBSAN when /128 routes are offloaded:

UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2510:27 index 128 is out
of range for type 'long unsigned int [128]'

Fixes: 5e9c16cc83a7 ("mlxsw: spectrum_router: Implement private fib")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 80c238f9 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Rename functions to add / delete a FIB entry

These functions aren't specific to IPv4 and can be re-used for IPv6.

Drop the '4' designation from their name.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9efbee6f 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Drop unnecessary parameter

Functions that take as argument a FIB entry don't need to take FIB node
as well, as it can be extracted from the entry.

Remove unnecessary FIB node parameter.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0e6ea2a4 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Mark IPv4 specific function accordingly

The functions to create and destroy a nexthop group are IPv4 specific
and should be renamed accordingly, so that they won't be confused with
the IPv6 specific functions in follow-up patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f1c7f1f 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Create IPv4 specific entry struct

Some of the parameters stored in the FIB entry structure are specific to
IPv4 and therefore better placed in an IPv4 specific structure.

Create an IPv4 specific structure that encapsulates the common FIB entry
structure and contains IPv4 specific parameters.

In a follow-up patchset an IPv6 specific structure will be introduced.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc65a8a4 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Set abort trap for IPv6

When we fail to insert a route we invoke the abort mechanism which
flushes all the tables and inserts a default route in each, so that all
packets incoming to the router will be trapped to the CPU.

Upon abort, add an IPv6 default route to the IPv6 tables.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9dbf4d76 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allow IPv6 routes to be programmed

Take advantage of previous patch and allow the RALUE register to be
called with IPv6 routes.

In order to re-use as much code as possible between IPv4 and IPv6, only
the lowest-level function that actually does the register packing is
demuxed based on the passed protocol.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3d9bc50 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Extend virtual routers with IPv6 support

A Virtual Router (VR) is an entity which corresponds to a VRF and
performs FIB lookup in an LPM tree according to the {VR, IP Proto} ->
Tree binding.

Extend the virtual router data structure towards IPv6 FIB offload.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 731ea1ca 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Make FIB node retrieval family agnostic

A FIB node is an entity which stores routes sharing the same prefix and
length. The data structure itself is already family agnostic, but we
make some of its operations agnostic as well and thus re-use them for
IPv6 offload.

Instead of passing an IPv4-specific structure to fib4_node_get(), pass
general routing parameters and rename the function accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 160e22aa 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't create FIB node during lookup

When looking up a FIB entry we shouldn't create the FIB node where it's
supposed to be linked in case the node doesn't already exist.

Instead, lookup the node and fail if it doesn't exist.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58adf2c4 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't assume neighbour type

Thankfully, the neighbour subsystem is agnostic to the upper protocol
and used by both IPv4 and IPv6. By removing assumptions regarding the
neighbour type we can thus re-use much of the neighbour-related code for
both IPv4 and IPv6.

For each nexthop, store its gateway IP and for nexthop group store the
neighbour table used by its nexthops.

Use this information throughout the code and remove assumption about the
neighbour type.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a6c9b5d1 18-Jul-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Set activity interval according to both neighbour tables

The neighbours' activity is currently dumped according to the ARP
table's DELAY_PROBE time, but with the introduction of IPv6 offload we
should set the interval according to the minimum between the ARP and
ndisc tables.

Signed-off-by: Arkadi Sharshvesky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 60f040ca 18-Jul-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Periodically dump active IPv6 neighbours

In addition to IPv4, periodically dump IPv6 neighbours and update the
kernel about them.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d5eb89cf 18-Jul-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Reflect IPv6 neighbours to the device

As with IPv4, listen to NEIGH_UPDATE events from the ndisc table and
program relevant neighbours to the device's neighbour table.

Note that neighbours with a link-local IP address aren't programmed, as
packets with a link-local destination IP are trapped after LPM lookup
and never reach the neighbour table.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ea1237f 18-Jul-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Configure RIFs based on IPv6 addresses

When a netdev is configured with an IP address a router interface (RIF)
should be configured for it in the device. Allow configuration of RIFs
based on IPv6 address notifications as well as IPv4.

Note that the RIF exists as long as an IP address is configured on the
netdev, regardless of the address family.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d284818 18-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Flood unregistered multicast packets to router

Up until now we only flooded broadcast packets to the router when an L3
interface was configured on top of a bridge. However, IPv6 Neighbour
Discovery packets are trapped to the CPU inside the router and these can
be sent with a multicast address.

Flood unregistered multicast packets to the router port, so that
relevant packets could be trapped there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e29237e7 18-Jul-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Enable IPv6 router

Before we add IPv6 constructs like traps and router interfaces, we first
need to enable IPv6 routing in the device.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7387dbbc 12-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Fix use-after-free in route replace

While working on IPv6 route replace I realized we can have a
use-after-free in IPv4 in case the replaced route is offloaded and the
only one using its FIB info.

The problem is that fib_table_insert() drops the reference on the FIB
info of the replaced routes which is eventually freed via call_rcu().
Since the driver doesn't hold a reference on this FIB info it can cause
a use-after-free when it tries to clear the RTNH_F_OFFLOAD flag stored
in fi->fib_flags.

After running the following commands in a loop for enough time with a
KASAN enabled kernel I finally got the below trace.

$ ip route add 192.168.50.0/24 via 192.168.200.1 dev enp3s0np3
$ ip route replace 192.168.50.0/24 dev enp3s0np5
$ ip route del 192.168.50.0/24 dev enp3s0np5

BUG: KASAN: use-after-free in mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
Read of size 4 at addr ffff8803717d9820 by task kworker/u4:2/55
[...]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
? mlxsw_sp_router_neighs_update_work+0x1cd0/0x1ce0 [mlxsw_spectrum]
? mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
__asan_load4+0x61/0x80
mlxsw_sp_fib_entry_offload_unset+0xa7/0x120 [mlxsw_spectrum]
mlxsw_sp_fib_entry_offload_refresh+0xb6/0x370 [mlxsw_spectrum]
mlxsw_sp_router_fib_event_work+0xd1c/0x2780 [mlxsw_spectrum]
[...]
Freed by task 5131:
save_stack_trace+0x16/0x20
save_stack+0x46/0xd0
kasan_slab_free+0x70/0xc0
kfree+0x144/0x570
free_fib_info_rcu+0x2e7/0x410
rcu_process_callbacks+0x4f8/0xe30
__do_softirq+0x1d3/0x9e2

Fix this by taking a reference on the FIB info when creating the nexthop
group it represents and drop it when the group is destroyed.

Fixes: 599cf8f95f22 ("mlxsw: spectrum_router: Add support for route replace")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4e75b76 12-Jul-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add missing rollback

With this patch the error path of mlxsw_sp_nexthop_init() is symmetric
with mlxsw_sp_nexthop_fini(). Noticed during code review.

Fixes: a8c970142798 ("mlxsw: spectrum_router: Refactor nexthop init routine")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6b27c8ad 28-Jun-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Fix NULL pointer dereference

In case a VLAN device is enslaved to a bridge we shouldn't create a
router interface (RIF) for it when it's configured with an IP address.
This is already handled by the driver for other types of netdevs, such
as physical ports and LAG devices.

If this IP address is then removed and the interface is subsequently
unlinked from the bridge, a NULL pointer dereference can happen, as the
original 802.1d FID was replaced with an rFID which was then deleted.

To reproduce:
$ ip link set dev enp3s0np9 up
$ ip link add name enp3s0np9.111 link enp3s0np9 type vlan id 111
$ ip link set dev enp3s0np9.111 up
$ ip link add name br0 type bridge
$ ip link set dev br0 up
$ ip link set enp3s0np9.111 master br0
$ ip address add dev enp3s0np9.111 192.168.0.1/24
$ ip address del dev enp3s0np9.111 192.168.0.1/24
$ ip link set dev enp3s0np9.111 nomaster

Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Petr Machata <petrm@mellanox.com>
Tested-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d7a60306 08-Jun-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Mark only first LPM tree as reserved

In new firmware versions (that we can now enforce via
request_firmware()), only the first LPM tree is reserved and not the
first two as in older versions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de5ed99e 04-Jun-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Align RIF index allocation with existing code

The way we usually allocate an index is by letting the allocation
function return an error instead of an invalid index.

Do the same for RIF index.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e4f3c1c1 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Implement common RIF core

The mlxsw driver currently implements three types of RIFs. VLAN and FID
RIFs for L3 interfaces on top of VLAN-aware and VLAN-unaware bridges
(respectively) and Subport RIFs for all other L3 interfaces.

All the RIF types follow a common configuration procedure, which only
differs in the type-specific bits. The patch exploits this fact and
consolidates the common code paths, thereby simplifying the code and
making it more extensible.

This work also prepares the driver for use with future ASICs, where the
range of the Subport RIFs will be extended and their configuration
modified accordingly. By merely implementing a new RIF operations and
selecting it during initialization, the same driver could be re-used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a1107487 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Implement common FID core

The device supports three types of FIDs. 802.1Q and 802.1D FIDs for
VLAN-aware and VLAN-unaware bridges (respectively) and rFIDs to
transport packets to the router block.

The different users (e.g., bridge, router, ACLs) of the FIDs
infrastructure need not know about the internal FIDs implementation and
can therefore interact with it using a restricted set of exported
functions.

By encapsulating the entire FID logic and hiding it from the rest of the
driver we get a code base that it much simpler and easier to work with
and extend.

For example, in the current Spectrum ASIC only 802.1D FIDs can be
assigned a VNI, but future ASICs will also support 802.1Q FIDs. With
this patch in place, support for future ASICs can be easily added by
implementing a new FID operations according to their capabilities.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9ec53f0 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Determine VR first when creating RIF

All RIF types are associated with a virtual router (VR), so determine VR
first when creating a RIF.

That way, we can more easily integrate the common RIF core in the
following patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8e3482d6 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Flood packets to router after RIF creation

If a packet ingress the router but can't be assigned an ingress RIF,
it's dropped.

Therefore, in the case of RIF configured on top of a bridge, it makes
sense to start flooding broadcast packets to the router only after the
RIF was created.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1b8f09a0 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Destroy RIF only based on its struct

Now that all the information to create a RIF is contained within the RIF
struct itself, we can also simplify the destruction logic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab01ae91 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Configure RIFs based on RIF struct

All the information necessary for the configuration of RIFs can now be
found in the RIF struct itself, so reduce the arguments list.

This gets us one step closer to the common RIF core.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4d93ceeb 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Extend the RIF struct

Currently, when a Subport RIF is configured, the LAG status and VLAN of
the underlying port are read from the port itself. This is problematic,
as we would like to have common code to configure all types of RIFs,
which aren't necessarily bound to a port.

Instead, embed the RIF in a struct specific to the Subport type, which
contains all the necessary information.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a13a594d 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allocate RIF prior to its configuration

In the following patches the RIF's configuration function is going to
expect a RIF struct with all the necessary information.

Therefore, allocate the RIF just before it's configured to the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# caa3ddf8 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allocate FID prior to RIF configuration

The following patches are going to re-arrange the FID and RIF code, so
that when the RIF is configured to the device based on the information
present in the RIF struct (which points to a FID).

For this reason, move the FID allocation to just before the RIF
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c57529e1 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Replace vPorts with Port-VLAN

As explained in the cover letter, since the introduction of the bridge
offload in the mlxsw driver, information related to the offloaded bridge
and bridge ports was stored in the individual port struct,
mlxsw_sp_port.

This lead to a bloated struct storing both physical properties of the
port (e.g., autoneg status) as well as logical properties of an upper
bridge port (e.g., learning, mrouter indication). While this might work
well for simple devices, it proved to be hard to extend when stacked
devices were taken into account and more advanced use-cases (e.g., IGMP
snooping) considered.

This patch removes the excess information from the above struct and
instead stores it in more appropriate structs that represent the bridge
port, the bridge itself and a VLAN configured on the bridge port.

The membership of a port in a bridge is denoted using the Port-VLAN
struct, which points to the bridge port and also member in the bridge
VLAN group of the VLAN it represents. This allows us to completely
remove the vPort abstraction and consolidate many of the code paths
relating to VLAN-aware and unaware bridges.

Note that the FID / vFID code is currently duplicated, but this will
soon go away when the common FID core will be introduced.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ed9ddd3a 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Don't create FIDs upon creation of VLAN uppers

Up until now we used to create FIDs upon the creation of VLAN uppers on
top of the VLAN-aware bridge. This was done so that in case a router
interface (RIF) was configured on top of the bridge, the FID would
already be there.

Instead, simplify the code and only create the FID upon RIF creation.

This is an intermediary step towards the introduction of the common FID
core, in which this code would be completely removed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7cbecf24 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Replace vPorts with Port-VLAN

We're going to get rid of vPorts completely later in the patchset, but
the router code is self-contained, so it's a good candidate to start the
transition with.

Convert all the functions that expects to operate on a vPort to operate
on a Port-VLAN instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ce95e154 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Change signature of FID leave function

When a vPort is destroyed, it leaves the FID it's currently mapped to
(if any) and drops the reference. The FID's leave function expects to
get the vPort as its argument, but this will have to change when the
vPort model is retired.

Change the function signature to expect a Port-VLAN struct instead and
patch the call sites accordingly.

The code introduced in this patch will be removed later in the patchset,
but this intermediary step is required in order to ease the code review.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4aafc368 26-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Set port's mode according to FID mappings

We currently transition the port to "Virtual mode" upon the creation of
its first VLAN upper, as we need to classify incoming packets to a FID
using {Port, VID} and not only the VID.

However, it's more appropriate to transition the port to this mode when
the {Port, VID} are actually mapped to a FID. Either during the
enslavement of the VLAN upper to a VLAN-unaware bridge or the
configuration of a router port.

Do this change now in preparation for the introduction of the FID core,
where this operation will be encapsulated.

To prevent regressions, this patch also explicitly configures an OVS
slave to "Virtual mode". Otherwise, a packet that didn't hit an ACL rule
could be classified to an existing FID based on a global VID-to-FID
mapping, thus not incurring a FID mis-classification, which would
otherwise trap the packet to the CPU to be processed by the OVS daemon.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03ea01e9 23-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Adjust RIF configuration for new firmware versions

In new firmware versions, when configuring a {Port, VID} as a router
interface, the driver is responsible for enabling the STP filter and
disabling learning. Otherwise, packets are discarded.

This change doesn't break existing firmware versions, but is required
for newer firmware versions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6b1206bb 18-May-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Fix rif counter freeing routine

During rif counter freeing the counter index can be invalid. Add check
of validity before freeing the counter.

Fixes: e0c0afd8aa4e ("mlxsw: spectrum: Support for counters on router interfaces")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 348b8fc3 16-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Initialize RIFs in a separate function

The router interfaces (RIFs) array is currently initialized together
with the general router configuration. However, in a follow-up patchset
we're going to introduce a common RIF core that will require us to
initialize more RIF constructs, so move the RIF initialization to its
own function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7e39d115 16-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Move FIB notification block to router struct

The FIB notification block logically belongs inside the router specific
struct, so move it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5f9efffb 16-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Move RIFs array to its rightful place

The router interfaces (RIFs) array is of no interest to code outside the
routing realm, so declare it inside the router specific struct instead
of the chip-wide one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5f6935c6 16-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_switchdev: Reduce scope of bridge struct

Some attributes in the global chip struct are only relevant for bridge
operation, so encapsulate them in their own struct that isn't exposed to
non-bridge code.

This will also help us later, when we add more bridge-specific
attributes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9011b677 16-May-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Reduce scope of router struct

In a similar fashion to previous patch, the router structure
('mlxsw_sp_router') doesn't need to be accessible to anyone, but the
router code located at spectrum_router.c

Make this apparent and reduce its scope by defining it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b1e45526 30-Apr-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Simplify VRF enslavement

When a netdev is enslaved to a VRF master, its router interface (RIF)
needs to be destroyed (if exists) and a new one created using the
corresponding virtual router (VR).

>From the driver's perspective, the above is equivalent to an inetaddr
event sent for this netdev. Therefore, when a port netdev (or its
uppers) are enslaved to a VRF master, call the same function that
would've been called had a NETDEV_UP was sent for this netdev in the
inetaddr notification chain.

This patch also fixes a bug when a LAG netdev with an existing RIF is
enslaved to a VRF. Before this patch, each LAG port would drop the
reference on the RIF, but would re-join the same one (in the wrong VR)
soon after. With this patch, the corresponding RIF is first destroyed
and a new one is created using the correct VR.

Fixes: 7179eb5acd59 ("mlxsw: spectrum_router: Add support for VRFs")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2b94e58d 18-Apr-2017 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum: Allow ports to work under OVS master

>From now on, a port can become a slave of OVS master. All vlans
are enabled, STP state is set to "forwarding". It is up to the OVS
userspace daemon to setup the flows either in kernel or in HW using TC
flower offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd1b9d41 28-Mar-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Add rif helper functions

Add rif helper function to access the rif index and rif devices ifindex.
This functions will be used by dpipe in order to dump the rif table.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e0c0afd8 28-Mar-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum: Support for counters on router interfaces

Add support for counter allocation on router interfaces. The allocation
depends on the counter state of relevant table. In case the counting is
disabled or no counters left the counter index will be set as invalid.

Also a counter pool for router allocation is added.

Signed-off-by: Arakdi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 13124443 25-Mar-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_kvdl: Cosmetic kvdl allocator API change

Currently the return allocated index and err value are multiplexed.
This patch changes the API to decouple the ret value from the allocated
index.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ec2ee7d 24-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: Query maximum number of ports from firmware

We currently hard code the maximum number of ports in the driver, but
this may change in future devices, so query it from the firmware
instead.

Fallback to a maximum of 64 ports in case this number can't be queried.
This should only happen in SwitchX-2 for which this number is correct.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8494ab06 24-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Query number of LPM trees from firmware

Instead of hard coding the number of LPM trees in the driver, query it
from the firmware, as it may change in future devices.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bf95233e 17-Mar-2017 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum: Cosmetic naming change

Currently the struct representing router interface "mlxsw_sp_rif"
is reffered as "r" in various places in the driver. Furthermore it
contains a member which specify the index which is called "rif".
This patch change "r" to "rif" and "rif" to "rif_index".

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c7f6e665 16-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't abort on l3mdev rules

Now that port netdevs can be enslaved to a VRF master we need to make
sure the device's routing tables won't be flushed upon the insertion of
a l3mdev rule.

Note that we assume the notified l3mdev rule is a simple rule as used by
the VRF master. We don't check for the presence of other selectors such
as 'iif' and 'oif'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3d70e458 16-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add support for VRFs on top of bridges

In a similar fashion to the previous patch, allow bridges and VLAN
devices on top of bridges to be enslaved to a VRF master device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7179eb5a 16-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add support for VRFs

Allow port netdevs, LAG and VLAN devices stacked on top of these to be
enslaved to a VRF master device.

Upon enslavement, create a router interface (RIF) for the enslaved
netdev and associate it with a virtual router (VR) based on the VRF's
table ID.

If a RIF already exists for the netdev (f.e., due to the existence of an
IP address), then it's deleted and a new one is created with the
appropriate VR binding.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9db032bb 16-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't destroy RIF if L3 slave

We usually destroy the netdev's router interface (RIF) when the last IP
address is removed from it.

However, we shouldn't do that if it's enslaved to an L3 master device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57837885 16-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Associate RIFs with correct VR

When a router interface (RIF) is created due to a netdev being enslaved
to a VRF master, then it should be associated with the appropriate
virtual router (VR) and not the default one.

If netdev is a VRF slave, lookup the VR based on the VRF's table ID.
Otherwise default to the MAIN table.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5d7bfd14 16-Mar-2017 Ido Schimmel <idosch@mellanox.com>

ipv4: fib_rules: Dump FIB rules when registering FIB notifier

In commit c3852ef7f2f8 ("ipv4: fib: Replay events when registering FIB
notifier") we dumped the FIB tables and replayed the events to the
passed notification block.

However, we merely sent a RULE_ADD notification in case custom rules
were in use. As explained in previous patches, this approach won't work
anymore. Instead, we should notify the caller about all the FIB rules
and let it act accordingly.

Upon registration to the FIB notification chain, replay a RULE_ADD
notification for each programmed FIB rule, custom or not. The integrity
of the dump is ensured by the mechanism introduced in the above
mentioned commit.

Prevent regressions by making sure current listeners correctly sanitize
the notified rules.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b5d90e6d 10-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Make abort mechanism VR-aware

When the abort mechanism is invoked it binds the first virtual router
(VR) to an LPM tree and inserts a default route to direct packets to the
CPU.

With VRFs, we can have router interfaces (RIFs) bound to multiple VRs,
so we need to make sure packets are trapped from all VRs and not just
the first one.

Upon abort invocation, bind all active VRs to the same LPM tree and
insert a default route in each.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6913229e 10-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Explicitly Associate RIFs with VRs

Up until now we implicitly associated all the router interfaces (RIFs)
with the first virtual router (VR). This must be changed in order to
enable VRF offload. Otherwise, a packet received via a VRF slave would
do a FIB lookup in the same table used by other VRFs.

Instead, bind the RIF to a VR according to the table where FIB lookup
should be performed for packets received via the RIF.

Currently, we only care about the MAIN and LOCAL tables (which we squash
together).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 76610ebb 10-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Refactor virtual router handling

A virtual router (VR) is an entity within the device to which routing
tables and interfaces can be bound to. It can be used to implement VRFs.

In the initial implementation we associated the VR with a specific
protocol (e.g., IPv4) and an LPM tree. However, this isn't really
accurate, as the same VR can be used for both IPv4 and IPv6 traffic, by
binding a different LPM tree to a {VR, Proto} pair.

This patch aims to restructure the VR code according to the above logic,
so that VRs are more accurately represented by the driver's data
structures. The main motivation behind this change is to prepare the
driver for VRF offload.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 382dbb40 10-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Simplify LPM tree allocation

When looking for a new LPM tree we should always consider all the unused
trees. It doesn't matter if the new tree is required due to changes in
currently used prefixes inside an existing routing table or because a
route was inserted into an empty table.

Both cases are functionally identical and therefore should be treated
the same.

When looking for a new LPM tree, consider all unused trees and don't
reserve trees for specific cases.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4724ba56 10-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Place RIF related code with router code

The inetaddr notification block is currently implemented in the main
driver file, but this isn't really appropriate, as it mainly creates and
destroys router interfaces (RIFs) which belong with the rest of the
router code.

This will become even more apparent later on when we'll need to bind
these RIFs to virtual routers according to the VRF's table.

Structure the driver better and prevent unnecessary function exports by
moving the RIF related code with the rest of the router code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97989ee0 10-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Allow more route types to be programmed

Allow 'unreachable', 'blackhole' and 'prohibit' route types to be
programmed into the device by sending any packet hitting them to the
CPU.

This is needed so that users will be able to program a default route
into the VRF's table, thereby preventing lookup from leaking to other
tables.

Audit the code paths to make sure we don't rely on the presence of a
nexthop netdev, as it doesn't exist for above mentioned route types.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f7df4923 28-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Avoid potential packets loss

When the structure of the LPM tree changes (f.e., due to the addition of
a new prefix), we unbind the old tree and then bind the new one. This
may result in temporary packet loss.

Instead, overwrite the old binding with the new one.

Fixes: 6b75c4807db3 ("mlxsw: spectrum_router: Add virtual router management")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 599cf8f9 09-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add support for route replace

Upon the reception of an ENTRY_REPLACE notification, resolve the FIB
node corresponding to the prefix and length and insert the new route
before the first matching entry.

Since the notification also signals the deletion of the replaced route,
delete it from the driver's cache.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4283bce5 09-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add support for route append

When a new route is appended, it's placed after existing routes sharing
the same parameters (prefix, length, table ID, TOS and priority).

While the device supports only one route with the same prefix and length
in a single table, it's important to correctly place the appended route
in the driver's cache, as when a route is deleted the next one is
programmed into the device.

Following the reception of an ENTRY_APPEND notification, resolve the
FIB node corresponding to the prefix and length and correctly place the
new entry in its entry list.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9aecce1c 09-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Correctly handle identical routes

In the device, routes are indexed in a routing table based on the prefix
and its length. This is in contrast to the kernel's FIB where several
FIB aliases can exist with these parameters being identical. In such
cases, the routes will be sorted by table ID (LOCAL first, then MAIN),
TOS and finally priority (metric).

During lookup, these routes will be evaluated in order. In case the
packet's TOS field is non-zero and a FIB alias with a matching TOS is
found, then it's selected. Otherwise, the lookup defaults to the route
with TOS 0 (if it exists). However, if the requested scope is narrower
than the one found, then the lookup continues.

To best reflect the kernel's datapath we should take the above into
account. Given a prefix and its length, the reflected route will always
be the first one in the FIB alias list. However, if the route has a
non-zero TOS then its action will be converted to trap instead of
forward, since we currently don't support TOS-based routing. If this
turns out to be a real issue, we can add support for that using
policy-based switching.

The route's scope can be effectively ignored as any packet being routed
by the device would've been looked-up using the widest scope (UNIVERSE).

To achieve that we need to do two changes. Firstly, we need to create
another struct (FIB node) that will hold the list of FIB entries sharing
the same prefix and length. This struct will be hashed using these two
parameters.

Secondly, we need to change the route reflection to match the above
logic, so that the first FIB entry in the list will be programmed into
the device while the rest will remain in the driver's cache in case of
subsequent changes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df6dd79b 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't reflect LINKDOWN nexthops

The kernel resolves the nexthops for a given route using
FIB_LOOKUP_IGNORE_LINKSTATE which means a notification can be sent for a
route with one of its nexthops being LINKDOWN.

In case IGNORE_ROUTES_WITH_LINKDOWN is set for the nexthop netdev, then
we shouldn't reflect the nexthop to the device's table.

Once the nexthop netdev's carrier goes up we'll be notified using NH_ADD
and reflect it to the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9665b745 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Flush resources when RIF is deleted

When the last IP address is removed from a netdev, its RIF is deleted.
However, if user didn't first remove neighbours and nexthops using this
interface, then they would still be present in the device's tables.

Therefore, whenever a RIF is deleted, make sure all the neighbours and
nexthops (adjacency entries) using it are removed from the relevant
tables as well.

The action associated with any route using this RIF would be refreshed,
most likely to trap. If the kernel decides to remove the route (f.e.,
because all the nexthops are now DEAD), then an event would be sent,
causing the route to be removed from the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad178c8e 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Reflect nexthop status changes

When a packet hits a multipath route in the device's routing table, a
hash is computed over its headers, which is then used to select the
appropriate nexthop from the device's adjacency table.

There are situations in which the kernel removes a nexthop from a
multipath route (e.g., no carrier) and the device should do the same.

Upon the reception of NH_{ADD,DEL} events, add or remove a nexthop from
the device's adjacency table and refresh all the routes using the
nexthop group. If all the nexthops of a multipath route are invalid,
then any packet hitting the route would be trapped to the CPU for
forwarding.

If all the nexthops are DEAD, then the kernel would remove the route
entirely. On the other hand, if all the nexthops are merely LINKDOWN,
then the kernel would keep the route and forward any incoming packet
using a different route.

While the last case might sound like a problem, it's expected that a
routing daemon running in user space would remove such a route from the
FIB as it's dumped with the DEAD flag set.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 70ad3506 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Use trap action only for some route types

The device can have one of three actions associated with a route:

1) Remote - packets continue to the adjacency table
2) Local - packets continue to the neighbour table
3) Trap - packets continue to the CPU

The first two actions can also trap packets to the CPU, but they do so
using a different trap ID, which has a lower traffic class and less
allotted bandwidth.

We currently use the third action for both RTN_{LOCAL,BROADCAST} routes
and RTN_UNICAST routes not pointing to the switch ports.

However, packets that merely need to be forwarded by the switch are
likely not control packets and can be therefore scheduled towards the
CPU using a lower traffic class.

Achieve the above by assigning the third action only to local and
broadcast routes and have any other route use either of the first two
actions, based on whether the route is gatewayed or not.

This will also allow us to refresh routes using the local action and
have them trap packets when their RIF is no longer valid following a
NH_DEL event.

One side effect of this patch is that we no longer give special
treatment to multipath routes using both switch and non-switch ports
towards their nexthops. If at least one of the nexthops can be resolved,
then the device will forward the packets instead of trapping them.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b411477 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Determine offload status using generic function

The previous patch introduced a generic function to determine whether a
route should be offloaded or not. Make use of it here.

In the future we're going to add more conditions to this test (e.g.,
whether TOS is non-zero), so it makes sense to centralize it instead of
open coding it in a few places.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 013b20f9 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: More accurately set offload flag

We currently set the RTNH_F_OFFLOAD flag for all routes using remote
action, but this isn't always correct. If none of the nexthops
associated with a gatewayed route can be offloaded into the device, then
any packet hitting it would be trapped to the CPU and forwarded by the
kernel.

Solve this by pushing the setting of the offload flag to after the route
was programmed into the device, thereby allowing us to take all the
parameters into account.

This change will also help us further in the patchset, when we refresh
routes following the reception of NH_{ADD,DEL} events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8c97014 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Refactor nexthop init routine

The nexthop init and de-init functions both have symmetric parts
concerned with the reflection of the neighbour entry into the device's
adjacency table, in case it's used by a gatewayed route.

These sections of code also need to be called when a nexthop is marked
as valid / invalid following NH_{ADD,DEL} events. Break these out into
appropriate functions, so that they could be invoked following the
reception of above events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c8b03077 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove FIB info from FIB entry struct

After the previous changes, the FIB info is embedded in every nexthop
group struct, which in turn is embedded in every FIB entry struct.

We can therefore safely remove the FIB info from the entry struct. This
has the added advantage of making the router-related structs more
generic and suitable for use with IPv6 offloads.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b8399a1e 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Store routes in a more generic way

Up until now, the only FIB entries that were associated with a nexthop
group were routes to remote networks where all the nexthop devices had a
valid router interface (RIF). This is in contrast to the FIB code,
where all the routes are associated with a FIB info. The same design
choice needs to be applied to the driver's cache.

Based on the NH_{ADD,DEL} events which will be added later in the
patchset, we need to be able to change the action (forward / trap)
associated with all the routes using the nexthop group. However, if we
can't link between the nexthop and the routes using it, then the above
is impossible.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b3e8d1eb 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add gateway indication to nexthop group

The next patch is going to generalize the way in which we store routes.
Instead of attaching a nexthop group only to gatewayed routes, one will
be attached to each route, in a similar way to the way the FIB code
stores its routes.

The above means that any function operating on a nexthop group cannot
assume the group represents only gatewayed nexthops. One such function
is the one that refreshes a nexthop group and updates the adjacency
table following nexthop changes.

For a nexthop group that doesn't represent any gateways this function
would essentially be a NOP, but it would be useful if it did update the
action associated with any route using it. This will allow us to later
consolidate code paths when a nexthop changes following NH_{ADD,DEL}
events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d55409cb 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Use nexthop's scope to set action type

We currently use the scope of the FIB info to distinguish between a
direct unicast route and a gatewayed one. However, the kernel is
perfectly happy to configure a route with scope UNIVERSE to a directly
connected network.

Instead, we can rely on the first nexthop's scope to check if the route
is gatewayed or not.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c53b8e1b 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Store nexthops in a hash table

Later in the patchset we'll add the NH_{ADD,DEL} events which will let
us know when a nexthop is considered to be dead. Based on these events
we need to be able to add or remove the nexthop from the device's
tables.

Therefore, store the private nexthop structs in a hash table and use the
kernel's fib_nh struct as the key, so that we'll be able to easily find
them when the events are received.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e9ad5e7d 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Store nexthop groups in a hash table

Currently, when we're notified about a new RTN_UNICAST route we perform
a lookup on the nexthop group list looking for a group with a matching
configuration to that found in the FIB info. This is quite inefficient.

Instead, we can simply rely on the kernel to consolidate several FIB
configurations into the same FIB info and use the FIB info as the key
for our private nexthop group struct.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e58be79e 08-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Nullify nexthop's neigh pointer

When we invalidate a nexthop we should also invalidate its neighbour
entry pointer as it might be destroyed later on. This makes the nexthop
de-init function symmetric with its init and also ensures nobody will
try to access the neighbour entry.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd76d910 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Fix typo in comment

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01b1aa35 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't read 'nud_state' without lock

We periodically ask the neighbouring system to try and resolve
neighbours that are used for nexthops, but aren't currently resolved.

However, 'nud_state' is protected by the neighbour lock, so we shouldn't
access it without taking it. Instead, we can simply check the
'connected' field of the neighbour entry, which we update upon
NEIGH_UPDATE events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8a0b7275 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove redundant check

We only add neighbour entries that are also used for nexthops to
'nexthop_neighs_list', so when iterating over this list there's no need
to check that the entry is indeed used for nexthops.

Remove the redundant check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5c8802f1 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Simplify neighbour reflection

Up until now we had two interfaces for neighbour related configuration:
ndo_neigh_{construct,destroy} and NEIGH_UPDATE netevents. The ndos were
used to add and remove neighbours from the driver's cache, whereas the
netevent was used to reflect the neighbours into the device's tables.

However, if the NUD state of a neighbour isn't NUD_VALID or if the
neighbour is dead, then there's really no reason for us to keep it
inside our cache. The only exception to this rule are neighbours that
are also used for nexthops, which we periodically refresh to get them
resolved.

We can therefore eliminate the ndo entry point into the driver and
simplify the code, making it similar to the FIB reflection, which is
based solely on events. This also helps us avoid a locking issue, in
which the RIF cache was traversed without proper locking during
insertion into the neigh entry cache.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de04b6a3 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Remove unused variable

Since commit 33b1341cd1bf ("mlxsw: spectrum_router: Fix handling of
neighbour structure") we no longer use destination IP for neighbour
lookup, so remove it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e60234dd 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Use ordered workqueue for neigh updates

We currently associate each neighbour entry with a work item, so it's
not possible to have multiple events queued for the same neighbour
entry. However, this is about to be changed so that the neighbour entry
is only resolved when the work item is scheduled.

The above can result in a mismatch between the kernel's and the device's
neighbour table, unless the associated work items are processed in the
order in which they were submitted.

Do that by migrating the NEIGH_UPDATE work items to be processed in the
ordered workqueue which was recently introduced in mlxsw in commit
a3832b31898f ("mlxsw: core: Create an ordered workqueue for FIB
offload").

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0e4761d 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Queue work immediately instead of delaying it

We always use zero delay before queueing a work on the ordered workqueue
('mlxsw_owq'), so use work_struct directly instead of delayable work.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a59b7e02 23-Jan-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Correctly reallocate adjacency entries

mlxsw_sp_nexthop_group_mac_update() is called in one of two cases:

1) When the MAC of a nexthop needs to be updated
2) When the size of a nexthop group has changed

In the second case the adjacency entries for the nexthop group need to
be reallocated from the adjacency table. In this case we must write to
the entries the MAC addresses of all the nexthops that should be
offloaded and not only those whose MAC changed. Otherwise, these entries
would be filled with garbage data, resulting in packet loss.

Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58312125 23-Dec-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Correctly remove nexthop groups

At the end of the nexthop initialization process we determine whether
the nexthop should be offloaded or not based on the NUD state of the
neighbour representing it. After all the nexthops were initialized we
refresh the nexthop group and potentially offload it to the device, in
case some of the nexthops were resolved.

Make the destruction of a nexthop group symmetric with its creation by
marking all nexthops as invalid and then refresh the nexthop group to
make sure it was removed from the device's tables.

Fixes: b2157149b0b0 ("mlxsw: spectrum_router: Add the nexthop neigh activity update")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93a87e5e 23-Dec-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Don't reflect dead neighs

When a neighbour is considered to be dead, we should remove it from the
device's table regardless of its NUD state.

Without this patch, after setting a port to be administratively down we
get the following errors when we periodically try to update the kernel
about neighbours activity:

[ 461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find
matching neighbour for IP=192.168.100.2

Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c3852ef7 03-Dec-2016 Ido Schimmel <idosch@mellanox.com>

ipv4: fib: Replay events when registering FIB notifier

Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
introduced a new notification chain to notify listeners (f.e., switchdev
drivers) about addition and deletion of routes.

However, upon registration to the chain the FIB tables can already be
populated, which means potential listeners will have an incomplete view
of the tables.

Solve that by dumping the FIB tables and replaying the events to the
passed notification block. The dump itself is done using RCU in order
not to starve consumers that need RTNL to make progress.

The integrity of the dump is ensured by reading the FIB change sequence
counter before and after the dump under RTNL. This allows us to avoid
the problematic situation in which the dumping process sends a ENTRY_ADD
notification following ENTRY_DEL generated by another process holding
RTNL.

Callers of the registration function may pass a callback that is
executed in case the dump was inconsistent with current FIB tables.

The number of retries until a consistent dump is achieved is set to a
fixed number to prevent callers from looping for long periods of time.
In case current limit proves to be problematic in the future, it can be
easily converted to be configurable using a sysctl.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3057224e 03-Dec-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Implement FIB offload in deferred work

FIB offload is currently done in process context with RTNL held, but
we're about to dump the FIB tables in RCU critical section, so we can no
longer sleep.

Instead, defer the operation to process context using deferred work. Make
sure fib info isn't freed while the work is queued by taking a reference
on it and releasing it after the operation is done.

Deferring the operation is valid because the upper layers always assume
the operation was successful. If it's not, then the driver-specific
abort mechanism is called and all routed traffic is directed to slow
path.

The work items are submitted to an ordered workqueue to prevent a
mismatch between the kernel's FIB table and the device's.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d331d303 16-Nov-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Adjust placement of FIB abort warning

The recent merge commit bb598c1b8c9b ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause
the FIB abort warning to fire whenever we flush the FIB tables - either
during module removal or actual abort.

Move it back to its rightful location in the FIB abort function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac571de9 14-Nov-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Flush FIB tables during fini

Since commit b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications
instead of switchdev calls") we reflect to the device the entire FIB
table and not only FIBs that point to netdevs created by the driver.

During module removal, FIBs of the second type are removed following
NETDEV_UNREGISTER events sent. The other FIBs are still present in both
the driver's cache and the device's table.

Fix this by iterating over all the FIB tables in the device and flush
them. There's no need to take locks, as we're the only writer.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8d419324 09-Nov-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Add FIB abort warning

Add a warning that the abort mechanism was triggered for device.
Also avoid going through the procedure if abort was already done.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42cdb338 11-Nov-2016 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum_router: Correctly dump neighbour activity

The device's neighbour table is periodically dumped in order to update
the kernel about active neighbours. A single dump session may span
multiple queries, until the response carries less records than requested
or when a record (can contain up to four neighbour entries) is not full.
Current code stops the session when the number of returned records is
zero, which can result in infinite loop in case of high packet rate.

Fix this by stopping the session according to the above logic.

Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0e3715c9 09-Nov-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Ignore FIB notification events for non-init namespaces

Since now, the table with same id in multiple netnamespaces were squashed
to a single virtual router. That is not only incorrect, it also causes
error messages when trying to use RALUE register to do double remove
of FIB entries, like this one:

mlxsw_spectrum 0000:03:00.0: EMAD reg access failed (tid=facb831c00007b20,reg_id=8013(ralue),type=write,status=7(bad parameter))

Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL),
and the infrastructure is not yet prepared to handle netnamespaces, just
ignore FIB notification events for non-init namespaces. That is clear to
do since we don't need to offload them.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 33b1341c 09-Nov-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Fix handling of neighbour structure

__neigh_create function works in a different way than assumed.
It passes "n" as a parameter to ndo_neigh_construct. But this "n" might
be destroyed right away before __neigh_create() returns in case there is
already another neighbour struct in the hashtable with the same dev and
primary key. That is not expected by mlxsw_sp_router_neigh_construct()
and the stored "n" points to freed memory, eventually leading to crash.

Fix this by doing tight 1:1 coupling between neighbour struct and
internal driver neigh_entry. That allows to narrow down the key in
internal driver hashtable to do lookups by "n" only.

Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c9583a8 27-Oct-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Remove extra whitespace

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8b99becd 25-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Compare only trees which are in use during tree get

Only trees which are in use should be compared to requested prefix usage.

Fixes: 53342023eed9 ("mlxsw: spectrum_router: Implement LPM trees management")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2083d367 25-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Save requested prefix bitlist when creating tree

Currently, the prefix bitlist is not saved for LPM trees, causing the
compare to always fail which causes the tree to be destroyed and created
for every inserted and removed FIB entry. So fix this by saving
the bitlist as it should have been done from the very beginning.

Fixes: 53342023eed9 ("mlxsw: spectrum_router: Implement LPM trees management")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1a38311 21-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Convert resources into array

Since the number of resources is going to get much bigger, ease up the
addition by simly defining IDs. Convert the existing structure members
to a set array, one for validity, one for values. Introduce a set of
getters and setters for easy access.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37956d78 20-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Make mlxsw_sp_router_fib4_del return void and remove warn

The function return value is not checked anywhere. Also, the warning
causes huge slowdown when removing large number of FIB entries which
were not offloaded, because of ordering issue. Ido's preparing
a patchset to fix the ordering issue, but that is definitelly not
net tree material.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 19271c1a 20-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Use correct tree index for binding

By a mistake, there is tree index 0 passed to RALTB. Should be
MLXSW_SP_LPM_TREE_MIN.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Reported-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ab580705 30-Sep-2016 Arnd Bergmann <arnd@arndb.de>

mlxsw: spectrum_router: avoid potential uninitialized data usage

If fi->fib_nhs is zero, the router interface pointer is uninitialized, as shown by
this warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_router_fib_event':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:1674:21: error: 'r' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:1643:23: note: 'r' was declared here

This changes the loop so we handle the case the same way as finding no router
interface pointer attached to one of the nexthops to ensure we always
trap here instead of using uninitialized data.

Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b45f64d1 25-Sep-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls

Until now, in order to offload a FIB entry to HW we use switchdev op.
However that has limits. Mainly in case we need to make the HW aware of
all route prefixes configured in kernel. HW needs to know those in order
to properly trap appropriate packets and pass the to kernel to do
the forwarding. Abort mechanism is now handled within the mlxsw driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f8a62d4 20-Sep-2016 Nogah Frankel <nogahf@mellanox.com>

mlxsw: spectrum: Implement max rif resource

Replace max rif const with using the result from resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9497c042 20-Sep-2016 Nogah Frankel <nogahf@mellanox.com>

mlxsw: spectrum: Implement max virtual routers resource

Replace max virtual routers const with the result from
the resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a9234e66 19-Sep-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Fix sparse warnings

drivers/net/ethernet/mellanox/mlxsw//spectrum.c:251:28: warning: symbol
'mlxsw_sp_span_entry_find' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:265:28: warning: symbol
'mlxsw_sp_span_entry_get' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:367:56: warning: mixing
different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:367:56: int enum
mlxsw_sp_span_type versus
drivers/net/ethernet/mellanox/mlxsw//spectrum.c:367:56: int enum
mlxsw_reg_mpar_i_e
...
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:598:32: warning:
mixing different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:598:32: int
enum mlxsw_reg_sbxx_dir versus
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:598:32: int
enum devlink_sb_pool_type
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:600:39: warning:
mixing different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:600:39: int
enum mlxsw_reg_sbpr_mode versus
drivers/net/ethernet/mellanox/mlxsw//spectrum_buffers.c:600:39: int
enum devlink_sb_threshold_type
...
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:255:54: warning:
mixing different enum types
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:255:54: int
enum mlxsw_sp_l3proto versus
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:255:54: int
enum mlxsw_reg_ralxx_protocol
...
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1749:6: warning:
symbol 'mlxsw_sp_fib_entry_put' was not declared. Should it be static?

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 40d25904 08-Sep-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Fix error path in mlxsw_sp_router_init

When neigh_init fails, we have to do proper cleanup including
router_fini call.

Fixes: 6cf3c971dc84cb ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7322638 01-Sep-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Fix netevent notifier registration

Currently the notifier is registered for every asic instance, however the
same block. Fix this by moving the registration to module init.

Fixes: c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7146da31 01-Sep-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Fix fib entry update path

Originally, I expected that there would be needed to call update
operation in case RALUE record action is changed. However, that is not
needed since write operation takes care of that nicely. Remove prepared
construct and always call the write operation.

Fixes: 61c503f976b5 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5b004412 01-Sep-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Fix failure caused by double fib removal from HW

In mlxsw we squash tables 254 and 255 together into HW. Kernel adds/dels
/32 ip to/from both 254 and 255. On del path, that causes the same
prefix being removed twice. Fix this by introducing reference counting
for private mlxsw fib entries. That required a bit of code reshuffle.
Also put dev into fib entry key so the same prefix could be represented
once per every router interface.

Fixes: 61c503f976b5 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 51af96b5 24-Aug-2016 Yotam Gigi <yotamg@mellanox.com>

mlxsw: router: Enable neighbors to be created on stacked devices

Make the function mlxsw_router_neigh_construct search the rif according
to the neighbour dev other than the dev that was passed to the ndo, thus
allowing creating neigbhours upon stacked devices.

Fixes: 6cf3c971dc84 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb8fc323 14-Aug-2016 Vincent <vincent.stehle@laposte.net>

mlxsw: spectrum_router: Fix use after free

In mlxsw_sp_router_fib4_add_info_destroy(), the fib_entry pointer is used
after it has been freed by mlxsw_sp_fib_entry_destroy(). Use a temporary
variable to fix this.

Fixes: 61c503f976b5449e ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a1e3e737 14-Jul-2016 Christophe Jaillet <christophe.jaillet@wanadoo.fr>

mlxsw: spectrum_router: Return -ENOENT in case of error

'vr' should be a valid pointer here, so returning 'PTR_ERR(vr)' is wrong.
Return an explicit error code (-ENOENT) instead.

Fixes: 61c503f976 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0b2361d9 05-Jul-2016 Yotam Gigi <yotamg@mellanox.com>

mlxsw: Add the unresolved next-hops probes

Now, the driver sends arp probes for all unresolved neighbours that are
currently a nexthop for some route on the system. The job is set
periodically every 5 seconds.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2157149 05-Jul-2016 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum_router: Add the nexthop neigh activity update

For nexthop neighbours we need to make kernel to think there is a traffic
flowing to them preventing it from going to stale state. Otherwise
kernel would stale it and eventually the neigh would be removed from HW
and nexthop as well. That would reduce ECMP group in HW.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7ff87ac 05-Jul-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Implement next-hop routing

Implement next-hop routing offload including ECMP. To make it possible,
introduce next-hop group entity. This entity keeps track of resolved
neighbours and updates HW adjacency table accordingly. Note that HW
next-hops are stored in this adjacency table, in form of MAC.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a6bf9e93 05-Jul-2016 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum_router: Offload neighbours based on NUD state change

Listen to any NEIGH_UPDATE events sent and program the device
accordingly. If NUD state is VALID and neighbour isn't yet offloaded,
then program it into the device's table. Otherwise, just edit its
parameters.

If NUD state machine transitioned neighbour out of VALID state and it's
present in the device's table, then remove it.

Note that the device is programmed in delayed work, as the netevent
notification chain is atomic and prevents us from going to sleep.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c723c735 05-Jul-2016 Yotam Gigi <yotamg@mellanox.com>

mlxsw: spectrum_router: Periodically update the kernel's neigh table

As previously explained, the driver should periodically poll the device
for neighbours activity according to the configured DELAY_PROBE_TIME.
This will prevent active neighbours from staying in STALE state for long
periods of time.

During init configure the polling interval according to the
DELAY_PROBE_TIME used in the default table. In addition, register a
netevent notification block, so that the interval is updated whenever
DELAY_PROBE_TIME changes.

Using the computed interval schedule a delayed work, which will update
the kernel via neigh_event_send() on any active neighbour since the last
delayed work.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6cf3c971 05-Jul-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Add private neigh table

We need to hold some private data for every neigh entry. It would be
possible to do it using neigh_priv_len/ndo_neigh_construct/
ndo_neigh_destroy however only for the port device itself. That would not
work for stacked devices like bridge/team/bond. So introduce a private
neigh table. Hook onto ndos neigh_construct/destroy and add/remove
table entry according to that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 61c503f9 04-Jul-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops

Implement ipv4 FIB entries addition and removal. Initially, we support
local and broadcast routes using "ip2me" trap action.
Also, unicast routes without nexthop are supported using "local" action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6b75c480 04-Jul-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Add virtual router management

Virtual router is a construct used inside HW. In this implementation
we map kernel tables to virtual routers one to one. Introduce management
logic to create virtual routers when needed and destroy in case they are
no longer in use. According to that, call into LPM tree management.
Each virtual router is always bound to one LPM tree.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 53342023 04-Jul-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Implement LPM trees management

Introduce basic LPM tree management allowing to share the trees in
between tables if the used prefixes in the tables are the same.
Build the tree structure according to the used prefixes. Although it is
not optimal for many use cases, this initial implementation does only
simple linear left-tree. More advanced structures will be introduced
later on, possibly including mechanisms to change trees on the fly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e9c16cc 04-Jul-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_router: Implement private fib

Shadow FIB is needed in order to hold additional information for FIB
entries and keep track of used prefixes. That is needed for the LPM tree
construction to be introduced later on in this set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 464dce18 02-Jul-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_router: Add basic ipv4 router initialization

Create a skeleton router file and do basic HW initialization of router.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>