History log of /linux-master/drivers/net/ethernet/mellanox/mlxsw/core.c
Revision Date Author Comments
# 976c44af 18-Apr-2024 Ido Schimmel <idosch@nvidia.com>

mlxsw: core: Unregister EMAD trap using FORWARD action

The device's manual (PRM - Programmer's Reference Manual) classifies the
trap that is used to deliver EMAD responses as an "event trap". Among
other things, it means that the only actions that can be associated with
the trap are TRAP and FORWARD (NOP).

Currently, during driver de-initialization the driver unregisters the
trap by setting its action to DISCARD, which violates the above
guideline. Future firmware versions will prevent such misuses by
returning an error. This does not prevent the driver from working, but
an error will be printed to the kernel log during module removal /
devlink reload:

mlxsw_spectrum 0000:03:00.0: Reg cmd access status failed (status=7(bad parameter))
mlxsw_spectrum 0000:03:00.0: Reg cmd access failed (reg_id=7003(hpkt),type=write)

Suppress the error message by aligning the driver to the manual and use
a FORWARD (NOP) action when unregistering the trap.

Fixes: 4ec14b7634b2 ("mlxsw: Add interface to access registers and process events")
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/753a89e14008fde08cb4a2c1e5f537b81d8eb2d6.1713446092.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 09591595 20-Nov-2023 Petr Machata <petrm@nvidia.com>

mlxsw: core, pci: Add plumbing related to CFF mode

CFF mode, for Compressed FID Flooding, is a way of organizing flood vectors
in the PGT table. The bus module determines whether CFF is supported, can
configure flood mode to CFF if it is, and knows what flood mode has been
configured. Therefore add a bus callback to determine the configured flood
mode. Also add to core an API to query it.

Since after this patch, we rely on mlxsw_pci->flood_mode being set, it
becomes a coding error if a driver invokes this function with a set of
fields that misses the initialization. Warn and bail out in that case.

The CFF mode is not used as of this patch. The code to actually use it will
be added later.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/889d58759dd40f5037f2206b9fc4a78a9240da80.1700503644.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b2e9b1fe 18-Oct-2023 Petr Machata <petrm@nvidia.com>

mlxsw: core, pci: Add plumbing related to LAG mode

lag_mode describes where the responsibility for LAG table placement lies:
SW or FW. The bus module determines whether LAG is supported, can configure
it if it is, and knows what (if any) configuration has been applied.
Therefore add a bus callback to determine the configured LAG mode. Also add
to core an API to query it.

The LAG mode is for now kept at the default value of 0 for FW-managed. The
code to actually toggle it will be added later.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d434b48 18-Oct-2023 Przemek Kitszel <przemyslaw.kitszel@intel.com>

mlxsw: core: devlink health: use retained error fmsg API

Drop unneeded error checking.

devlink_fmsg_*() family of functions is now retaining errors,
so there is no need to check for them after each call.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f58a3e4d 25-May-2023 Jiri Pirko <jiri@resnulli.us>

devlink: move port_split/unsplit() ops into devlink_port_ops

Move port_split/unsplit() from devlink_ops into newly introduced
devlink_port_ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 865a1a1b 25-May-2023 Jiri Pirko <jiri@resnulli.us>

mlxsw_core: register devlink port with ops

Use newly introduce devlink port registration function variant and
register devlink port passing ops.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Tested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9d9a90cd 06-Feb-2023 Ido Schimmel <idosch@nvidia.com>

mlxsw: core: Register devlink instance before sub-objects

Recent changes made it possible to register the devlink instance before
its sub-objects and under the instance lock. Among other things, it
allows us to avoid warnings such as this one [1]. The warning is
generated because a buggy firmware is generating a health event during
driver initialization, before the devlink instance is registered.

Move the registration of the devlink instance to the beginning of the
initialization flow to avoid such problems.

A similar change was implemented in netdevsim in commit 82a3aef2e6af
("netdevsim: move devlink registration under the instance lock").

[1]
WARNING: CPU: 3 PID: 49 at net/devlink/leftover.c:7509 devlink_recover_notify.constprop.0+0xaf/0xc0
[...]
Call Trace:
<TASK>
devlink_health_report+0x45/0x1d0
mlxsw_core_health_event_work+0x24/0x30 [mlxsw_core]
process_one_work+0x1db/0x390
worker_thread+0x49/0x3b0
kthread+0xe5/0x110
ret_from_fork+0x1f/0x30
</TASK>

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 74cbc3c0 06-Feb-2023 Ido Schimmel <idosch@nvidia.com>

mlxsw: spectrum_acl_tcam: Move devlink param to TCAM code

Cited commit added 'DEVLINK_CMD_PARAM_DEL' notifications whenever the
network namespace of the devlink instance is changed. Specifically, the
notifications are generated after calling reload_down(), but before
calling reload_up(). At this stage, the data structures accessed while
reading the value of the "acl_region_rehash_interval" devlink parameter
are uninitialized, resulting in a use-after-free [1].

Fix by moving the registration and unregistration of the devlink
parameter to the TCAM code where it is actually used. This means that
the parameter is unregistered during reload_down() and then
re-registered during reload_up(), avoiding the use-after-free between
these two operations.

Reproducer:

# ip netns add test123
# devlink dev reload pci/0000:06:00.0 netns test123

[1]
BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
Read of size 4 at addr ffff888162fd37d8 by task devlink/1323
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x95/0xbd
print_report+0x181/0x4a1
kasan_report+0xdb/0x200
mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
mlxsw_sp_params_acl_region_rehash_intrvl_get+0x32/0x80
devlink_nl_param_fill.constprop.0+0x29a/0x11e0
devlink_param_notify.constprop.0+0xb9/0x250
devlink_notify_unregister+0xbc/0x470
devlink_reload+0x1aa/0x440
devlink_nl_cmd_reload+0x559/0x11b0
genl_family_rcv_msg_doit.isra.0+0x1f8/0x2e0
genl_rcv_msg+0x558/0x7f0
netlink_rcv_skb+0x170/0x440
genl_rcv+0x2d/0x40
netlink_unicast+0x53f/0x810
netlink_sendmsg+0x961/0xe80
__sys_sendto+0x2a4/0x420
__x64_sys_sendto+0xe5/0x1c0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 7d7e9169a3ec ("devlink: move devlink reload notifications back in between _down() and _up() calls")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fb8421a9 27-Jan-2023 Jiri Pirko <jiri@nvidia.com>

devlink: remove devlink features

Devlink features were introduced to disallow devlink reload calls of
userspace before the devlink was fully initialized. The reason for this
workaround was the fact that devlink reload was originally called
without devlink instance lock held.

However, with recent changes that converted devlink reload to be
performed under devlink instance lock, this is redundant so remove
devlink features entirely.

Note that mlx5 used this to enable devlink reload conditionally only
when device didn't act as multi port slave. Move the multi port check
into mlx5_devlink_reload_down() callback alongside with the other
checks preventing the device from reload in certain states.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 075935f0 26-Jan-2023 Jiri Pirko <jiri@nvidia.com>

devlink: protect devlink param list by instance lock

Commit 1d18bb1a4ddd ("devlink: allow registering parameters after
the instance") as the subject implies introduced possibility to register
devlink params even for already registered devlink instance. This is a
bit problematic, as the consistency or params list was originally
secured by the fact it is static during devlink lifetime. So in order to
protect the params list, take devlink instance lock during the params
operations. Introduce unlocked function variants and use them in drivers
in locked context. Put lock assertions to appropriate places.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 49f5b769 19-Jan-2023 Amit Cohen <amcohen@nvidia.com>

mlxsw: Add support of latency TLV

The latency of each EMAD can be measured by firmware. The driver can get
the measurement via latency TLV which can be added to each EMAD. This TLV
is optional, when EMAD is sent with this TLV, the EMAD's response will
include the TLV and the field 'latency_time' will contain the firmware
measurement.

This information can be processed using BPF program for example, to
create a histogram and average of the latency per register. In addition,
it is possible to measure the end-to-end latency, and then reduce firmware
measurement, which will result in the latency of the software overhead.
This information can be useful to improve the driver performance.

Add support for latency TLV by default for all EMADs. First we planned to
enable latency TLV per demand, using devlink-param. After some tests, we
know that the usage of latency TLV does not impact the end-to-end latency,
so it is OK to enable it by default.

Note that similar to string TLV, the latency TLV is not supported in all
firmware versions. Enable the usage of this TLV only after verifying it is
supported by the current firmware version by querying the Management
General Information Register (MGIR).

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6ee0d3a9 19-Jan-2023 Amit Cohen <amcohen@nvidia.com>

mlxsw: core: Define latency TLV fields

The next patch will add support for latency TLV as part of EMAD (Ethernet
Management Datagrams) packets. As preparation, add the relevant fields.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 563bd3c4 19-Jan-2023 Amit Cohen <amcohen@nvidia.com>

mlxsw: core: Do not worry about changing 'enable_string_tlv' while sending EMADs

Till now, the field 'mlxsw_core->emad.enable_string_tlv' is set as part
of mlxsw_sp_init(), this means that it can be changed during
emad_reg_access(). To avoid such change, this field is read once in
emad_reg_access() and the value is used all the way.

The previous patch sets this value according to MGIR output, as part of
mlxsw_emad_init(), so now it cannot be changed while sending EMADs.

Do not save 'enable_string_tlv' and do not pass it to functions, just pass
'struct mlxsw_core' and use the value directly from it.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# d84e2359 19-Jan-2023 Amit Cohen <amcohen@nvidia.com>

mlxsw: Enable string TLV usage according to MGIR output

String TLV is not supported by old firmware versions, therefore
'struct mlxsw_core' stores the field 'emad.enable_string_tlv', which is
set to true only after firmware version check.

Instead of assuming that firmware version check is enough to enable
string TLV, a better solution is to query if this TLV is supported from
MGIR register. Add such query and initialize 'emad.enable_string_tlv'
accordingly.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# dfdfd130 18-Jan-2023 Jiri Pirko <jiri@nvidia.com>

devlink: protect health reporter operation with instance lock

Similar to other devlink objects, protect the reporters list
by devlink instance lock. Alongside add unlocked versions
of health reporter create/destroy functions and use them in drivers
on call paths where the instance lock is held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 226bf980 29-Nov-2022 Vincent Mailhol <mailhol.vincent@wanadoo.fr>

net: devlink: let the core report the driver name instead of the drivers

The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.

In order to factorize code, make devlink_nl_info_fill() add the driver
name attribute.

Now that the core sets the driver name attribute, drivers are not
supposed to call devlink_info_driver_name_put() anymore. Remove
devlink_info_driver_name_put() and clean-up all the drivers using this
function in their callback.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ac73d4bf 02-Nov-2022 Jiri Pirko <jiri@nvidia.com>

net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port

Benefit from the previously implemented tracking of netdev events in
devlink code and instead of calling devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port
pointer to netdevice which is about to be registered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f029c781 30-Aug-2022 Wolfram Sang <wsa+renesas@sang-engineering.com>

net: ethernet: move from strlcpy with unused retval to strscpy

Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.

Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Petr Machata <petrm@nvidia.com> # For drivers/net/ethernet/mellanox/mlxsw
Acked-by: Geoff Levand <geoff@infradead.org> # For ps3_gelic_net and spider_net_ethtool
Acked-by: Tom Lendacky <thomas.lendacky@amd.com> # For drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c
Acked-by: Marcin Wojtas <mw@semihalf.com> # For drivers/net/ethernet/marvell/mvpp2
Reviewed-by: Leon Romanovsky <leonro@nvidia.com> # For drivers/net/ethernet/mellanox/mlx{4|5}
Reviewed-by: Shay Agroskin <shayagr@amazon.com> # For drivers/net/ethernet/amazon/ena
Acked-by: Krzysztof HaƂasa <khalasa@piap.pl> # For IXP4xx Ethernet
Link: https://lore.kernel.org/r/20220830201457.7984-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cf735d4c 26-Aug-2022 Amit Cohen <amcohen@nvidia.com>

mlxsw: Add a helper function for getting maximum LAG ID

Currently the driver queries the maximum supported LAG ID from firmware.
This will not be accurate anymore once the driver will configure 'max_lag'
via CONFIG_PROFILE command.

For resource query, firmware returns the maximum LAG ID which is supported
by hardware. Software can configure firmware to do not allocate entries for
all the supported LAGs, and to limit LAG IDs. In this case, the resource
query will not return the actual maximum LAG ID.

Add a helper function for getting this value. In case that 'max_lag' field
was set during initialization, return the value which was used, otherwise,
query firmware for the maximum supported ID.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 12be3edf 24-Aug-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: Remove unused mlxsw_core_port_type_get()

Function mlxsw_core_port_type_get() is no longer used. So remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 04a1b674 24-Aug-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: Remove unused port_type_set devlink op

port_type_set devlink op is no longer used by any mlxsw driver,
so remove it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 3471ac9b 24-Aug-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: Remove unused IB stuff

There are some IB leftovers that are no longer used in the code.
So remove them.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 2ab4e709 21-Aug-2022 Vadim Pasternak <vadimp@nvidia.com>

mlxsw: core: Add registration APIs for system event handler

The purpose of system event handler is to handle system interrupts.
Such interrupts are raised to CPU from system programmable logic
devices, upon specific system wide changes, like line card activation
and deactivation.

The purpose is to create an alternative to trap mechanism, which
delivers these events to driver over PCI bus, but not available for
the driver working over I2C bus.

Mechanism is system dependent and applicable only for the systems
equipped with programmable devices with custom logic.

Add APIs for event handler registration and un-registration and API
which should be invoked from the registered callbacks when system
interrupt is raised to CPU.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 644a66c6 29-Jul-2022 Jiri Pirko <jiri@nvidia.com>

net: devlink: convert reload command to take implicit devlink->lock

Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bbd30057 27-Jul-2022 Danielle Ratson <danieller@nvidia.com>

mlxsw: Query UTC sec and nsec PCI offsets and values

Query UTC sec and nsec PCI offsets during the pci_init(), to be able to
read UTC time later.

Implement functions to read UTC seconds and nanoseconds from the offset
which was read as part of initialization.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42823208 27-Jul-2022 Danielle Ratson <danieller@nvidia.com>

mlxsw: Support CQEv2 for SDQ in Spectrum-2 and newer ASICs

Currently, Tx completions are reported using Completion Queue Element
version 1 (CQEv1). These elements do not contain the Tx time stamp,
which is fine as Spectrum-1 reads Tx time stamps via a dedicated FIFO
and Spectrum-2 does not currently support PTP.

In preparation for Spectrum-2 PTP support, use CQEv2 for Spectrum-2 and
newer ASICs, as this CQE format encodes the Tx time stamp.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9ca6a7a5 25-Jul-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core_linecards: Implement line card device flashing

Implement flash_update() devlink op for the line card devlink instance
to allow user to update line card gearbox FW using MDDT register
and mlxfw.

Example:
$ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# bd02fd76 25-Jul-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core_linecards: Introduce per line card auxiliary device

In order to be eventually able to expose line card gearbox version and
possibility to flash FW, model the line card as a separate device on
auxiliary bus.

Add the auxiliary device for provisioned line card in order to be able
to expose provisioned line card info over devlink dev info. When the
line card becomes active, there may be other additional info added to
the output.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1b5995e3 21-Jul-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Fix use-after-free calling devl_unlock() in mlxsw_core_bus_device_unregister()

Do devl_unlock() before freeing the devlink in
mlxsw_core_bus_device_unregister() function.

Reported-by: Ido Schimmel <idosch@nvidia.com>
Fixes: 72a4c8c94efa ("mlxsw: convert driver to use unlocked devlink API during init/fini")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220721142424.3975704-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 72a4c8c9 16-Jul-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: convert driver to use unlocked devlink API during init/fini

Prepare for devlink reload being called with devlink->lock held and
convert the mlxsw driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6a4b02b8 13-Jun-2022 Petr Machata <petrm@nvidia.com>

mlxsw: Revert "Introduce initial XM router support"

This reverts commit 75c2a8fe8e39 ("Merge branch
'mlxsw-introduce-initial-xm-router-support'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 7b261af9 19-Apr-2022 Vadim Pasternak <vadimp@nvidia.com>

mlxsw: core: Add bus argument to environment init API

Pass bus argument to mlxsw_env_init(). The purpose is to get access to
device handle, which is to be provided to error message in case of line
card activation failure.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6445eef0 18-Apr-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum: Add port to linecard mapping

For each port get slot_index using PMLP register. For ports residing
on a linecard, identify it with the linecard by setting mapping
using devlink_port_linecard_set() helper. Use linecard slot index for
PMTDB register queries.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 45bf3b72 18-Apr-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Extend driver ops by remove selected ports op

In case of line card implementation, the core has to have a way to
remove relevant ports manually. Extend the Spectrum driver ops by an op
that implements port removal of selected ports upon request.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b217127e 18-Apr-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core_linecards: Add line card objects and implement provisioning

Introduce objects for line cards and an infrastructure around that.
Use devlink_linecard_create/destroy() to register the line card with
devlink core. Implement provisioning ops with a list of supported
line cards.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 706217c1 15-Mar-2022 Jakub Kicinski <kuba@kernel.org>

devlink: pass devlink_port to port_split / port_unsplit callbacks

Now that devlink ports are protected by the instance lock
it seems natural to pass devlink_port as an argument to
the port_split / port_unsplit callbacks.

This should save the drivers from doing a lookup.

In theory drivers may have supported unsplitting ports
which were not registered prior to this change.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 5e8930aa 15-Mar-2022 Jakub Kicinski <kuba@kernel.org>

eth: mlxsw: switch to explicit locking for port registration

Explicitly lock the devlink instance and use devl_ API.

This will be used by the subsequent patch to invoke
.port_split / .port_unsplit callbacks with devlink
instance lock held.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cc4d3de9 22-Feb-2022 Ido Schimmel <idosch@nvidia.com>

mlxsw: Remove resource query check

Since SwitchX-2 support was removed in commit b0d80c013b04 ("mlxsw:
Remove Mellanox SwitchX-2 ASIC support"), all the ASICs supported by
mlxsw support the resource query command.

Therefore, remove the resource query check and always query resources
from the device.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 902992d1 22-Feb-2022 Vadim Pasternak <vadimp@nvidia.com>

mlxsw: core: Unify method of trap support validation

Currently there are several different features defined in 'mlxsw_driver'
for trap support validation. There is no reason to have dedicated
features for specific traps. Perform validation of all of them by
testing feature 'MLXSW_BUS_F_TXRX'.

Remove trap capability validation from 'core_env.c' which is redundant
after validation has been added to mlxsw_core_trap_register().

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bfb82c9c 22-Feb-2022 Vadim Pasternak <vadimp@nvidia.com>

mlxsw: core_thermal: Remove obsolete API for query resource

Remove obsolete API mlxsw_core_res_query_enabled(), which is only
relevant for end-of-life SwitchX-2 ASICs. Support for these ASICs was
removed in commit b0d80c013b04 ("mlxsw: Remove Mellanox SwitchX-2 ASIC
support").

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c035ea76 22-Feb-2022 Vadim Pasternak <vadimp@nvidia.com>

mlxsw: core: Prevent trap group setting if driver does not support EMAD

Avoid trap group setting if driver is not capable of EMAD support.
For example, "mlxsw_minimal" driver works over I2C bus, overs which
EMADs cannot be sent.
Validation is performed by testing feature 'MLXSW_BUS_F_TXRX'.

Fixes: 74e0494d35ac ("mlxsw: core: Move basic_trap_groups_set() call out of EMAD init code")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 636d3ad2 27-Jan-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Consolidate trap groups to a single event group

For event traps which are used in core, avoid having a separate trap
group for each event. Instead of that introduce a single core event trap
group and use it for all event traps.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 981f1d18 27-Jan-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Move functions to register/unregister array of traps to core.c

These functions belong to core.c alongside the functions that
register/unregister a single trap. Move it there. Make the functions
possibly usable by other parts of mlxsw code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8ae89cf4 27-Jan-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Move basic trap group initialization from spectrum.c

Instead of initializing the trap groups used by core in spectrum.c
over op, do it directly in core.c

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 74e0494d 27-Jan-2022 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Move basic_trap_groups_set() call out of EMAD init code

The call inits the EMAD group, but other groups as well. Therefore, move
it out of EMAD init code and call it before.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 239cdd3f 19-Dec-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: core: Extend devlink health reporter with new events and parameters

Extend the devlink health reporter registered by mlxsw to report new
health events and their related parameters. These are meant to aid in
debugging of hardware / firmware issues.

Beside the test event ('MLXSW_REG_MFDE_EVENT_ID_TEST') that is triggered
following the devlink health 'test' sub-command, the new events are used
to report the triggering of asserts in firmware code
('MLXSW_REG_MFDE_EVENT_ID_FW_ASSERT') and hardware issues
('MLXSW_REG_MFDE_EVENT_ID_FATAL_CAUSE').

Each event is accompanied with a severity parameter and per-event
parameters that are meant to help root cause the detected issue.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4bcbf502 19-Dec-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: core: Convert a series of if statements to switch case

Convert a series of if statements that handle different events to a
switch case statement. Encapsulate the per-event code in different
functions to simplify the code.

This is a preparation for subsequent patches that will add more events
that need to be handled.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cbbd5fff 19-Dec-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: Fix naming convention of MFDE fields

Currently, the MFDE register field names are using the convention:
reg_mfde_<NAME_OF_FIELD>, and do not consider the name of the MFDE
event.

Fix the field names so they fit the more accurate convention:
reg_mfde_<NAME_OF_EVENT>_<NAME_OF_FIELD>.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c934757d 01-Dec-2021 Amit Cohen <amcohen@nvidia.com>

mlxsw: Use u16 for local_port field instead of u8

Currently, local_port field is saved as u8, which means that maximum 256
ports can be used.

As preparation for Spectrum-4, which will support more than 256 ports,
local_port field should be extended.

Save local_port as u16 to allow use of additional ports.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c897cfc 29-Nov-2021 Leon Romanovsky <leon@kernel.org>

devlink: Simplify devlink resources unregister call

The devlink_resources_unregister() used second parameter as an
entry point for the recursive removal of devlink resources. None
of the callers outside of devlink core needed to use this field,
so let's remove it.

As part of this removal, the "struct devlink_resource" was moved
from .h to .c file as it is not possible to use in any place in
the code except devlink.c.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 82465bec 12-Oct-2021 Leon Romanovsky <leon@kernel.org>

devlink: Delete reload enable/disable interface

Commit a0c76345e3d3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe/dismantle.

After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.

Therefore, remove these devlink_reload_{enable,disable}() APIs.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# bd032e35 12-Oct-2021 Leon Romanovsky <leon@kernel.org>

devlink: Allow control devlink ops behavior through feature mask

Introduce new devlink call to set feature mask to control devlink
behavior during device initialization phase after devlink_alloc()
is already called.

This allows us to set reload ops based on device property which
is not known at the beginning of driver initialization.

For the sake of simplicity, this API lacks any type of locking and
needs to be called before devlink_register() to make sure that no
parallel access to the ops is possible at this stage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b2ab483f 25-Sep-2021 Leon Romanovsky <leon@kernel.org>

mlxsw: core: Register devlink instance last

Make sure that devlink is open to receive user input when all
parameters are initialized.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db4278c5 22-Sep-2021 Leon Romanovsky <leon@kernel.org>

devlink: Make devlink_register to be void

devlink_register() can't fail and always returns success, but all drivers
are obligated to check returned status anyway. This adds a lot of boilerplate
code to handle impossible flow.

Make devlink_register() void and simplify the drivers that use that
API call.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Vladimir Oltean <olteanv@gmail.com> # dsa
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25a91f83 15-Sep-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: core: Remove mlxsw_core_is_initialized()

After the previous patch, the switch driver is always initialized last,
making this function redundant.

Remove it.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3d7a6f67 15-Sep-2021 Ido Schimmel <idosch@nvidia.com>

mlxsw: core: Initialize switch driver last

Commit 961cf99a074f ("mlxsw: core: Re-order initialization sequence")
changed the initialization sequence so that the switch driver (e.g.,
mlxsw_spectrum) is initialized before registration with the hwmon and
thermal subsystems.

This was done in order to avoid situations where hwmon/thermal code uses
features not supported by current firmware version, which is only
validated as part of switch driver initialization.

Later, commit b79cb787ac70 ("mlxsw: Move fw flashing code into core.c")
moved firmware validation and flashing code from the switch driver to
mlxsw_core so that it is performed before driver initialization.

Therefore, change the initialization sequence back to its original form.

In addition to being more straightforward, it will allow us to simplify
parts of the code in subsequent patches and future patchsets.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 32ada69b 14-Sep-2021 Jiri Pirko <jiri@nvidia.com>

mlxsw: spectrum: Use PMTDB register to obtain split info

Newly introduced PMTDB register is there to provide all needed info
about particular requested port split configuration. Use it instead of
figuring the info out manually in the driver.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 919d13a7 08-Aug-2021 Leon Romanovsky <leon@kernel.org>

devlink: Set device as early as possible

All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().

Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.

Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.

[ 8.758862] devlink_nl_param_fill+0x2e8/0xe50
[ 8.760305] devlink_param_notify+0x6d/0x180
[ 8.760435] __devlink_params_register+0x2f1/0x670
[ 8.760558] devlink_params_register+0x1e/0x20

The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7dafcc4c 25-May-2021 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: use PSID string define in devlink info

Instead of having the string spelled out in the driver, use the global
define with the same value.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f55c998c 25-May-2021 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Expose FW version over defined keyword

To be aligned with the rest of the drivers, expose FW version under "fw"
keyword in devlink dev info, in addition to the existing "fw.version",
which is currently Mellanox-specific.

devlink output before:
running:
fw.version 30.2008.2018
after:
running:
fw.version 30.2008.2018
fw 30.2008.2018

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 8c2b58e6 17-May-2021 Ido Schimmel <idosch@OSS.NVIDIA.COM>

mlxsw: core: Avoid unnecessary EMAD buffer copy

mlxsw_emad_transmit() takes care of sending EMAD transactions to the
device. Since these transactions can time out, the driver performs up to
5 retransmissions, each time copying the skb with the original request.

The data of the skb does not change throughout the process, so there is
no need to copy it each time. Instead, only the skb itself can be
copied. Therefore, use skb_clone() instead of skb_copy().

This reduces the latency of the function by about 16%.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4734a750 10-Mar-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: Adjust some MFDE fields shift and size to fw implementation

MFDE.irisc_id and MFDE.event_id were adjusted according to what is
actually implemented in firmware.

Adjust the shift and size of these fields in mlxsw as well.

Note that the displacement of the first field is not a regression.
It was always incorrect and therefore reported "0".

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 315afd20 10-Mar-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: core: Expose MFDE.log_ip to devlink health

Add the MFDE.log_ip field to devlink health reporter in order to ease
firmware debug. This field encodes the instruction pointer that triggered
the CR space timeout.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 321f7ab0 21-Jan-2021 Danielle Ratson <danieller@nvidia.com>

mlxsw: Register physical ports as a devlink resource

The switch ASIC has a limited capacity of physical ('flavour physical'
in devlink terminology) ports that it can support. While each system is
brought up with a different number of ports, this number can be
increased via splitting up to the ASIC's limit.

Expose physical ports as a devlink resource so that user space will have
visibility to the maximum number of ports that can be supported and the
current occupancy.

In addition, add a "Generic Resources" section in devlink-resource
documentation so the different drivers will be aligned by the same resource
name when exposing to user space.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 50779c33 14-Dec-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: Ignore ports that are connected to eXtended mezanine

Use the info stored in the bus_info struct about the eXtended mezanine
connected ports and don't expose them.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 4834ad80 06-Dec-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: core: Trace EMAD events

Currently, mlxsw triggers the 'devlink:devlink_hwmsg' tracepoint
whenever a request is sent to the device and whenever a response is
received from it. However, the tracepoint is not triggered when an event
(e.g., port up / down) is received from the device.

Also trace EMAD events in order to log a more complete picture of all
the exchanged hardware messages.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b44cfd4f 18-Nov-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: move request_firmware out of driver

All drivers which implement the devlink flash update support, with the
exception of netdevsim, use either request_firmware or
request_firmware_direct to locate the firmware file. Rather than having
each driver do this separately as part of its .flash_update
implementation, perform the request_firmware within net/core/devlink.c

Replace the file_name parameter in the struct devlink_flash_update_params
with a pointer to the fw object.

Use request_firmware rather than request_firmware_direct. Although most
Linux distributions today do not have the fallback mechanism
implemented, only about half the drivers used the _direct request, as
compared to the generic request_firmware. In the event that
a distribution does support the fallback mechanism, the devlink flash
update ought to be able to use it to provide the firmware contents. For
distributions which do not support the fallback userspace mechanism,
there should be essentially no difference between request_firmware and
request_firmware_direct.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 1f492eab 17-Nov-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: core: Use variable timeout for EMAD retries

The driver sends Ethernet Management Datagram (EMAD) packets to the
device for configuration purposes and waits for up to 200ms for a reply.
A request is retried up to 5 times.

When the system is under heavy load, replies are not always processed in
time and EMAD transactions fail.

Make the process more robust to such delays by using exponential
backoff. First wait for up to 200ms, then retransmit and wait for up to
400ms and so on.

Fixes: caf7297e7ab5 ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Reported-by: Denis Yulevich <denisyu@nvidia.com>
Tested-by: Denis Yulevich <denisyu@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 0daf2bf5 24-Oct-2020 Amit Cohen <amcohen@nvidia.com>

mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish()

Each EMAD transaction stores the skb used to issue the EMAD request
('trans->tx_skb') so that the request could be retried in case of a
timeout. The skb can be freed when a corresponding response is received
or as part of the retry logic (e.g., failed retransmit, exceeded maximum
number of retries).

The two tasks (i.e., response processing and retransmits) are
synchronized by the atomic 'trans->active' field which ensures that
responses to inactive transactions are ignored.

In case of a failed retransmit the transaction is finished and all of
its resources are freed. However, the current code does not mark it as
inactive. Syzkaller was able to hit a race condition in which a
concurrent response is processed while the transaction's resources are
being freed, resulting in a use-after-free [1].

Fix the issue by making sure to mark the transaction as inactive after a
failed retransmit and free its resources only if a concurrent task did
not already do that.

[1]
BUG: KASAN: use-after-free in consume_skb+0x30/0x370
net/core/skbuff.c:833
Read of size 4 at addr ffff88804f570494 by task syz-executor.0/1004

CPU: 0 PID: 1004 Comm: syz-executor.0 Not tainted 5.8.0-rc7+ #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250
mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
check_memory_region_inline mm/kasan/generic.c:186 [inline]
check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192
instrument_atomic_read include/linux/instrumented.h:56 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_read include/linux/refcount.h:147 [inline]
skb_unref include/linux/skbuff.h:1044 [inline]
consume_skb+0x30/0x370 net/core/skbuff.c:833
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_process_response drivers/net/ethernet/mellanox/mlxsw/core.c:651 [inline]
mlxsw_emad_rx_listener_func+0x5c9/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:672
mlxsw_core_skb_receive+0x4df/0x770 drivers/net/ethernet/mellanox/mlxsw/core.c:2063
mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
__do_softirq+0x223/0x964 kernel/softirq.c:292
asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711

Allocated by task 1006:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
slab_post_alloc_hook mm/slab.h:586 [inline]
slab_alloc_node mm/slub.c:2824 [inline]
slab_alloc mm/slub.c:2832 [inline]
kmem_cache_alloc+0xcd/0x2e0 mm/slub.c:2837
__build_skb+0x21/0x60 net/core/skbuff.c:311
__netdev_alloc_skb+0x1e2/0x360 net/core/skbuff.c:464
netdev_alloc_skb include/linux/skbuff.h:2810 [inline]
mlxsw_emad_alloc drivers/net/ethernet/mellanox/mlxsw/core.c:756 [inline]
mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:787 [inline]
mlxsw_core_reg_access_emad+0x1ab/0x1420 drivers/net/ethernet/mellanox/mlxsw/core.c:1817
mlxsw_reg_trans_query+0x39/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:1831
mlxsw_sp_sb_pm_occ_clear drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:260 [inline]
mlxsw_sp_sb_occ_max_clear+0xbff/0x10a0 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c:1365
mlxsw_devlink_sb_occ_max_clear+0x76/0xb0 drivers/net/ethernet/mellanox/mlxsw/core.c:1037
devlink_nl_cmd_sb_occ_max_clear_doit+0x1ec/0x280 net/core/devlink.c:1765
genl_family_rcv_msg_doit net/netlink/genetlink.c:669 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:714 [inline]
genl_rcv_msg+0x617/0x980 net/netlink/genetlink.c:731
netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2470
genl_rcv+0x24/0x40 net/netlink/genetlink.c:742
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0x150/0x190 net/socket.c:671
____sys_sendmsg+0x6d8/0x840 net/socket.c:2359
___sys_sendmsg+0xff/0x170 net/socket.c:2413
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2446
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 73:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kmem_cache_free+0xbe/0x380 mm/slub.c:3088
kfree_skbmem net/core/skbuff.c:622 [inline]
kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:616
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:837 [inline]
consume_skb+0xe1/0x370 net/core/skbuff.c:831
mlxsw_emad_trans_finish+0x64/0x1c0 drivers/net/ethernet/mellanox/mlxsw/core.c:592
mlxsw_emad_transmit_retry.isra.0+0x9d/0xc0 drivers/net/ethernet/mellanox/mlxsw/core.c:613
mlxsw_emad_trans_timeout_work+0x43/0x50 drivers/net/ethernet/mellanox/mlxsw/core.c:625
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293

The buggy address belongs to the object at ffff88804f5703c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 212 bytes inside of
224-byte region [ffff88804f5703c0, ffff88804f5704a0)
The buggy address belongs to the page:
page:ffffea00013d5c00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0
flags: 0x100000000000200(slab)
raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806c625400
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88804f570380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff88804f570400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804f570480: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88804f570500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88804f570580: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Fixes: caf7297e7ab5f ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# adc80b6c 24-Oct-2020 Ido Schimmel <idosch@nvidia.com>

mlxsw: core: Fix memory leak on module removal

Free the devlink instance during the teardown sequence in the non-reload
case to avoid the following memory leak.

unreferenced object 0xffff888232895000 (size 2048):
comm "modprobe", pid 1073, jiffies 4295568857 (age 164.871s)
hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........".......
10 50 89 32 82 88 ff ff 10 50 89 32 82 88 ff ff .P.2.....P.2....
backtrace:
[<00000000c704e9a6>] __kmalloc+0x13a/0x2a0
[<00000000ee30129d>] devlink_alloc+0xff/0x760
[<0000000092ab3e5d>] 0xffffffffa042e5b0
[<000000004f3f8a31>] 0xffffffffa042f6ad
[<0000000092800b4b>] 0xffffffffa0491df3
[<00000000c4843903>] local_pci_probe+0xcb/0x170
[<000000006993ded7>] pci_device_probe+0x2c2/0x4e0
[<00000000a8e0de75>] really_probe+0x2c5/0xf90
[<00000000d42ba75d>] driver_probe_device+0x1eb/0x340
[<00000000bcc95e05>] device_driver_attach+0x294/0x300
[<000000000e2bc177>] __driver_attach+0x167/0x2f0
[<000000007d44cd6e>] bus_for_each_dev+0x148/0x1f0
[<000000003cd5a91e>] driver_attach+0x45/0x60
[<000000000041ce51>] bus_add_driver+0x3b8/0x720
[<00000000f5215476>] driver_register+0x230/0x4e0
[<00000000d79356f5>] __pci_register_driver+0x190/0x200

Fixes: a22712a96291 ("mlxsw: core: Fix devlink unregister flow")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Tested-by: Oleksandr Shamray <oleksandrs@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# dc64cc7c 07-Oct-2020 Moshe Shemesh <moshe@mellanox.com>

devlink: Add devlink reload limit option

Add reload limit to demand restrictions on reload actions.
Reload limits supported:
no_reset: No reset allowed, no down time allowed, no link flap and no
configuration is lost.

By default reload limit is unspecified and so no constraints on reload
actions are required.

Some combinations of action and limit are invalid. For example, driver
can not reinitialize its entities without any downtime.

The no_reset reload limit will have usecase in this patchset to
implement restricted fw_activate on mlx5.

Have the uapi parameter of reload limit ready for future support of
multiselection.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ccdf0721 07-Oct-2020 Moshe Shemesh <moshe@mellanox.com>

devlink: Add reload action option to devlink reload command

Add devlink reload action to allow the user to request a specific reload
action. The action parameter is optional, if not specified then devlink
driver re-init action is used (backward compatible).
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
Reload actions supported are:
driver_reinit: driver entities re-initialization, applying devlink-param
and devlink-resource values.
fw_activate: firmware activate.

command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
driver_reinit

$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
driver_reinit fw_activate

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 943585c9 27-Sep-2020 Amit Cohen <amcohen@nvidia.com>

mlxsw: Update transceiver_overheat counter according to MTWE

MTWE (Management Temperature Warning Event) is triggered when module's
temperature is higher than its threshold.

Register for MTWE events and increase the module's overheat counter when
its corresponding sensor goes above the configured threshold.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0652ac07 27-Sep-2020 Amit Cohen <amcohen@nvidia.com>

mlxsw: core: Add an infrastructure to track transceiver overheat counter

Initialize an array that stores per-module overheat state and a counter
indicating how many times the module was in overheat state.

Export a function to query the counter according to module number.
Will be used later on by the switch driver (i.e., mlxsw_spectrum) to expose
module's overheat counter as part of ethtool statistics.

Initialize mlxsw_env after driver initialization to be able to query
number of modules from MGPIR register.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc75c054 25-Sep-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: convert flash_update to use params structure

The devlink core recently gained support for checking whether the driver
supports a flash_update parameter, via `supported_flash_update_params`.
However, parameters are specified as function arguments. Adding a new
parameter still requires modifying the signature of the .flash_update
callback in all drivers.

Convert the .flash_update function to take a new `struct
devlink_flash_update_params` instead. By using this structure, and the
`supported_flash_update_params` bit field, a new parameter to
flash_update can be added without requiring modification to existing
drivers.

As before, all parameters except file_name will require driver opt-in.
Because file_name is a necessary field to for the flash_update to make
sense, no "SUPPORTED" bitflag is provided and it is always considered
valid. All future additional parameters will require a new bit in the
supported_flash_update_params bitfield.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22ec3d23 25-Sep-2020 Jacob Keller <jacob.e.keller@intel.com>

devlink: check flash_update parameter support in net core

When implementing .flash_update, drivers which do not support
per-component update are manually checking the component parameter to
verify that it is NULL. Without this check, the driver might accept an
update request with a component specified even though it will not honor
such a request.

Instead of having each driver check this, move the logic into
net/core/devlink.c, and use a new `supported_flash_update_params` field
in the devlink_ops. Drivers which will support per-component update must
now specify this by setting DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT in
the supported_flash_update_params in their devlink_ops.

This helps ensure that drivers do not forget to check for a NULL
component if they do not support per-component update. This also enables
a slightly better error message by enabling the core stack to set the
netlink bad attribute message to indicate precisely the unsupported
attribute in the message.

Going forward, any new additional parameter to flash update will require
a bit in the supported_flash_update_params bitfield.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Cc: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d83ee11 15-Sep-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Introduce fw_fatal health reporter

Introduce devlink health reporter to report FW fatal events. Implement
the event listener using MFDE trap and enable the events to be
propagated using MFGD register configuration.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 703db0ce 15-Sep-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: Move fw_load_policy devlink param into core.c

As the fw flashing code was moved to core.c, move the param which is
related to it there as well. Remove unnecessary parentheses on the way.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1fb0a495 15-Sep-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: core: Push code doing params register/unregister into separate helpers

Extract the code calling params register/unregister driver ops into
separate functions. Call publish/unpublish unconditionally.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b79cb787 15-Sep-2020 Jiri Pirko <jiri@nvidia.com>

mlxsw: Move fw flashing code into core.c

As the firmware flashing is not specific to Spectrum, move the code to
core.c and avoid one op call and 2 exported symbols. Also, this allows
to do flash before call of driver->init function and possibly do other
core calls in between.

Do some small renaming here and there on the way to be consistent with
the rest of core.c code.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# c88e11e0 03-Aug-2020 Ido Schimmel <idosch@mellanox.com>

devlink: Pass extack when setting trap's action and group's parameters

A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.

Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3c8ce24b 28-Jul-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Free EMAD transactions using kfree_rcu()

The lifetime of EMAD transactions (i.e., 'struct mlxsw_reg_trans') is
managed using RCU. They are freed using kfree_rcu() once the transaction
ends.

However, in case the transaction failed it is freed immediately after being
removed from the active transactions list. This is problematic because it is
still possible for a different CPU to dereference the transaction from an RCU
read-side critical section while traversing the active transaction list in
mlxsw_emad_rx_listener_func(). In which case, a use-after-free is triggered
[1].

Fix this by freeing the transaction after a grace period by calling
kfree_rcu().

[1]
BUG: KASAN: use-after-free in mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671
Read of size 8 at addr ffff88800b7964e8 by task syz-executor.2/2881

CPU: 0 PID: 2881 Comm: syz-executor.2 Not tainted 5.8.0-rc4+ #44
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf6/0x16e lib/dump_stack.c:118
print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
mlxsw_emad_rx_listener_func+0x969/0xac0 drivers/net/ethernet/mellanox/mlxsw/core.c:671
mlxsw_core_skb_receive+0x571/0x700 drivers/net/ethernet/mellanox/mlxsw/core.c:2061
mlxsw_pci_cqe_rdq_handle drivers/net/ethernet/mellanox/mlxsw/pci.c:595 [inline]
mlxsw_pci_cq_tasklet+0x12a6/0x2520 drivers/net/ethernet/mellanox/mlxsw/pci.c:651
tasklet_action_common.isra.0+0x13f/0x3e0 kernel/softirq.c:550
__do_softirq+0x223/0x964 kernel/softirq.c:292
asm_call_on_stack+0x12/0x20 arch/x86/entry/entry_64.S:711
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
do_softirq_own_stack+0x109/0x140 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:387 [inline]
__irq_exit_rcu kernel/softirq.c:417 [inline]
irq_exit_rcu+0x16f/0x1a0 kernel/softirq.c:429
sysvec_apic_timer_interrupt+0x4e/0xd0 arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/irqflags.h:85 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x3b/0x40 kernel/locking/spinlock.c:191
Code: e8 2a c3 f4 fc 48 89 ef e8 12 96 f5 fc f6 c7 02 75 11 53 9d e8 d6 db 11 fd 65 ff 0d 1f 21 b3 56 5b 5d c3 e8 a7 d7 11 fd 53 9d <eb> ed 0f 1f 00 55 48 89 fd 65 ff 05 05 21 b3 56 ff 74 24 08 48 8d
RSP: 0018:ffff8880446ffd80 EFLAGS: 00000286
RAX: 0000000000000006 RBX: 0000000000000286 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa94ecea9
RBP: ffff888012934408 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: fffffbfff57be301 R12: 1ffff110088dffc1
R13: ffff888037b817c0 R14: ffff88802442415a R15: ffff888024424000
__do_sys_perf_event_open+0x1b5d/0x2bd0 kernel/events/core.c:11874
do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x473dbd
Code: Bad RIP value.
RSP: 002b:00007f21e5e9cc28 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 000000000057bf00 RCX: 0000000000473dbd
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000040
RBP: 000000000057bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000003 R11: 0000000000000246 R12: 000000000057bf0c
R13: 00007ffd0493503f R14: 00000000004d0f46 R15: 00007f21e5e9cd80

Allocated by task 871:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc mm/kasan/common.c:494 [inline]
__kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:467
kmalloc include/linux/slab.h:555 [inline]
kzalloc include/linux/slab.h:669 [inline]
mlxsw_core_reg_access_emad+0x70/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1812
mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991
mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130
update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293

Freed by task 871:
save_stack+0x1b/0x40 mm/kasan/common.c:48
set_track mm/kasan/common.c:56 [inline]
kasan_set_free_info mm/kasan/common.c:316 [inline]
__kasan_slab_free+0x12c/0x170 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1474 [inline]
slab_free_freelist_hook mm/slub.c:1507 [inline]
slab_free mm/slub.c:3072 [inline]
kfree+0xe6/0x320 mm/slub.c:4052
mlxsw_core_reg_access_emad+0xd45/0x1410 drivers/net/ethernet/mellanox/mlxsw/core.c:1819
mlxsw_core_reg_access+0xeb/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1991
mlxsw_sp_port_get_hw_xstats+0x335/0x7e0 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1130
update_stats_cache+0xf4/0x140 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1173
process_one_work+0xa3e/0x17a0 kernel/workqueue.c:2269
worker_thread+0x9e/0x1050 kernel/workqueue.c:2415
kthread+0x355/0x470 kernel/kthread.c:291
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:293

The buggy address belongs to the object at ffff88800b796400
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 232 bytes inside of
512-byte region [ffff88800b796400, ffff88800b796600)
The buggy address belongs to the page:
page:ffffea00002de500 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea00002de500 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 dead000000000100 dead000000000122 ffff88806c402500
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88800b796380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88800b796400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88800b796480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88800b796500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88800b796580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: caf7297e7ab5 ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d8e8f34 28-Jul-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Increase scope of RCU read-side critical section

The lifetime of the Rx listener item ('rxl_item') is managed using RCU,
but is dereferenced outside of RCU read-side critical section, which can
lead to a use-after-free.

Fix this by increasing the scope of the RCU read-side critical section.

Fixes: 93c1edb27f9e ("mlxsw: Introduce Mellanox switch driver core")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5dbaeb87 20-Jul-2020 Liu Jian <liujian56@huawei.com>

mlxsw: destroy workqueue when trap_register in mlxsw_emad_init

When mlxsw_core_trap_register fails in mlxsw_emad_init,
destroy_workqueue() shouled be called to destroy mlxsw_core->emad_wq.

Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6a8c101e 14-Jul-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Use mirror reason during Rx listener lookup

The Rx listener abstraction allows the switch driver (e.g.,
mlxsw_spectrum) to register a function that is called when a packet is
received (trapped) for a specific reason.

Up until now, the Rx listener lookup was solely based on the trap
identifier. However, when a packet is mirrored to the CPU the trap
identifier merely indicates that the packet was mirrored, but not why it
was mirrored. This makes it impossible for the switch driver to register
different Rx listeners for different mirror reasons.

Solve this by allowing the switch driver to register a Rx listener with
a mirror reason and by extending the Rx listener lookup to take the
mirror reason into account.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0f49b54 09-Jul-2020 Danielle Ratson <danieller@mellanox.com>

devlink: Add a new devlink port split ability attribute and pass to netlink

Add a new attribute that indicates the split ability of devlink port.

Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1b604efb 09-Jul-2020 Danielle Ratson <danieller@mellanox.com>

mlxsw: Set port split ability attribute in driver

Currently, port attributes like flavour, port number and whether the port
was split are set when initializing a port.

Set the split ability of the port as well, based on port_mapping->width
field and split attribute of devlink port in spectrum, so that it could be
easily passed to devlink in the next patch.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a21cf0a8 09-Jul-2020 Danielle Ratson <danieller@mellanox.com>

devlink: Add a new devlink port lanes attribute and pass to netlink

Add a new devlink port attribute that indicates the port's number of lanes.

Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.

The attribute is not passed to user space in case the number of lanes is
invalid (0).

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 622d3e92 09-Jul-2020 Danielle Ratson <danieller@mellanox.com>

mlxsw: Set number of port lanes attribute in driver

Currently, port attributes like flavour, port number and whether the
port was split are set when initializing a port.

Set the number of lanes of the port as well so that it could be easily
passed to devlink in the next patch.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 71ad8d55 09-Jul-2020 Danielle Ratson <danieller@mellanox.com>

devlink: Replace devlink_port_attrs_set parameters with a struct

Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.

Use the devlink_port_attrs struct to replace the relevant parameters.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 39defcbb 30-Mar-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_trap: Add support for setting of packet trap group parameters

Implement support for setting of packet trap group parameters by
invoking the trap_group_init() callback with the new parameters.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 13f2e64b 30-Mar-2020 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_trap: Add devlink-trap policer support

Register supported packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.

Prevent user space from passing invalid policer parameters down to the
device by checking their validity and communicating the failure via an
appropriate extack message.

v2:
* Remove the max/min validity checks from __mlxsw_sp_trap_policer_set()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec4a514a 27-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlxsw: reg: Update module_type values in PMTM register and map them to width

There are couple new values that PMTM register can return
in module_type field. Add them and map them to module width in
mlxsw_core_module_max_width(). Fix the existing names on the way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dbd1ddad 24-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Extend MLXSW_RXL_DIS to register disabled trap group

Extend the mlxsw_listener struct to contain trap group for disabled
traps too. Rename the original "trap_group" item to "en_trap_group" as
it represents enabled state. Let both groups be the same for MLXSW_RXL
however extend MLXSW_RXL_DIS to register separate groups for enable and
disable.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c83da292 24-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Allow to enable/disable rx_listener for trap

For source traps, the "thin policer" is going to be used in order
to reduce the amount of trapped packets to minimum. However, there
will be still small number of packets coming in that need to be dropped
in the driver. Allow to enable/disable rx_listener related to specific
trap in order to prevent unwanted packets to go up the stack.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4a23d45a 24-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum_trap: Prepare mlxsw_core_trap_action_set() to handle not only action

Rename function mlxsw_core_trap_action_set() to
mlxsw_core_trap_state_set() and pass bool enabled instead of action.
Figure out the action according to the enabled state there.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 76d4067f 24-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Allow to register disabled traps using MLXSW_RXL_DIS

Introduce a new macro MLXSW_RXL_DIS that allows to register listeners
as disabled. That allows that from now on, the "action" can be
understood always as "enabled action" and "unreg_action" as "disabled
action". Rename them and treat them accordingly.

Use the new macro for defining drops in spectrum_trap.c.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d356b3e8 23-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Remove priv from listener equality comparison

During packet receive, only the first matching RX listener
in rx_listener_list is going to get the packet. So there is no
meaning in registering two equal listeners with different privs.
Remove priv from equality comparison and disable possibility
of doing it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ec80a8b 23-Feb-2020 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Remove dummy union name from struct mlxsw_listener

The dummy name for union is no longer needed, remove it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e99f8e7f 18-Feb-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

mlxsw: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5d716ab4 11-Nov-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Add support for using EMAD string TLV

In case the firmware had an error while processing EMADs, it can send back
an ASCII string with the reason using EMAD string TLV.

This patch adds the support for using EMAD string TLV. In case of an error,
reports the reason using devlink hwerr tracepoint.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 72c8f428 11-Nov-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Extend EMAD information reported to devlink hwerr

Extend EMAD information reported to devlink hwerr tracepoint with
transaction id and reg id (both, hex and string).

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2aa4aa20 11-Nov-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Add support for EMAD string TLV parsing

During parsing of incoming EMADs, fill the string TLV's offset when it is
used.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 664b3dd9 11-Nov-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Add EMAD string TLV

Add EMAD string TLV, an ASCII string the driver can receive from the
firmware in case of an error.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5aa4165c 11-Nov-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Parse TLVs' offsets of incoming EMADs

Until now the code assumes a fixed structure which makes it difficult to
support EMADs with and without new TLVs.

Make it more generic by parsing the TLVs when the EMADs are received and
store the offset to the different TLVs in the control block. Using these
offsets to extract information from the EMADs without relying on a specific
structure.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5b67a3ed 10-Nov-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Enable devlink reload only on probe

Call devlink enable only during probe time and avoid deadlock
during reload.

Reported-by: Shalom Toledo <shalomt@mellanox.com>
Fixes: a0c76345e3d3 ("devlink: disallow reload operation during device cleanup")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 73a533ec 10-Nov-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Enable devlink reload only on probe

Call devlink enable only during probe time and avoid deadlock
during reload.

Reported-by: Shalom Toledo <shalomt@mellanox.com>
Fixes: 5a508a254bed ("devlink: disallow reload operation during device cleanup")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a508a25 09-Nov-2019 Jiri Pirko <jiri@mellanox.com>

devlink: disallow reload operation during device cleanup

There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:

while true; do
echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/bind
devlink dev reload pci/0000:00:10.0 &
echo "0000:00:10.0" >/sys/bus/pci/drivers/mlxsw_spectrum2/unbind
done

Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4e7 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0c76345 08-Nov-2019 Jiri Pirko <jiri@mellanox.com>

devlink: disallow reload operation during device cleanup

There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:

while true; do
echo 10 > /sys/bus/netdevsim/new_device
devlink dev reload netdevsim/netdevsim10 &
echo 10 > /sys/bus/netdevsim/del_device
done

Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4e7 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25911e1b 31-Oct-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: spectrum: Use PMTM register to get max module width

Currently the max module width is hard-coded according to ASIC type.
That is not entirely correct, as the max module width might differ
per-board. Use PMTM register to query FW for maximal width of a module.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7265a0d 30-Oct-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Unpublish devlink parameters during reload

The devlink parameter "acl_region_rehash_interval" is a runtime
parameter whose value is stored in a dynamically allocated memory. While
reloading the driver, this memory is freed and then allocated again. A
use-after-free might happen if during this time frame someone tries to
retrieve its value.

Since commit 070c63f20f6c ("net: devlink: allow to change namespaces
during reload") the use-after-free can be reliably triggered when
reloading the driver into a namespace, as after freeing the memory (via
reload_down() callback) all the parameters are notified.

Fix this by unpublishing and then re-publishing the parameters during
reload.

Fixes: 98bbf70c1c41 ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
Fixes: 7c62cfb8c574 ("devlink: publish params only after driver init is done")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 762effaa 06-Oct-2019 Vadim Pasternak <vadimp@mellanox.com>

mlxsw: core: Push minor/subminor fw version check into helper

Add new API for FW "minor" and "subminor" version validation for
sharing it between "spectrum" and "minimal" drivers.
Use it in "spectrum" driver.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 070c63f2 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

net: devlink: allow to change namespaces during reload

All devlink instances are created in init_net and stay there for a
lifetime. Allow user to be able to move devlink instances into
namespaces during devlink reload operation. That ensures proper
re-instantiation of driver objects, including netdevices.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5bcfb6a4 03-Oct-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: Propagate extack down to register_fib_notifier()

During the devlink reaload the extack is present, so propagate it all
the way down to register_fib_notifier() call in spectrum_router.c.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28b1987e 16-Sep-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: spectrum: Register CPU port with devlink

Register CPU port with devlink.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2670ac26 12-Sep-2019 Jiri Pirko <jiri@mellanox.com>

net: devlink: move reload fail indication to devlink core and expose to user

Currently the fact that devlink reload failed is stored in drivers.
Move this flag into devlink core. Also, expose it to the user.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 97691069 12-Sep-2019 Jiri Pirko <jiri@mellanox.com>

net: devlink: split reload op into two

In order to properly implement failure indication during reload,
split the reload op into two ops, one for down phase and one for
up phase.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b5ce611f 21-Aug-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum: Add devlink-trap support

Register supported packet traps (layer 2 drops only, currently) and
associated trap group with devlink during driver initialization.

The amount of traffic generated by these packet drop traps is capped at
10Kpps to ensure the CPU is not overwhelmed by incoming packets.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b7bf0270 21-Aug-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Add API to set trap action

Up until now the action of a trap was never changed during its lifetime.
This is going to change by subsequent patches that will allow devlink to
control the action of certain traps.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0714256c 30-Jun-2019 Petr Machata <petrm@mellanox.com>

mlxsw: pci: PTP: Hook into packet transmit path

On Spectrum-1, timestamps are delivered separately from the packets, and
need to paired up. Therefore, at some point after mlxsw_sp_port_xmit()
is invoked, it is necessary to involve the chip-specific driver code to
allow it to do the necessary bookkeeping and matching.

On Spectrum-2, timestamps are delivered in CQE. For that reason,
position the point of driver involvement into mlxsw_pci_cqe_sdq_handle()
to make it hopefully easier to extend for Spectrum-2 in the future.

To tell the driver what port the packet was sent on, keep tx_info
in SKB control buffer.

Introduce a new driver core interface mlxsw_core_ptp_transmitted(), a
driver callback ptp_transmitted, and a PTP op transmitted. The callee is
responsible for taking care of releasing the SKB passed to the new
interfaces, and correspondingly have the new stub callbacks just call
dev_kfree_skb_any().

Follow-up patches will introduce the actual content into
mlxsw_sp1_ptp_transmitted() in particular.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 34dacb4d 11-Jun-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Add a new interface for reading the hardware free running clock

Add two new bus operations for reading the hardware free running clock.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a9d204a6 04-Jun-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: Move firmware flash implementation to devlink

Benefit from the devlink flash update implementation and ethtool
fallback to it and move firmware flash implementation there.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 961cf99a 29-May-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Re-order initialization sequence

The driver core first registers with the hwmon and thermal subsystems
and only then proceeds to initialize the switch driver (e.g.,
mlxsw_spectrum). It is only during the last stage that the current
firmware version is validated and a newer one flashed, if necessary.

The above means that if a new firmware feature is utilized by the
hwmon/thermal code, the driver will not be able to load.

Solve this by re-ordering initializing the switch driver before
registering with the hwmon and thermal subsystems.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c52ecff7 18-May-2019 Vadim Pasternak <vadimp@mellanox.com>

mlxsw: core: Prevent QSFP module initialization for old hardware

Old Mellanox silicons, like switchx-2, switch-ib do not support reading
QSFP modules temperature through MTMP register. Attempt to access this
register on systems equipped with the this kind of silicon will cause
initialization flow failure.
Test for hardware resource capability is added in order to distinct
between old and new silicon - old silicons do not have such capability.

Fixes: 6a79507cfe94 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Fixes: 5c42eaa07bd0 ("mlxsw: core: Extend hwmon interface with QSFP module temperature attributes")
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f686206 21-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: spectrum_buffers: Add extack messages for invalid configurations

Add extack messages to better communicate invalid configuration to the
user.

Example:

# devlink sb pool set pci/0000:01:00.0 pool 0 size 104857600 thtype dynamic
Error: mlxsw_spectrum: Exceeded shared buffer size.
devlink answers: Invalid argument

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f2ad1a52 21-Apr-2019 Ido Schimmel <idosch@mellanox.com>

net: devlink: Add extack to shared buffer operations

Add extack to shared buffer set operations, so that meaningful error
messages could be propagated to the user.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b442fed1 10-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue

The workqueue is used to periodically update the networking stack about
activity / statistics of various objects such as neighbours and TC
actions.

It should not be called as part of memory reclaim path, so remove the
WQ_MEM_RECLAIM flag.

Fixes: 3d5479e92087 ("mlxsw: core: Remove deprecated create_workqueue")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4af06997 10-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue

The ordered workqueue is used to offload various objects such as routes
and neighbours in the order they are notified.

It should not be called as part of memory reclaim path, so remove the
WQ_MEM_RECLAIM flag. This can also result in a warning [1], if a worker
tries to flush a non-WQ_MEM_RECLAIM workqueue.

[1]
[97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker
[97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130
...
[97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
[97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum]
[97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130
...
[97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086
[97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000
[97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046
[97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0
[97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0
[97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0
[97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000
[97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0
[97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[97703.543115] Call Trace:
[97703.543129] __flush_work+0xbd/0x1e0
[97703.543137] ? __cancel_work_timer+0x136/0x1b0
[97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0
[97703.543154] __cancel_work_timer+0x136/0x1b0
[97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core]
[97703.543184] cancel_work_sync+0x10/0x20
[97703.543191] rhashtable_free_and_destroy+0x23/0x140
[97703.543198] rhashtable_destroy+0xd/0x10
[97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum]
[97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum]
[97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum]
[97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum]
[97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum]
[97703.543484] process_one_work+0x1fd/0x400
[97703.543493] worker_thread+0x34/0x410
[97703.543500] kthread+0x121/0x140
[97703.543507] ? process_one_work+0x400/0x400
[97703.543512] ? kthread_park+0x90/0x90
[97703.543523] ret_from_fork+0x35/0x40

Fixes: a3832b31898f ("mlxsw: core: Create an ordered workqueue for FIB offload")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Semion Lisyansky <semionl@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8c133b0 10-Apr-2019 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue

The EMAD workqueue is used to handle retransmission of EMAD packets that
contain configuration data for the device's firmware.

Given the workers need to allocate these packets and that the code is
not called as part of memory reclaim path, remove the WQ_MEM_RECLAIM
flag.

Fixes: d965465b60ba ("mlxsw: core: Fix possible deadlock")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a9c8336f 08-Apr-2019 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Add support for devlink info command

Expose the following ASIC information via devlink info command:
- Driver name
- Hardware revision
- Firmware PSID
- Running firmware version

Standard output example:
$ devlink dev info pci/0000:03:00.0
pci/0000:03:00.0:
driver mlxsw_spectrum
versions:
fixed:
hw.revision A0
fw.psid MT_2750110033
running:
fw.version 13.1910.622

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cdf29f4a 03-Apr-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: Pass switch ID through devlink_port_attrs_set()

Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bec5267c 03-Apr-2019 Jiri Pirko <jiri@mellanox.com>

net: devlink: extend port attrs for switch ID

Extend devlink_port_attrs_set() to pass switch ID for ports which are
part of switch and store it in port attrs. For other ports, this is
NULL.

Note that this allows the driver to group devlink ports into one or more
switches according to the actual topology.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 59a6b35a 28-Mar-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: Remove ndo_get_phys_port_name implementation

Rely on the previously introduced fallback and let the core call
devlink directly in order to get the physical port name.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 011d3256 28-Mar-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: Implement ndo_get_devlink_port

In order for devlink compat functions to work, implement
ndo_get_devlink_port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e519418f 24-Mar-2019 Jiri Pirko <jiri@mellanox.com>

mlxsw: Move devlink_port_attrs_set() call before register

Since attrs are static during the existence of devlink port, set the
before registration of the port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e5ba7803 03-Mar-2019 Vadim Pasternak <vadimp@mellanox.com>

mlxsw: core: Move resource query API to common location

Move mlxsw_pci_resources_query() to a common location to allow reuse by
the different drivers and over all the supported physical buses. Rename
it to mlxsw_core_resources_query().

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c62cfb8 07-Feb-2019 Jiri Pirko <jiri@mellanox.com>

devlink: publish params only after driver init is done

Currently, user can do dump or get of param values right after the
devlink params are registered. However the driver may not be initialized
which is an issue. The same problem happens during notification
upon param registration. Allow driver to publish devlink params
whenever it is ready to handle get() ops. Note that this cannot
be resolved by init reordering, as the "driverinit" params have
to be available before the driver is initialized (it needs the param
values there).

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d32d02a5 04-Feb-2019 Nir Dotan <nird@mellanox.com>

mlxsw: core: Trace EMAD errors

Trace EMAD errors returned from HW.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cf0b70e7 18-Dec-2018 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Increase timeout during firmware flash process

During the firmware flash process, some of the EMADs get timed out, which
causes the driver to send them again with a limit of 5 retries. There are
some situations in which 5 retries is not enough and the EMAD access fails.
If the failed EMAD was related to the flashing process, the driver fails
the flashing.

The reason for these timeouts during firmware flashing is cache misses in
the CPU running the firmware. In case the CPU needs to fetch instructions
from the flash when a firmware is flashed, it needs to wait for the
flashing to complete. Since flashing takes time, it is possible for pending
EMADs to timeout.

Fix by increasing EMADs' timeout while flashing firmware.

Fixes: ce6ef68f433f ("mlxsw: spectrum: Implement the ethtool flash_device callback")
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 064501c5 03-Dec-2018 Shalom Toledo <shalomt@mellanox.com>

mlxsw: spectrum: Load firmware version based on devlink parameter

Load firmware version based on 'fw_load_policy' devlink parameter. The
driver supports these two options:
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0)
Default, load firmware version preferred by the driver
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1)
Load firmware currently stored in flash

The second option, 'flash', allow the device to run with different firmware
version than preferred by the driver for testing and/or debugging purposes.
For example, testing a firmware bug fix.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03bffcad 03-Dec-2018 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Reset firmware after flash during driver initialization

After flashing new firmware during the driver initialization flow (reload
or not), the driver should do a firmware reset when it gets -EAGAIN in
order to load the new one.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a22712a9 29-Oct-2018 Shalom Toledo <shalomt@mellanox.com>

mlxsw: core: Fix devlink unregister flow

After a failed reload, the driver is still registered to devlink, its
devlink instance is still allocated and the 'reload_fail' flag is set.
Then, in the next reload try, the driver's allocated devlink instance will
be freed without unregistering from devlink and its components (e.g,
resources). This scenario can cause a use-after-free if the user tries to
execute command via devlink user-space tool.

Fix by not freeing the devlink instance during reload (failed or not).

Fixes: 24cc68ad6c46 ("mlxsw: core: Add support for reload")
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9b3bc7db 17-Oct-2018 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Fix use-after-free when flashing firmware during init

When the switch driver (e.g., mlxsw_spectrum) determines it needs to
flash a new firmware version it resets the ASIC after the flashing
process. The bus driver (e.g., mlxsw_pci) then registers itself again
with mlxsw_core which means (among other things) that the device
registers itself again with the hwmon subsystem again.

Since the device was registered with the hwmon subsystem using
devm_hwmon_device_register_with_groups(), then the old hwmon device
(registered before the flashing) was never unregistered and was
referencing stale data, resulting in a use-after free.

Fix by removing reliance on device managed APIs in mlxsw_hwmon_init().

Fixes: c86d62cc410c ("mlxsw: spectrum: Reset FW after flash")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4f23c43c 09-Aug-2018 YueHaibing <yuehaibing@huawei.com>

mlxsw: core: remove unnecessary function mlxsw_core_driver_put

The function mlxsw_core_driver_put only traverse mlxsw_core_driver_list
to find the matched mlxsw_driver,but never used it.
So it can be removed safely.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9948a064 09-Aug-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: Replace license text with SPDX identifiers and adjust copyrights

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3fcc773b 05-Jun-2018 David Ahern <dsahern@gmail.com>

mlxsw: Add extack messages for port_{un, }split failures

Return messages in extack for port split/unsplit errors. e.g.,
$ devlink port split swp1s1 count 4
Error: mlxsw_spectrum: Port cannot be split further.
devlink answers: Invalid argument

$ devlink port unsplit swp4
Error: mlxsw_spectrum: Port was not split.
devlink answers: Invalid argument

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac0fc8a1 05-Jun-2018 David Ahern <dsahern@gmail.com>

devlink: Add extack to reload and port_{un, }split operations

Add extack argument to reload, port_split and port_unsplit operations.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f3a52c61 27-May-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: pci: Utilize MRSR register to perform FW reset

So far, the PCI BAR0 register is used for triggering FW reset. However,
that is a legacy attitude and it is recommended to use MRSR to perform
reset instead. So do that. Move the reset into init() function as
the cmd interface needs to be used. With that, IRQ initialization needs
to be moved as well. As a side effect, the reset move simplifies
the devlink reload flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2a360bf0 27-May-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: cmd: Handle error after reset gracefully

There is an exception in command interface processing in case the MRSR
register is written to. The register triggers FW reset and during the
reset FW returns an error. So handle this by ignoring this error while
writing to MRSR register.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec932fbd 18-May-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: use devlink helper to generate physical port name

Since devlink knows the info needed to generate the physical port name
in a generic way for all devlink users, use the helper to do the job.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ec1380a 18-May-2018 Jiri Pirko <jiri@mellanox.com>

devlink: extend attrs_set for setting port flavours

Devlink ports can have specific flavour according to the purpose of use.
This patch extend attrs_set so the driver can say which flavour port
has. Initial flavours are:
physical, cpu, dsa
User can query this to see right away what is the purpose of each port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b9ffcbaf 18-May-2018 Jiri Pirko <jiri@mellanox.com>

devlink: introduce devlink_port_attrs_set

Change existing setter for split port information into more generic
attrs setter. Alongside with that, allow to set port number and subport
number for split ports.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ccc1131 10-May-2018 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'

Resources are not freed in the reverse order of the allocation.
Labels are also mixed-up.

Fix it and reorder code and labels in the error handling path of
'mlxsw_core_bus_device_register()'

Fixes: ef3116e5403e ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad3f20b2 01-Apr-2018 Jiri Pirko <jiri@mellanox.com>

mlxsw: Move "resources_query_enable" out of mlxsw_config_profile

As struct mlxsw_config_profile is mapped to the payload of the FW
command of the same name, resources_query_enable flag does not belong
there. Move it to struct mlxsw_driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24cc68ad 15-Jan-2018 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: core: Add support for reload

Add support for hot reload. First, all the driver/core resources are
released but the PCI and devlink instances, then reset is performed
through the PCI interface. Finally the driver performs initialization.

In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e21d21ca 15-Jan-2018 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: pci: Add support for getting resource through devlink

Up until now the KVD partition was static. This patch introduces the
ability to get the resource sizes via devlink. In case the resource is not
available the default configuration is used.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ef3116e5 15-Jan-2018 Arkadi Sharshevsky <arkadis@mellanox.com>

mlxsw: spectrum: Register KVD resources with devlink

Register the KVD resources with devlink. The KVD is a memory resource
which is subdivided into three partitions which are the linear, hash
single and hash double.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d965465b 16-Oct-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Fix possible deadlock

When an EMAD is transmitted, a timeout work item is scheduled with a
delay of 200ms, so that another EMAD will be retried until a maximum of
five retries.

In certain situations, it's possible for the function waiting on the
EMAD to be associated with a work item that is queued on the same
workqueue (`mlxsw_core`) as the timeout work item. This results in
flushing a work item on the same workqueue.

According to commit e159489baa71 ("workqueue: relax lockdep annotation
on flush_work()") the above may lead to a deadlock in case the workqueue
has only one worker active or if the system in under memory pressure and
the rescue worker is in use. The latter explains the very rare and
random nature of the lockdep splats we have been seeing:

[ 52.730240] ============================================
[ 52.736179] WARNING: possible recursive locking detected
[ 52.742119] 4.14.0-rc3jiri+ #4 Not tainted
[ 52.746697] --------------------------------------------
[ 52.752635] kworker/1:3/599 is trying to acquire lock:
[ 52.758378] (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c4fa4>] flush_work+0x3a4/0x5e0
[ 52.767837]
but task is already holding lock:
[ 52.774360] (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[ 52.784495]
other info that might help us debug this:
[ 52.791794] Possible unsafe locking scenario:
[ 52.798413] CPU0
[ 52.801144] ----
[ 52.803875] lock(mlxsw_core_driver_name);
[ 52.808556] lock(mlxsw_core_driver_name);
[ 52.813236]
*** DEADLOCK ***
[ 52.819857] May be due to missing lock nesting notation
[ 52.827450] 3 locks held by kworker/1:3/599:
[ 52.832221] #0: (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[ 52.842846] #1: ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[ 52.854537] #2: (rtnl_mutex){+.+.}, at: [<ffffffff822ad8e7>] rtnl_lock+0x17/0x20
[ 52.863021]
stack backtrace:
[ 52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4
[ 52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
[ 52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum]
[ 52.894060] Call Trace:
[ 52.909122] __lock_acquire+0xf6f/0x2a10
[ 53.025412] lock_acquire+0x158/0x440
[ 53.047557] flush_work+0x3c4/0x5e0
[ 53.087571] __cancel_work_timer+0x3ca/0x5e0
[ 53.177051] cancel_delayed_work_sync+0x13/0x20
[ 53.182142] mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core]
[ 53.194571] mlxsw_core_reg_access+0x586/0x990 [mlxsw_core]
[ 53.225365] mlxsw_reg_query+0x10/0x20 [mlxsw_core]
[ 53.230882] mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum]
[ 53.237801] process_one_work+0x8f1/0x12f0
[ 53.321804] worker_thread+0x1fd/0x10c0
[ 53.435158] kthread+0x28e/0x370
[ 53.448703] ret_from_fork+0x2a/0x40
[ 53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774)
[ 53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770)
[ 53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed))
[ 53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications

Fix this by creating another workqueue for EMAD timeouts, thereby
preventing the situation of a work item trying to flush a work item
queued on the same workqueue.

Fixes: caf7297e7ab5f ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9820355f 02-Aug-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Use correct EMAD transaction ID in debug message

'trans->tid' is only assigned later in the function, resulting in a zero
transaction ID. Use 'tid' instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5ec2ee7d 24-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: Query maximum number of ports from firmware

We currently hard code the maximum number of ports in the driver, but
this may change in future devices, so query it from the firmware
instead.

Fallback to a maximum of 64 ports in case this number can't be queried.
This should only happen in SwitchX-2 for which this number is correct.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9a32562b 23-Mar-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: Remove debugfs interface

We don't use it during development and we can't extend it either, so
remove it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0e4761d 06-Feb-2017 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Queue work immediately instead of delaying it

We always use zero delay before queueing a work on the ordered workqueue
('mlxsw_owq'), so use work_struct directly instead of delayable work.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3832b31 03-Dec-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Create an ordered workqueue for FIB offload

We're going to start processing FIB entries addition / deletion events
in deferred work. These work items must be processed in the order they
were submitted or otherwise we can have differences between the kernel's
FIB table and the device's.

Solve this by creating an ordered workqueue to which these work items
will be submitted to. Note that we can't simply convert the current
workqueue to be ordered, as EMADs re-transmissions are also processed in
deferred work.

Later on, we can migrate other work items to this workqueue, such as FDB
notification processing and nexthop resolution, since they all take the
same lock anyway.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 523779c7 28-Nov-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Change order of operations in removal path

We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.

This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 81d4d728 28-Nov-2016 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Add missing rollback in error path

Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.

Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.

Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d87fcea 25-Nov-2016 Nogah Frankel <nogahf@mellanox.com>

mlxsw: core: Change emad trap group settings

Currently, the emad trap init was done in the core. In the future we will
want to add some changes to the traps groups, according to device type.
This commit create a driver function to create the trap group for the
emad, so later it can be changed by devices. It also changes the emad
registration to use the new generic functions.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d570b7ee 25-Nov-2016 Nogah Frankel <nogahf@mellanox.com>

mlxsw: Change trap set function

Change trap setting function so instead of determining the trap group by
trap id, it gets it as a parameter (so later we can have different trap
groups for Spectrum and Switchx2).
Add "is_ctrl" parameter to the trap setting function. It control whether
the trapped packets wait in a designated control buffer or in their
default one. This parameter is ignored by Switchx2 and Switchib.
Add these parameters to the traps array in Spectrum, Switchx2 and
Switchib.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0791051c 25-Nov-2016 Nogah Frankel <nogahf@mellanox.com>

mlxsw: core: Create a generic function to register / unregister traps

We have 2 types of HW traps to handle, rx traps and events.
The registration workflow for both is very similar. So it only make
sense to create one function to handle both.

This patch creates a struct to hold the data for both cases. It also
creates a registration and an un-registration functions that get this
generic struct as input.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a50c1e35 22-Nov-2016 Ivan Vecera <cera@cera.cz>

mlxsw: core: Implement thermal zone

Implement thermal zone for mlxsw based HW. It uses temperature sensor
provided by ASIC (the same as mlxsw hwmon interface) to report current
temp to thermal core. The ASIC's PWM is then used to control speed
of system fans registered as cooling devices.

Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4ebd00bc 16-Nov-2016 Vadim Pasternak <vadimp@mellanox.com>

mlxsw: Invoke driver's init/fini methods only if defined

We are going to add a minimal driver on top of the mlxsw core
infrastructure, which will be mainly used for hardware monitoring in
Baseboard management controller (BMC) installations.

Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not
initialize the ASIC and therefore doesn't need to implement the init() and
fini() methods in its 'mlxsw_driver' struct.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c711e27a 16-Nov-2016 Vadim Pasternak <vadimp@mellanox.com>

mlxsw: Add bus capability flag

The mlxsw core infrastructure currently assumes that communication with
the ASIC is always possible using Ethernet management datagrams (EMADs),
but this is only possible when the PCI bus is used.

The bus capability flag is added to indicate EMAD support and make core
initialize EMAD communication only when it's set. Otherwise, register
access is done using command interface.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c81ea5d 28-Oct-2016 Elad Raz <eladr@mellanox.com>

mlxsw: core: Add port type (Eth/IB) set API

Add "port_type_set" API to mlxsw core. The core layer send the change type
callback to the port along with it's private information.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d808c7e4 28-Oct-2016 Elad Raz <eladr@mellanox.com>

mlxsw: core: Add "eth" prefix to mlxsw_core_port_set

Since we are about to introduce IB port APIs, we will add prefixes to
existing APIs.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 67963a33 28-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Make devlink port instances independent of spectrum/switchx2 port instances

Currently, devlink register/unregister is done directly from
spectrum/switchx2 port create/remove functions. With a need to
introduce a port type change, the devlink port instances have to be
persistent across type changes, therefore across port create/remove
function calls. So do a bit of reshuffling to achieve that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d20d23c 27-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Move PCI id table definitions into driver modules

So far, mlxsw_pci.ko is the module that registers PCI table for all
drivers (spectrum and switchx2). That is problematic for example with
dracut. Since mlxsw_spectrum.ko and mlxsw_switchx2.ko are loaded
dynamically from within mlxsw_core.ko, dracut does not have track of
them and avoids them from being included in initramfs.

So make this in an ordinary way and define the PCI tables in individual
driver modules, so it can be properly loaded and included in dracut
initramfs image. As a side effect, this patch could remove no longer
necessary driver "kind" strings which were used to link PCI ids with
individual mlxsw drivers.

Suggested-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1a38311 21-Oct-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Convert resources into array

Since the number of resources is going to get much bigger, ease up the
addition by simly defining IDs. Convert the existing structure members
to a set array, one for validity, one for values. Introduce a set of
getters and setters for easy access.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ce0bd2b0 20-Sep-2016 Nogah Frankel <nogahf@mellanox.com>

mlxsw: spectrum: lag resources- use resources data instead of consts

Use max lag and max ports in lag resources as the result of resource query
instead of using const to save them.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 57d316ba 20-Jul-2016 Nogah Frankel <nogahf@mellanox.com>

mlxsw: pci: Add resources query implementation.

Add resources query implementation. If exists, query the HW for its
builtin resources instead of having them as consts in the code.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b38a75d2 12-Jul-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Trace EMAD messages

Trace EMAD messages going down to HW and up from HW. Devlink needs to be
registered before EMAD init so the trace function can be called
with valid devlink handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v1->v2:
- Use trace_devlink_hwmsg directly
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3d5479e9 07-Jun-2016 Bhaktipriya Shridhar <bhaktipriya96@gmail.com>

mlxsw: core: Remove deprecated create_workqueue

alloc_workqueue replaces deprecated create_workqueue().

A dedicated workqueue has been used since the workqueue
mlxsw_wq is used for FDB notif. processing with workitems that are
involved in normal device operation && because it's a network device
which can be depended upon during memory reclaim.

Workitems &trans->timeout_dw and &mlxsw_sp->fdb_notify.dw,
map to mlxsw_sp_fdb_notify_work (processes FDB notifications from the
underlying device and resolves the netdev to which the entry points to
and notifies the bridge using the switchdev notifier) and
mlxsw_emad_trans_timeout_work (provides async EMAD register access)
respectively. They require forward progress under memory pressure and
hence, WQ_MEM_RECLAIM has been set.

Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary here.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# caf7297e 14-Apr-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Introduce support for asynchronous EMAD register access

So far it was possible to have one EMAD register access at a time,
locked by mutex. This patch extends this interface to allow multiple
EMAD register accesses to be in fly at once. That allows faster
processing on firmware side avoiding unused time in between EMADs.
Measured speedup is ~30% for shared occupancy snapshot operation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dd9bdb04 14-Apr-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Add mlxsw specific workqueue and use it for FDB notif. processing

Follow-up patch is going to need to use delayed work as well and
frequently. The FDB notification processing is already using that and
also quite frequently. It makes sense to create separate workqueue just
for mlxsw driver in this case and do not pollute system_wq.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ceecc88 14-Apr-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Add devlink shared buffer occupancy callbacks

Add middle layer in mlxsw core code to forward shared buffer occupancy
calls into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a6179bf0 14-Apr-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Add devlink shared buffer callbacks

Add middle layer in mlxsw core code to forward shared buffer calls
into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2f10571 08-Apr-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Do not pass around driver_priv directly

Instead of that, pass mlxsw_core and use a helper to get driver priv
from driver code. Looks much cleaner that way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 307c2431 08-Apr-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit*

Instead of passing around driver priv, pass struct mlxsw_core *
directly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 932762b6 08-Apr-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Move devlink port registration into common core code

Remove devlink port reg/unreg from spectrum and switchx2 code and rather
do the common work in core. That also ensures code separation where
devlink is only used in core.c.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 284ef803 26-Feb-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Add devlink port splitter callbacks

Add middle layer in mlxsw core code to forward port split/unsplit calls
into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4745500 26-Feb-2016 Jiri Pirko <jiri@mellanox.com>

mlxsw: Implement devlink interface

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f4cee3af 22-Dec-2015 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Use devm_kzalloc to allocate mlxsw_hwmon structure

KASan reported use-after-free for the hwmon structure. So fix this by
using devm_kzalloc and let the core take care about freeing the memory
during device dettach.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 89309da39 ("mlxsw: core: Implement temperature hwmon interface")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8060646a 02-Dec-2015 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Add support for packets received from LAG port

Lower layer (pci) has information if the packet is received via LAG port.
If that is the case, it fills up rx_info accordingly. However upper
layer does not care about lag_id/port_index for received packets so
convert it to local_port before passing it up. For that conversion, lag
mapping array is introduced. Upper layer is responsible for setting up
the mapping according to what is set in HW.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89309da3 27-Nov-2015 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Implement temperature hwmon interface

ASIC provides access to temperature sensors. Implement their exposure to
userspace using hwmon.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ef743fdd 28-Oct-2015 Or Gerlitz <ogerlitz@mellanox.com>

mlxsw: Put constant on the right side of comparisons

Fixes those places where checkpatch complains that comparisons
should place the constant on the right side of the test.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f24af330 15-Oct-2015 Ido Schimmel <idosch@mellanox.com>

mlxsw: Simplify traps creation

The Host Trap Group Table (HTGT) register configures trap groups, which
are populated with trap IDs using the Host PacKet Trap (HPKT) register.
However, a trap ID can only be present inside one trap group (the last
configured).

Instead of passing both the trap group and ID for the function that
packs HPKT, pass only the trap ID and derive from it the trap group.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 18ea5445 15-Oct-2015 Jiri Pirko <jiri@mellanox.com>

mlxsw: core: Do not use EMADs in mlxsw_emad_fini

Be symmetric with mlxsw_emad_init and don't use EMADs in mlxsw_emad_fini
cleanup function. Use command interface instead.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 53ca376e 15-Oct-2015 Ido Schimmel <idosch@mellanox.com>

mlxsw: core: Fix race condition in __mlxsw_emad_transmit

Under certain conditions EMAD responses can be returned from the device
even before setting trans_active. This will cause the EMAD Rx listener
to drop the EMAD response - as there are no active transactions - and
timeouts will be generated.

Fix this by setting trans_active before transmitting the EMAD skb.

Fixes: 4ec14b7634b2 ("mlxsw: Add interface to access registers and process events")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 262df691 27-Aug-2015 Jiri Pirko <jiri@mellanox.com>

mlxsw: adjust transmit fail log message level in __mlxsw_emad_transmit

When transmit fails, it is an error, not a warning.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a585dabd 27-Aug-2015 Ido Schimmel <idosch@mellanox.com>

mlxsw: Remove duplicate included header

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3bfcd347 06-Aug-2015 Ido Schimmel <idosch@mellanox.com>

mlxsw: Use correct skb length when dumping payload

Do not use the length of the transmitted skb (which was freed), but
that of the response skb.

This issue was discovered using the Kernel Address sanitizer (KASan).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d003462a 06-Aug-2015 Ido Schimmel <idosch@mellanox.com>

mlxsw: Simplify mlxsw_sx_port_xmit function

Previously we only checked if the transmission queue is not full in the
middle of the xmit function. This lead to complex logic due to the fact
that sometimes we need to reallocate the headroom for our Tx header.

Allow the switch driver to know if the transmission queue is not full
before sending the packet and remove this complex logic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4ec14b76 29-Jul-2015 Ido Schimmel <idosch@mellanox.com>

mlxsw: Add interface to access registers and process events

Ethernet Management Datagrams (EMADs) are Ethernet packets sent between
the host and the device in order to configure the available device registers.
Another use case is notifications sent from the device to the host,
letting it know about certain events, such as port up / down.

Add the ability to construct EMADs with provisions to construct and
parse the registers' payloads. Implement EMAD transaction layer
which is responsible for the reliable transmission of EMADs. Also, add
an infrastructure used by the switch driver to register for particular
events generated by the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93c1edb2 29-Jul-2015 Jiri Pirko <jiri@mellanox.com>

mlxsw: Introduce Mellanox switch driver core

Add core components of Mellanox switch driver infrastructure.
Core infrastructure is designed so that it can be used by multiple
bus drivers (PCI now, I2C and SGMII are planned to be implemented
in the future). Multiple switch kind drivers can be registered as well.
This core serves as a glue between buses and drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>